MongoDB PHP UTF-8 problem

2021-12-28 00:00:00 mongodb utf-8 php

Assume that I need to insert the following document:

{
    title: 'Péter'
}

(Note the é)

It gives me an error when I use the following PHP-code ... :

$db->collection->insert(array("title" => "Péter"));

... because it needs to be utf-8.

So I should use this line of code:

$db->collection->insert(array("title" => utf8_encode("Péter")));

Now, when I request the document, I still have to decode it ... :

$document = $db->collection->findOne(array("_id" => new MongoId("__someID__")));
$title = utf8_decode($document['title']);

Is there some way to automate this process? Can I change the character encoding of MongoDB? (I'm migrating a MySQL database that uses cp1252 West European (latin1).)

I already considered changing the Content-Type header; the problem is that all static (hardcoded) strings aren't UTF-8...

Thanks in advance! Tim

Recommended answer

JSON and BSON can only encode and decode valid UTF-8 strings. If your data (including input) is not UTF-8, you need to convert it before passing it to any JSON-dependent system, like this:

$string = iconv('UTF-8', 'UTF-8//IGNORE', $string); // or
$string = iconv('UTF-8', 'UTF-8//TRANSLIT', $string); // or even
$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string); // not sure how this behaves

Personally I prefer the first option; see the iconv() manual page. Other alternatives include:

  • mb_convert_encoding()
  • utf8_encode(utf8_decode($string))
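Since the question mentions data coming from a cp1252 (latin1) MySQL database, a minimal sketch of an explicit cp1252-to-UTF-8 conversion might look like the following. The helper name to_utf8() is hypothetical, the default source encoding of Windows-1252 is an assumption about the input, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not part of any driver): returns the value unchanged
// if it is already valid UTF-8, otherwise converts it from $sourceEncoding.
// ASSUMPTION: any non-UTF-8 input is Windows-1252 (cp1252).
function to_utf8(string $value, string $sourceEncoding = 'Windows-1252'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, avoid double encoding
    }
    return mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
}

// "Péter" as stored in a cp1252 source: é is the single byte 0xE9.
$latin1 = "P\xE9ter";
$utf8   = to_utf8($latin1);
```

The validity check is only a heuristic: a cp1252 byte sequence can, in rare cases, also happen to be valid UTF-8, so when the source encoding is known for certain it may be safer to convert unconditionally.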

You should always make sure your strings are UTF-8 encoded, even the user-submitted ones. However, since you mentioned that you're migrating from MySQL to MongoDB, have you tried exporting your current database to CSV and using the import scripts that come with Mongo? They should handle this...
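As for automating the conversion asked about in the question, one possible approach is to normalize a whole document array just before handing it to the driver. This is only a sketch: utf8_document() is a hypothetical name, the Windows-1252 default is an assumption about the source data, and mbstring is assumed to be available.

```php
<?php
// Hypothetical helper: walks a (possibly nested) document array and converts
// every string value that is not valid UTF-8 from $sourceEncoding to UTF-8.
// ASSUMPTION: non-UTF-8 strings are Windows-1252 (cp1252).
// Note: only values are converted; array keys are left as they are.
function utf8_document(array $document, string $sourceEncoding = 'Windows-1252'): array
{
    array_walk_recursive($document, function (&$value) use ($sourceEncoding) {
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $value = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
        }
    });
    return $document;
}

// Example document with cp1252 bytes in a top-level and a nested value.
$doc = utf8_document(array(
    'title' => "P\xE9ter",
    'tags'  => array("caf\xE9", 'plain'),
    'count' => 5,
));
```

The result could then be passed to the insert call from the question, for example $db->collection->insert($doc). Because the strings are stored as real UTF-8, no utf8_decode() is needed when reading them back, as long as the page itself is served as UTF-8.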

I mentioned that BSON can only handle UTF-8, but I'm not sure if this is exactly true. I have a vague idea that BSON uses UTF-16 or UTF-32 to encode and decode data, but I can't check right now.