How do I make MySQL return UTF-8?

2021-12-28 00:00:00 xml utf-8 character-encoding php

I'm using PHPUnit to validate the XML output of my PHP code, but apparently I have a problem with the character encoding of the data MySQL returns. Here is the error I get from DOMDocument:

Input is not proper UTF-8, indicate encoding!
Bytes: 0xE9 0x20 0x42 0x65

I initialize the DOMDocument so that it uses the correct encoding:

$domDocument = new DOMDocument('1.0','UTF-8');

When I check the output of saveXML() with mb_detect_encoding, the result is UTF-8.

I also checked all the calls used to create the XML, running mb_detect_encoding on every createCDATASection parameter encountered, and they are all reported as either UTF-8 or ASCII (there are no plain text nodes; everything is in CDATA blocks).

I think the issue comes from the use of an 'é' character (which is 0xE9 in ISO 8859-1). The line that adds that character to my XML is:

$domDocument->createCDATASection($place->name);

and mb_detect_encoding($place->name) gives me UTF-8.

The data ($place->name) is pulled from a MySQL database. This database uses the UTF-8 charset.

Here is some sample code:

$query = sprintf('SELECT name FROM place where id = 1');
$result = mysql_query($query);
$result = mysql_fetch_assoc($result);


// -- Feeding UTF-8 data directly WORKS
$domDocument = new DOMDocument('1.0','UTF-8');
$rootNode = $domDocument->createElement('Response');
$rootNode->appendChild($domDocument->createCDATASection('Café Belga'));
$domDocument->appendChild($rootNode);

$matcher = array('tag' => 'Response');
self::assertTag($matcher, $domDocument->saveXML(), '', FALSE);

// -- Feeding UTF-8 data from the resultset FAILS
$domDocument = new DOMDocument('1.0','UTF-8');
$rootNode = $domDocument->createElement('Response');
$rootNode->appendChild($domDocument->createCDATASection($result['name']));
$domDocument->appendChild($rootNode);

$matcher = array('tag' => 'Response');
self::assertTag($matcher, $domDocument->saveXML(), '', FALSE);

In my PHPStorm debugger, the string fetched from the database looks like this:

Café Belga

So I think that is the root of the problem. In MySQL Workbench the string is correct: Café Belga.

When I use utf8_encode($result['name']), however, everything works fine!

One more check in the watches window:

mb_detect_encoding($result['name']) -> "UTF-8"

mb_detect_encoding(utf8_encode($result['name'])) -> "UTF-8"
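
A note on these checks: mb_detect_encoding only guesses from a list of candidate encodings, so a "UTF-8" result does not prove that the bytes are valid UTF-8. A stricter check is mb_check_encoding, optionally combined with bin2hex to look at the raw bytes. The sketch below assumes the mbstring extension is loaded, and the two sample strings are stand-ins for the database value, built from the bytes in the libxml error message:

```php
<?php
// Stand-in for the value coming back from MySQL: "Café Belga" with
// 'é' as the single ISO 8859-1 byte 0xE9 (the byte libxml complains about).
$latin1 = "Caf\xE9 Belga";

// The same text with 'é' encoded as the two UTF-8 bytes 0xC3 0xA9.
$utf8 = "Caf\xC3\xA9 Belga";

// mb_check_encoding validates the byte sequence instead of guessing.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// bin2hex shows exactly which bytes are in each string.
echo bin2hex($latin1), "\n"; // 436166e92042656c6761
echo bin2hex($utf8), "\n";   // 436166c3a92042656c6761
```

This would also explain why utf8_encode appears to fix the problem: it converts ISO 8859-1 input to UTF-8, so it only helps when the input really is ISO 8859-1.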

On a side note, are there any sites where I can simply copy-paste those hex values and see which characters they represent in different character sets?
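
Not a site, but the same lookup can be done locally in PHP. The following is a minimal sketch, assuming the mbstring extension is available; decodeHex is a hypothetical helper name, not a built-in function:

```php
<?php
// Hypothetical helper: interpret a hex string as bytes in $charset and
// return the resulting text as UTF-8 so it can be printed.
function decodeHex(string $hex, string $charset): string
{
    $bytes = hex2bin($hex);
    return mb_convert_encoding($bytes, 'UTF-8', $charset);
}

// 0xE9 read as ISO 8859-1 is 'é'.
echo decodeHex('e9', 'ISO-8859-1'), "\n";

// 0xC3 0xA9 read as UTF-8 is also 'é'.
echo decodeHex('c3a9', 'UTF-8'), "\n";

// The same two bytes misread as ISO 8859-1 give the mojibake 'Ã©'.
echo decodeHex('c3a9', 'ISO-8859-1'), "\n";
```

Calling bin2hex on the return value shows the UTF-8 bytes of the decoded character, which makes it easy to compare candidate charsets side by side.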

Recommended answer

You have to define the connection to your database as UTF-8:

// Set up your connection
$connection = mysql_connect('localhost', 'user', 'pw');
mysql_select_db('yourdb', $connection);
mysql_query("SET NAMES 'utf8'", $connection);

// Now you get UTF-8 encoded stuff
$query = sprintf('SELECT name FROM place where id = 1');
$result = mysql_query($query, $connection);
$result = mysql_fetch_assoc($result);
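
The mysql_* functions used above were removed in PHP 7.0. On newer PHP versions the same idea, fixing the character set of the connection, can be expressed with PDO by putting a charset into the DSN. The sketch below is an assumption-laden illustration rather than part of the original answer: buildDsn and fetchPlaceName are hypothetical helpers, the host, database name and credentials are the placeholders from the snippet above, and utf8mb4 is assumed to be supported by the server (MySQL's utf8 is limited to three bytes per character).

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that pins the connection charset.
function buildDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: fetch one place name, or null if the row is missing.
// It is only defined here, not called, because it needs a running server.
function fetchPlaceName(PDO $pdo, int $id): ?string
{
    $stmt = $pdo->prepare('SELECT name FROM place WHERE id = ?');
    $stmt->execute([$id]);
    $name = $stmt->fetchColumn();
    return $name === false ? null : $name;
}

$dsn = buildDsn('localhost', 'yourdb');
echo $dsn, "\n"; // mysql:host=localhost;dbname=yourdb;charset=utf8mb4

// With a live server, the connection would be opened like this:
// $pdo = new PDO($dsn, 'user', 'pw', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// echo fetchPlaceName($pdo, 1), "\n";
```

With mysqli, the equivalent step is calling set_charset('utf8mb4') on the connection object; with the legacy extension, mysql_set_charset('utf8', $connection) is generally preferred over issuing SET NAMES by hand, because it also informs the client library of the charset.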
