How to keep json_encode() from dropping strings that contain invalid characters

2021-12-28 00:00:00 json utf-8 php

Is there a way to keep json_encode() from returning null for a string that contains an invalid (non-UTF-8) character?

It can be a pain in the ass to debug in a complex system. It would be much more fitting to actually see the invalid character, or at least have it omitted. As it stands, json_encode() will silently drop the entire string.

Example (UTF-8):

$cities = array(
    utf8_decode("Düsseldorf"), // Deliberately produce a broken string
    "Washington",
    "Nairobi"
);

print_r(json_encode($cities));

Result:

[null,"Washington","Nairobi"]

Desired result:

["D�sseldorf","Washington","Nairobi"]

Note: I am not looking to make broken strings work in json_encode(). I am looking for ways to make it easier to diagnose encoding errors. A null string isn't helpful for that.

Recommended answer

PHP does try to spew an error, but only if you turn display_errors off. This is odd, because the display_errors setting is only meant to control whether errors are printed to standard output, not whether they are triggered. I want to emphasize this: with display_errors on, even though you may see all kinds of other PHP errors, PHP doesn't just hide this one; it never triggers it. It will not show up in any error log, and no custom error handler gets called. The error simply never occurs.

Here's some code that demonstrates this:

error_reporting(-1); // report all errors
$invalid_utf8_char = chr(193); // byte 0xC1 can never appear in valid UTF-8

ini_set('display_errors', 1); // display errors on standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last()); // nothing

ini_set('display_errors', 0); // do not display errors on standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last()); // json_encode(): Invalid UTF-8 sequence in argument

That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.
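Whatever display_errors is set to, you can still check the outcome explicitly instead of waiting for a warning. A minimal sketch, assuming PHP 5.5+ (where json_encode() returns false on failure and json_last_error_msg() exists); json_encode_or_explain() is a hypothetical helper name, not a built-in:

```php
<?php
// Sketch, assuming PHP 5.5+: json_encode() returns false on failure and
// json_last_error_msg() is available. Neither depends on display_errors.

// Hypothetical helper: returns the JSON string, or an explanation of why
// encoding failed.
function json_encode_or_explain($value)
{
    $json = json_encode($value);
    if ($json === false) {
        // The error state is recorded even when no warning was emitted.
        return 'json_encode failed: ' . json_last_error_msg()
            . ' (code ' . json_last_error() . ')';
    }
    return $json;
}

// "\xFC" is the ISO-8859-1 byte for "ü", which is not valid UTF-8 on its own.
$cities = array("D\xFCsseldorf", "Washington", "Nairobi");
echo json_encode_or_explain($cities), "\n";
```

For invalid input the reported code is JSON_ERROR_UTF8, which at least tells you the problem is encoding rather than, say, recursion depth.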

Workaround:

Cleaning the string before passing it to json_encode may be a workable solution.

$orig_string = "D\xFCsseldorf"; // example input: a lone Latin-1 "ü" byte is invalid UTF-8
$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
    // One or more chars were invalid, so they were stripped out.
    // If you need to know where in the string the first stripped character was,
    // see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);

http://php.net/manual/en/function.iconv.php

The manual says:

//IGNORE silently discards characters that are illegal in the target charset.

So by removing the problematic characters first, in theory json_encode() shouldn't get anything it will choke on. I haven't verified that the output of iconv with the //IGNORE flag is fully compatible with json_encode's notion of valid UTF-8, so buyer beware: there may be edge cases where it still fails. Ugh, I hate character set issues.
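Stripping with iconv hides which value was broken, and some iconv builds (glibc's in particular) have been reported to return false for //IGNORE input instead of a stripped string. For diagnosis it can help to locate the invalid strings before encoding. The sketch below is an alternative technique, not the answer's method: it relies on preg_match() with the u modifier returning false when the subject is not valid UTF-8. find_invalid_utf8() is a hypothetical helper name.

```php
<?php
// Sketch: list every string inside a (possibly nested) array that is not valid
// UTF-8, keyed by its path, so you know exactly which value json_encode()
// would choke on. find_invalid_utf8() is hypothetical, not a PHP built-in.
function find_invalid_utf8($value, $path = '$')
{
    $bad = array();
    if (is_string($value)) {
        // With the /u modifier, preg_match() returns false for invalid UTF-8.
        if (preg_match('//u', $value) !== 1) {
            // bin2hex() exposes the raw bytes, so the offending byte is visible.
            $bad[$path] = bin2hex($value);
        }
    } elseif (is_array($value)) {
        foreach ($value as $key => $item) {
            $childPath = $path . '[' . var_export($key, true) . ']';
            $bad += find_invalid_utf8($item, $childPath);
        }
    }
    return $bad;
}

$cities = array("D\xFCsseldorf", "Washington", "Nairobi");
print_r(find_invalid_utf8($cities));
```

For the example array it reports a single entry, path $[0] with bytes 44fc7373656c646f7266, where fc is the Latin-1 "ü" that broke the encoding.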

Edit
In PHP 7.2+, json_encode has two new flags: JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE.
There's not much documentation yet, but for now this test should help you understand the expected behavior: https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt
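A minimal sketch of these flags applied to the question's example, assuming PHP 7.2+. JSON_INVALID_UTF8_SUBSTITUTE gives the desired result from the question, because each invalid byte becomes U+FFFD and so stays visible, whereas JSON_INVALID_UTF8_IGNORE drops it silently:

```php
<?php
// Sketch, assuming PHP 7.2+. "\xFC" is a Latin-1 "ü", i.e. invalid UTF-8.
$cities = array("D\xFCsseldorf", "Washington", "Nairobi");

// Replacement character as an escape:
// prints ["D\ufffdsseldorf","Washington","Nairobi"]
echo json_encode($cities, JSON_INVALID_UTF8_SUBSTITUTE), "\n";

// Raw replacement character when combined with JSON_UNESCAPED_UNICODE:
// prints ["D�sseldorf","Washington","Nairobi"]
echo json_encode($cities, JSON_INVALID_UTF8_SUBSTITUTE | JSON_UNESCAPED_UNICODE), "\n";

// Invalid byte removed without a trace:
// prints ["Dsseldorf","Washington","Nairobi"]
echo json_encode($cities, JSON_INVALID_UTF8_IGNORE), "\n";
```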

And in PHP 7.3+ there's a new flag, JSON_THROW_ON_ERROR, which makes json_encode() throw a JsonException instead of failing silently. See http://php.net/manual/en/class.jsonexception.php
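A minimal sketch of JSON_THROW_ON_ERROR, assuming PHP 7.3+. The exception carries the same error code json_last_error() would report, so the failure cannot slip through unnoticed:

```php
<?php
// Sketch, assuming PHP 7.3+: with JSON_THROW_ON_ERROR, json_encode() throws a
// JsonException instead of returning false.
$cities = array("D\xFCsseldorf", "Washington", "Nairobi"); // "\xFC" is invalid UTF-8

$json = null;
$errorCode = null;
try {
    $json = json_encode($cities, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    // getCode() holds the matching JSON_ERROR_* constant (JSON_ERROR_UTF8 here).
    $errorCode = $e->getCode();
    error_log('json_encode failed: ' . $e->getMessage() . ' (code ' . $errorCode . ')');
}
```

If you also pass JSON_INVALID_UTF8_SUBSTITUTE, the invalid bytes are replaced rather than treated as an error, so this particular exception would not be thrown.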
