json_encode(): Invalid UTF-8 sequence in argument

2022-01-07 00:00:00 json character-encoding php

I'm calling json_encode() on data that comes from a MySQL database with the utf8_general_ci collation. The problem is that some rows contain odd data that I can't clean up. A single stray symbol is enough: once it reaches json_encode(), the call fails with json_encode(): Invalid UTF-8 sequence in argument.

I've tried utf8_encode() and utf8_decode(), and even checked with mb_check_encoding(), but the bad data keeps getting through and causing havoc.

I'm running PHP 5.3.10 on a Mac. So the question is: how can I clean up invalid UTF-8 symbols while keeping the rest of the data, so that json_encode() works?

Update: here is a way to reproduce it:

echo json_encode(pack("H*", 'c32e'));
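To see why this reproduction fails, it may help to look at the bytes. The sketch below checks the same two-byte string with mb_check_encoding() and then scrubs it by re-encoding UTF-8 to UTF-8 with mb_convert_encoding(). The helper name scrub_utf8() is hypothetical and not part of the original post, and the mbstring extension is assumed to be loaded. This only treats the symptom by substituting the invalid bytes; it does not recover the original character.

```php
<?php
// 0xC3 opens a two-byte UTF-8 sequence and must be followed by a
// continuation byte (0x80-0xBF). 0x2E is an ASCII '.', so the string
// is not valid UTF-8.
$broken = pack('H*', 'c32e');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Hypothetical helper (an assumption, not from the original post):
// converting UTF-8 to UTF-8 makes mbstring replace invalid bytes with
// its substitute character instead of passing them through.
function scrub_utf8($value)
{
    return mb_convert_encoding($value, 'UTF-8', 'UTF-8');
}

$clean = scrub_utf8($broken);

var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
echo json_encode($clean), "\n";               // encodes without the UTF-8 error
```

On PHP 7.2 and later, json_encode() also accepts the JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE flags, which handle invalid bytes at encode time; they are not available on the PHP 5.3.10 setup described in the question.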

Recommended answer

It seems the symbol was Å. Because the data consists of surnames that shouldn't be public, only the first letter was shown, and that was done with just $lastname[0], which is wrong for multibyte strings and caused the whole hassle. Changing it to mb_substr($lastname, 0, 1) works like a charm.
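A minimal sketch of the difference, assuming a made-up surname and the mbstring extension: string offsets in PHP address bytes, so $lastname[0] returns only the first byte of Å (0xC3), while mb_substr() returns the whole character (0xC3 0x85). The encoding is passed explicitly here as a precaution, since the default internal encoding on older PHP versions may not be UTF-8. The broken result is consistent with the c32e bytes in the reproduction, which look like a truncated initial followed by a period.

```php
<?php
// Hypothetical surname for illustration; written with escapes so the
// example does not depend on how this file itself is encoded.
$lastname = "\xC3\x85berg"; // "Åberg" in UTF-8

$firstByte = $lastname[0];                        // byte offset: only 0xC3
$firstChar = mb_substr($lastname, 0, 1, 'UTF-8'); // character offset: "Å"

var_dump(bin2hex($firstByte)); // string(2) "c3"
var_dump(bin2hex($firstChar)); // string(4) "c385"

// The truncated byte plus a period is not valid UTF-8 ...
var_dump(mb_check_encoding($firstByte . '.', 'UTF-8')); // bool(false)

// ... while the full character encodes cleanly.
echo json_encode($firstChar . '.'), "\n"; // "\u00c5."
```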
