json_encode(): Invalid UTF-8 sequence in argument
I'm calling json_encode() on data that comes from a MySQL database with utf8_general_ci collation. The problem is that some rows contain weird data that I can't clean, for example the symbol �, so once it reaches json_encode(), it fails with json_encode(): Invalid UTF-8 sequence in argument.
I've tried utf8_encode() and utf8_decode(), even together with mb_check_encoding(), but the invalid data keeps getting through and causing havoc.
Running PHP 5.3.10 on Mac. So the question is: how can I clean up invalid UTF-8 symbols while keeping the rest of the data, so that json_encode() works?
Update: here is a way to reproduce it:
echo json_encode(pack("H*" ,'c32e'));
Recommended answer
It turned out the symbol was Å. Since the data consists of surnames that shouldn't be public, only the first letter was shown, and that was done with just $lastname[0], which returns a single byte rather than a whole character. That is wrong for multibyte strings and caused the whole problem. Changing it to mb_substr($lastname, 0, 1) fixed it.
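
To illustrate the difference, here is a minimal sketch. The surname is a made-up sample value (not from the real data), written with byte escapes so the example does not depend on the encoding of the source file. The 'UTF-8' argument to mb_substr() is an addition here: without it the function falls back to the configured internal encoding, which may not be UTF-8.

```php
<?php
// Hypothetical sample surname "Åberg"; in UTF-8, "Å" is the two bytes 0xC3 0x85.
$lastname = "\xC3\x85berg";

// String offsets index bytes, so this keeps only the lead byte 0xC3,
// which is an incomplete (invalid) UTF-8 sequence on its own.
$byteInitial = $lastname[0];

// mb_substr() counts characters in the given encoding, so it keeps both bytes.
$charInitial = mb_substr($lastname, 0, 1, 'UTF-8');

echo bin2hex($byteInitial), "\n";     // c3
echo bin2hex($charInitial), "\n";     // c385
echo json_encode($charInitial), "\n"; // "\u00c5"
```

The bytes in the reproduction from the question, c3 2e, are that same lead byte followed by a period, which is consistent with a truncated "Å" followed by "." (an inference, since the display code is not shown).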