PHP output shows small black diamonds with a question mark

2021-12-27 00:00:00 encoding character-encoding php

I'm writing a PHP program that pulls from a database source. Some of the varchar values contain quotes that display as black diamonds with a question mark in them (�, REPLACEMENT CHARACTER). I assume the text came from Microsoft Word.

How can I use PHP to strip these characters out?

Recommended answer

If you see that character (� U+FFFD "REPLACEMENT CHARACTER"), it usually means that the text is stored in some single-byte encoding but is being interpreted as one of the Unicode encodings (UTF-8 or UTF-16).
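A minimal sketch of why this happens, assuming the mbstring and iconv extensions are available and assuming the stored text is Windows-1252 (the encoding in which Word's curly quotes are the single bytes 0x93 and 0x94). The sample string is made up for illustration:

```php
<?php
// Assumption: the database column holds Windows-1252 bytes.
// "\x93" and "\x94" are the left and right curly quotes in that encoding.
$raw = "\x93quoted\x94";

// These bytes are not a valid UTF-8 sequence, so a page declared as UTF-8
// shows each stray byte as U+FFFD.
$isValidUtf8 = mb_check_encoding($raw, 'UTF-8');

// Converting from the assumed source encoding produces valid UTF-8.
$fixed = iconv('CP1252', 'UTF-8', $raw);
```

If `$isValidUtf8` is false for values coming out of the database, the data is probably not UTF-8 and the page's declared charset is the mismatch.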

If it were the other way around, it would (usually) look something like this: Ã¤ (the two UTF-8 bytes of ä, each shown as a separate Latin-1 character).
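The reverse case can be reproduced in a few lines. This is only an illustration, assuming PHP 7 or later with the mbstring extension; it treats each byte of a UTF-8 string as a Latin-1 character, which is what a viewer does when it misreads UTF-8 text as ISO-8859-1:

```php
<?php
// "ä" encoded as UTF-8 is the two bytes C3 A4.
$utf8 = "\u{00E4}";
$hex  = bin2hex($utf8);

// Read those bytes as ISO-8859-1 and re-encode the result as UTF-8.
// Each byte becomes its own character: C3 is "Ã" and A4 is "¤".
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
```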

Probably the original encoding is ISO-8859-1, also known as Latin-1. You can check this without having to change your script: Browsers give you the option to re-interpret a page in a different encoding -- in Firefox use "View" -> "Character Encoding".

To make the browser use the correct encoding, add an HTTP header like this:

header("Content-Type: text/html; charset=ISO-8859-1");

or put the encoding in a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

Alternatively you could try to read from the database in another encoding (UTF-8, preferably) or convert the text with iconv().
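A sketch of the iconv() route, under the assumption that the source data is ISO-8859-1 and that the mbstring extension is available. The helper name `to_utf8` is hypothetical, not part of PHP:

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from $from
// only when the bytes are not already valid UTF-8.
function to_utf8(string $text, string $from = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    $converted = iconv($from, 'UTF-8', $text);
    // iconv() returns false on failure; fall back to the original bytes.
    return $converted === false ? $text : $converted;
}

// "caf\xE9" is "café" in ISO-8859-1 (E9 is é).
$name = to_utf8("caf\xE9");
```

After converting, the page would need to declare charset=UTF-8 rather than ISO-8859-1. If the quotes come from Word, passing 'CP1252' as the second argument may be the better guess, since ISO-8859-1 has no curly quotes.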
