Get non-UTF-8 form fields as UTF-8 in PHP?
I have a form served in a non-UTF-8 encoding (it's actually Windows-1251). People, of course, post any characters they like. The browser helpfully converts characters that cannot be represented in Windows-1251 to HTML entities, so I can still recognise them. For example, if a user types →, I receive &#8594;. That's partially great: if I just echo it back, the browser will correctly display → no matter what.
The problem is that I actually run htmlspecialchars() on the text before displaying it (it's a PHP function that converts special characters to HTML entities, e.g. & becomes &amp;). My users sometimes type things like &mdash; or &copy;, and I want to display them as the literal text &mdash; or &copy;, not as — and ©.
There's no way for me to distinguish → from &#8594;, because I receive both as &#8594;. And since I run htmlspecialchars() on the text, and the browser also sends me &#8594; for →, I echo back &amp;#8594;, which is displayed as &#8594; in the browser. So the user's input gets corrupted.
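The corruption described above can be reproduced with a minimal sketch. The variable names and the sample value are assumptions for illustration; the only real API used is PHP's htmlspecialchars().

```php
<?php
// Assumed wire values: a Windows-1251 form delivers the same bytes in both cases.
$userTypedArrowChar = '&#8594;'; // user typed the character →; the browser substituted a reference
$userTypedReference = '&#8594;'; // user literally typed the characters & # 8 5 9 4 ;

// The two submissions are indistinguishable on the server.
$indistinguishable = ($userTypedArrowChar === $userTypedReference);

// Escaping before output turns the leading & into &amp;.
$escaped = htmlspecialchars($userTypedArrowChar, ENT_QUOTES, 'Windows-1251');

echo $escaped, "\n"; // prints &amp;#8594; which a browser renders as the text &#8594;, not →
```

Skipping the escaping would fix the arrow but would also turn a typed &mdash; into a dash and open the page to injected markup, so the ambiguity has to be removed before the data reaches the server.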
Is there a way to say: "Okay, I serve this form in Windows-1251, but will you please just send me the input in UTF-8 and let me deal with it myself"?
Oh, I know the right thing to do is to switch the whole software to UTF-8, but that is just too much work, and I would be happy with a quick fix. In case it matters, the form's enctype is "multipart/form-data" (it includes a file uploader, so I cannot use any other enctype). I use Apache and PHP.
Thanks!
Recommended answer
The browser helpfully converts the unpresentable-in-Windows-1251 characters to html entities
Well, nearly, except that it's not at all helpful. Now you can't tell the difference between a real "&#411;" that someone typed, expecting it to come out as a string of text with a '&' in it, and a 'ƛ' character.
I actually do a htmlspecialchars() on the text before displaying it
Yes. You must do that, or else you've got a security problem.
Okay, I serve this form in Windows-1251, but will you please just send me the input in UTF-8 and let me deal with it myself
Yeah, supposedly you put accept-charset="UTF-8" on the form tag. But in reality that doesn't work in IE. To get a form submission in UTF-8, you must serve the form (the page) in UTF-8.
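One possible quick fix along these lines, sketched under assumptions: the templates and stored strings stay in Windows-1251, and only the rendered form page is re-encoded to UTF-8 on its way out. The function name cp1251_page_to_utf8 is hypothetical; mb_convert_encoding(), ob_start() and header() are standard PHP (mbstring required). Any charset declared in a meta tag inside the template would also have to be changed to UTF-8.

```php
<?php
/**
 * Hypothetical output-buffer callback: re-encode a page that was
 * generated in Windows-1251 as UTF-8 before it is sent to the browser.
 */
function cp1251_page_to_utf8(string $html): string
{
    return mb_convert_encoding($html, 'UTF-8', 'Windows-1251');
}

// Usage at the very top of the script that renders the form (before any output):
//   header('Content-Type: text/html; charset=UTF-8');
//   ob_start('cp1251_page_to_utf8');
```

With the page served this way, browsers submit the form fields as UTF-8, so unrepresentable characters are no longer replaced with references.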
I know that the good idea is to switch the whole software to UTF-8,
Yup. Well, at least the page containing the form should be encoded in UTF-8.
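If only the form page is UTF-8 and the rest of the software still expects Windows-1251, the submitted text has to be converted back. A minimal sketch, assuming mbstring and PHP 7.2+ (for mb_ord()); the helper name utf8_to_cp1251_html is hypothetical. It escapes first and only then introduces character references, so a typed "&#8594;" and a typed "→" end up different.

```php
<?php
/**
 * Hypothetical helper: escape UTF-8 input for HTML, then re-encode it as
 * Windows-1251. Characters with no Windows-1251 equivalent become decimal
 * numeric character references.
 */
function utf8_to_cp1251_html(string $utf8): string
{
    // Escape while "&" still means a literal ampersand typed by the user.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    $out = '';
    foreach (preg_split('//u', $escaped, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $cp1251 = mb_convert_encoding($ch, 'Windows-1251', 'UTF-8');
        // Round-trip check: an unrepresentable character does not survive it.
        if (mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251') === $ch) {
            $out .= $cp1251;
        } else {
            $out .= '&#' . mb_ord($ch, 'UTF-8') . ';';
        }
    }
    return $out;
}

// A real arrow becomes a reference; a typed reference stays visible as text.
echo utf8_to_cp1251_html('→ versus &#8594;'), "\n"; // prints &#8594; versus &amp;#8594;
```

The result is already HTML-escaped, so it must not be passed through htmlspecialchars() again on output.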