Getting non-UTF-8 form fields as UTF-8 in PHP?

2022-01-06 00:00:00 utf-8 php html webforms

I have a form served in a non-UTF-8 encoding (it's actually Windows-1251). People, of course, post any characters they like there. The browser helpfully converts characters that cannot be represented in Windows-1251 into HTML entities, so I can still recognise them. For example, if a user types →, I receive &#8594;. That part is great: if I just echo it back, the browser will correctly display → no matter what.

The problem is, I actually run htmlspecialchars() on the text before displaying it (it's a PHP function that converts special characters to HTML entities, e.g. & becomes &amp;). My users sometimes type things like &mdash; or &copy;, and I want to display them as the literal text &mdash; or &copy;, not as — and ©.

There's no way for me to distinguish → from &#8594;, because I get both as &#8594;. And since I run htmlspecialchars() on the text, when the browser sends &#8594; for a →, I echo back &amp;#8594;, which the browser displays as &#8594;. So the user's input gets corrupted.
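A minimal PHP illustration of that collision, assuming the page is served as Windows-1251 so the browser sends a typed → as the plain text &#8594;. The variable names are hypothetical; the point is only that both submissions are byte-for-byte identical, and that escaping then shows the entity text instead of the arrow.

```php
<?php
// Assumption: the form page is Windows-1251, so the browser replaces an
// unrepresentable "→" with the plain-text numeric reference "&#8594;".
$sentForArrow   = '&#8594;'; // what arrives when the user typed →
$sentForLiteral = '&#8594;'; // what arrives when the user typed the text &#8594;

// Both submissions are identical strings, so the server cannot tell them apart.
var_dump($sentForArrow === $sentForLiteral); // bool(true)

// Escaping (which is required for safety) turns the arrow into visible entity text.
echo htmlspecialchars($sentForArrow, ENT_QUOTES, 'Windows-1251'), "\n"; // &amp;#8594;
```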

Is there a way to say: "Okay, I serve this form in Windows-1251, but will you please just send me the input in UTF-8 and let me deal with it myself"?

Oh, I know that the good idea is to switch the whole software to UTF-8, but that is just too much work, and I would be happy with a quick fix. In case it matters, the form's enctype is "multipart/form-data" (it includes a file uploader, so I cannot use any other enctype). I use Apache and PHP.

Thanks!

Recommended answer

The browser helpfully converts the unpresentable-in-Windows-1251 characters to html entities

Well, nearly, except that it's not at all helpful. Now you can't tell the difference between a real "&#1041;" that someone typed, expecting it to come out as a string of text with a '&' in it, and a 'Б' character.

I actually do a htmlspecialchars() on the text before displaying it

Yes. You must do that, or else you've got a security problem.

Okay, I serve this form in Windows-1251, but will you please just send me the input in UTF-8 and let me deal with it myself

Yeah, supposedly you add accept-charset="UTF-8" to the form tag. But in reality that doesn't work in IE. To get a form submitted in UTF-8, you must send the form (page) in UTF-8.
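A minimal sketch of that advice, assuming PHP renders the form page: declare the page itself as UTF-8 (HTTP header plus meta tag) and keep accept-charset only as a hint for browsers that honour it. The action URL upload.php, the field names and the function render_upload_form() are hypothetical placeholders; the multipart/form-data enctype from the question is kept.

```php
<?php
// Hypothetical helper: returns the upload form page, declared as UTF-8.
// accept-charset is only a hint; the page encoding is what reliably decides
// which encoding the browser uses when it submits the form.
function render_upload_form(): string
{
    return <<<HTML
<!DOCTYPE html>
<html>
<head><meta charset="UTF-8"><title>Upload</title></head>
<body>
<form action="upload.php" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
  <textarea name="comment"></textarea>
  <input type="file" name="attachment">
  <input type="submit" value="Send">
</form>
</body>
</html>
HTML;
}

// Send the charset in the HTTP header too; browsers give it priority over the meta tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_upload_form();
```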

I know that the good idea is to switch the whole software to UTF-8,

Yup. Well, at least the encoding of the page containing the form should be UTF-8.
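If only the form page moves to UTF-8 while the rest of the site still renders Windows-1251, one possible bridge (an assumption, not something the answer prescribes) is to escape the UTF-8 input and then turn every non-ASCII character into a numeric character reference. The result is plain ASCII, so it displays correctly on a Windows-1251 page. The helper name utf8_to_ascii_html() is hypothetical, and it needs the mbstring extension.

```php
<?php
// Hypothetical helper: make UTF-8 user input safe to echo into a Windows-1251 page.
function utf8_to_ascii_html(string $utf8): string
{
    // Escape &, <, >, " and ' first, so a literal "&#8594;" typed by the user stays literal.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
    // convmap [start, end, offset, mask]: map U+0080..U+10FFFF to decimal references.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

$arrow   = "\u{2192}"; // the user typed →
$literal = '&#8594;';  // the user typed the text &#8594;
echo utf8_to_ascii_html($arrow), "\n";   // &#8594;      (browser shows →)
echo utf8_to_ascii_html($literal), "\n"; // &amp;#8594;  (browser shows &#8594;)
```

A real → and a typed &#8594; now produce different output, so the two are distinguishable again. The trade-off is size: Cyrillic text is expanded into references as well. If the data has to be stored as Windows-1251 instead, mb_convert_encoding() can convert it, but characters outside Windows-1251 are then replaced by a substitute character (by default '?') rather than preserved.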

相关文章