UnicodeEncodeError: 'latin-1' codec can't encode character
What could be causing this error when I try to insert a foreign character into the database?
>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
And how do I resolve it?
Thanks!
Solution
Character U+201C (Left Double Quotation Mark) is not present in the Latin-1 (ISO-8859-1) encoding.
It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but which puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it's an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:
>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
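The same distinction shows up when decoding. As a minimal sketch, assuming Python 3 (the session above is Python 2) and a made-up byte string `raw`, the bytes 0x93 and 0x94 decode to invisible C1 control characters under ISO-8859-1 but to curly quotes under cp1252:

```python
# Python 3 sketch: the same bytes mean different things in the two encodings.
raw = b'\x93Hello\x94'  # hypothetical bytes, e.g. read back from a byte-store column

as_latin1 = raw.decode('iso-8859-1')  # 0x93/0x94 map to U+0093/U+0094 (C1 controls)
as_cp1252 = raw.decode('cp1252')      # 0x93/0x94 map to U+201C/U+201D (curly quotes)

# ascii() escapes every non-ASCII character so the difference is visible
print(ascii(as_latin1))
print(ascii(as_cp1252))
```

This is why data that round-trips "fine" through a Latin-1 column can still display wrongly if one side of the pipeline assumes cp1252.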
If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. Any Unicode character that is not present in cp1252 will still cause an error.
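To see where cp1252 stops helping, a small check like the following may be useful. It is only a sketch, assuming Python 3; the helper name `encodable` and the sample strings are made up for illustration:

```python
def encodable(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


quote = '\u201cHello\u201d'  # curly quotes: in cp1252, not in ISO-8859-1
snowman = '\u2603'           # SNOWMAN: in neither cp1252 nor ISO-8859-1

for sample in (quote, snowman):
    for enc in ('iso-8859-1', 'cp1252', 'utf-8'):
        print(ascii(sample), enc, encodable(sample, enc))
```

Only UTF-8 accepts both samples, which is the motivation for the recommendation that follows.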
You can use encode(..., 'ignore') to suppress the errors by dropping the offending characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. Ideally you should also tell MySQL that you are using UTF-8 strings (by setting the character set on the database connection and the collation on string columns), so it can get case-insensitive comparison and sorting right.
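The trade-off between the two approaches can be sketched as follows, assuming Python 3 and a made-up sample string `text`. The 'ignore' handler silently loses data, while UTF-8 round-trips the original string:

```python
text = 'He said \u201cHello\u201d'

# 'ignore' drops every character the target encoding cannot represent
lossy = text.encode('iso-8859-1', 'ignore')

# UTF-8 can represent every Unicode character, so nothing is lost
utf8 = text.encode('utf-8')

print(lossy)                         # the curly quotes are gone
print(utf8.decode('utf-8') == text)  # the round trip preserves the text
```

Note that the lossy version raises no error at all, so the missing quotes would only be noticed later when the data is read back.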