Python & MySql: Unicode and Encoding
I am parsing JSON data and trying to store some of it in a MySQL database. I am currently getting the Unicode error shown below. How should I handle this?
- Should I handle it on the database side, and if so, how should I modify my table?
- Should I handle it on the Python side?
Here is my table structure:
CREATE TABLE yahoo_questions (
question_id varchar(40) NOT NULL,
question_subj varbinary(255),
question_content varbinary(255),
question_userId varchar(40) NOT NULL,
question_timestamp varchar(40),
category_id varbinary(20) NOT NULL,
category_name varchar(40) NOT NULL,
choosen_answer varbinary(255),
choosen_userId varchar(40),
choosen_usernick varchar(40),
choosen_ans_timestamp varchar(40),
UNIQUE (question_id)
);
Error while inserting via Python code:
Traceback (most recent call last):
File "YahooQueryData.py", line 78, in <module>
+"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", (row[2], row[5], row[6], quserId, questionTime, categoryId, categoryName, qChosenAnswer, choosenUserId, choosenNickName, choosenTimeStamp))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/cursors.py", line 159, in execute
query = query % db.literal(args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/connections.py", line 264, in literal
return self.escape(o, self.encoders)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/connections.py", line 202, in unicode_literal
return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 204-230: ordinal not in range(256)
Python code segment:
#pushing user id to the url to get full json stack
urlobject = urllib.urlopen(base_url.format(row[2]))
qnadatajson = urlobject.read()
data = json.loads(qnadatajson)
cur.execute("INSERT INTO yahoo_questions (question_id, question_subj, question_content, question_userId, question_timestamp,"
+"category_id, category_name, choosen_answer, choosen_userId, choosen_usernick, choosen_ans_timestamp)"
+"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", (row[2], row[5], row[6], quserId, questionTime, categoryId, categoryName, qChosenAnswer, choosenUserId, choosenNickName, choosenTimeStamp))
JSON structure:
questions: [
{
Id: "20111201185322AA5HTDc",
Subject: "what are the new pokemon call?",
Content: "I used to know them I stop at dialga and palkia version and I heard there's new ones what's it call
",
Date: "2011-12-01 18:53:22",
Timestamp: "1322794402",
Before running the query, I also executed the following on MySQL: SET character_set_client = utf8
This is what the MySQL variables look like:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.5.10-osx10.6-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)
Recommended answer
I think that your MySQLdb Python library doesn't know it's supposed to encode to utf8, and is instead encoding to latin1, which appears to be the charset it picked up by default for the connection (the traceback fails inside u.encode(unicode_literal.charset) with the 'latin-1' codec).
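To make the failure concrete, here is a minimal sketch of what happens when text containing characters outside latin1 is encoded. The sample string and the helper name can_encode are made up for illustration; they are not taken from the question's data.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text: "é" fits in latin-1, the katakana characters do not.
sample = u"caf\u00e9 \u30dd\u30b1\u30e2\u30f3"


def can_encode(text, charset):
    """Return True if `text` can be encoded with `charset`, else False."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        # Same exception type as in the traceback above.
        return False


latin1_ok = can_encode(sample, "latin-1")
utf8_ok = can_encode(sample, "utf-8")
print(latin1_ok, utf8_ok)
```

Here latin1_ok is False and utf8_ok is True, which mirrors the "ordinal not in range(256)" error: latin1 can only represent code points below 256.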
When you connect() to your database, pass the charset='utf8' parameter. This should also make a manual SET NAMES or SET character_set_client unnecessary.
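A minimal sketch of such a connection, assuming placeholder credentials and a database name that do not come from the question. The helpers connection_kwargs and open_connection are hypothetical names; charset and use_unicode are keyword arguments accepted by MySQLdb.connect().

```python
def connection_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect() with a utf8 connection charset."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",     # encode outgoing unicode strings as utf8
        "use_unicode": True,   # decode text columns to unicode on the way back
    }


def open_connection(**kwargs):
    """Open the connection; MySQLdb is imported lazily so the helper above works without the driver."""
    import MySQLdb
    return MySQLdb.connect(**kwargs)


# Placeholder values, assumed for illustration only.
kwargs = connection_kwargs("localhost", "myuser", "mypassword", "mydb")
# conn = open_connection(**kwargs)
# cur = conn.cursor()
```

With the charset set at connect time, the same cur.execute(...) call from the question should send the parameters as utf8 rather than latin1. Note that the table columns still need to be able to hold the resulting bytes.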