Fetching UTF-8 text from MySQL in R returns "????"

2021-11-20 00:00:00 r utf-8 character-encoding odbc mysql

I'm stuck trying to fetch UTF-8 text in a MySQL database from R. I'm running R on OS X (tried both via the GUI and command line), where the default locale is en_US.UTF-8, and no matter what I try, the query result shows "?" for all non-ASCII characters.

I've tried setting options(encoding='UTF-8'), DBMSencoding='UTF-8' when connecting via ODBC, setting Encoding(res$str) <- 'UTF-8' after fetching the results, as well as 'utf8' variants of each of those, all to no avail. Running the query from the command line mysql client shows the results correctly.

I'm totally stumped. Any ideas why it's not working, or other things I should try?

Here's a fairly minimal test case:

$ mysql -u root
mysql> CREATE DATABASE rtest;
mysql> USE rtest;
mysql> CREATE TABLE test (str VARCHAR(10)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO test (str) VALUES ('こんにちは');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test;
+-----------------+
| str             |
+-----------------+
| こんにちは      |
+-----------------+
1 row in set (0.00 sec)

Querying the table in R using both RODBC and RMySQL shows "?????" for the str column:

> con <- odbcDriverConnect('DRIVER=mysql;user=root', DBMSencoding='UTF-8')
> sqlQuery(con, 'SELECT * FROM rtest.test')
    str
1 ?????
> library(RMySQL)
Loading required package: DBI
> con <- dbConnect(MySQL(), user='root')
> dbGetQuery(con, 'SELECT * FROM rtest.test')
    str
1 ?????

For completeness, here's my sessionInfo:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RMySQL_0.9-3 DBI_0.2-5    RODBC_1.3-6 

Recommended answer

Thanks to @chooban I found out the connection session was using latin1 instead of utf8. Here are two solutions I found:

  • For RMySQL, after connecting run the query SET NAMES utf8 to change the connection character set.
  • For RODBC, connect using CharSet=utf8 in the DSN string. I was not able to run SET NAMES via ODBC.

This question pointed me in the right direction.
