Using the source command corrupts non-Unicode text encoding

2021-09-25 00:00:00 windows mysql

I have to import some data containing national characters that are not encoded as Unicode into a standalone MySQL 5.0.18 server running on 64-bit Windows 7. After some initial problems I finally got it working in the MySQL console.

But as the data is more than 50 MB, typing it into the console or pasting it from the clipboard is not practical. So I created a script file, only to find that the national characters are gibberish after import.

The problem is that if I use the source command with any file, the encoding breaks. If I open the same file and copy its contents to the console via the clipboard, everything works as it should. Here is the smallest MCVE to test this:

DROP DATABASE IF EXISTS dbs;
CREATE DATABASE dbs;
USE dbs;

SET NAMES latin2;

DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(50) default '' );
INSERT INTO `tab` VALUES
 (1,'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ'),
 (2,'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ');
SELECT * FROM `tab`;

When I copy this to the MySQL console via the clipboard, the output looks like this:

+------+----------------------------------------+
| ix   | nam                                    |
+------+----------------------------------------+
|    1 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
|    2 | áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ |
+------+----------------------------------------+
2 rows in set (0.00 sec)

Which is what I want. But when I put all this into a test.sql file and run

source test.sql;

I get this output:

+------+----------------------------------------+
| ix   | nam                                    |
+------+----------------------------------------+
|    1 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
|    2 | ßńŔ´ÚÝňż˛ˇ˘Ó°ÜŁ˙ ř×┴ ╚¤╔═┼╝ĎË └ěŐŹ┌ ŢÄ |
+------+----------------------------------------+

Which is obviously wrong (it looks like some default MS-DOS charset). I think the problem is not on the side of the table or the database, as the same thing happens for plain text output like:

SET NAMES latin2;
SELECT 'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ' AS 'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ';

Output with the clipboard:

+----------------------------------------+
| aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+----------------------------------------+
| áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ |
+----------------------------------------+

Output with the source file:

+----------------------------------------+
| aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ |
+----------------------------------------+
| ßńŔ´ÚÝňż˛ˇ˘Ó°ÜŁ˙ ř×┴ ╚¤╔═┼╝ĎË └ěŐŹ┌ ŢÄ |
+----------------------------------------+

It is as if the encoding gets screwed up when importing from files, or gets changed when input goes into the MySQL console through the keyboard or clipboard.
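The "MS-DOS charset" guess can be checked outside MySQL. The sketch below is only a byte-level model, under two assumptions that the question does not state: that test.sql is saved as Windows-1250 (`cp1250`) and that the Windows console renders bytes with DOS code page 852 (`cp852`). Under those assumptions the garbled row is reproduced, which would also mean the file is not really ISO 8859-2 (`latin2`) to begin with.

```python
# The sample string from the question, as Unicode text.
sample = "áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ"

# Assumption: the script file on disk is encoded as Windows-1250.
file_bytes = sample.encode("cp1250")

# Assumption: the console interprets raw bytes as DOS code page 852.
shown = file_bytes.decode("cp852")
print(shown)

# Windows-1250 and ISO 8859-2 disagree on a few letters (for example š, ť, ž),
# so the same text gives different bytes in the two encodings.
same_as_latin2 = file_bytes == sample.encode("iso8859_2")
print(same_as_latin2)
```

If the printed string matches the gibberish above, the bytes in the file probably reach the server untouched and are simply in a different code page than the one the console uses for pasted text.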

So what is going on, and how can I correct this (without loss of data)?

  • using the < command line option instead of source does not help
  • using the -e command line option with source does not help
  • using the default charset command line option does not help
  • using UTF8 for non-Unicode strings leads to "Data too long" errors and loss of data
  • the data in the clipboard is the same as in the file
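A likely reason why declaring UTF8 for the non-Unicode file fails: single-byte accented letters are usually not valid UTF-8 sequences. The sketch below shows only the byte-level problem, assuming the file is Windows-1250; how a given MySQL version reports it (truncation, "Data too long", or a warning) depends on the version and sql_mode and is not modelled here.

```python
# Two accented letters as they would appear in a Windows-1250 file (assumed encoding).
raw = "áä".encode("cp1250")  # two single bytes, both above 0x7F

# Try to read those bytes as UTF-8, as a server told "SET NAMES utf8" would.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(valid_utf8)
```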

Well, I tried a newer version, MySQL 5.7.19, which took me quite a while to get started because the initialization and other things have changed (wtf? that insanity takes 1.8 GByte without any data!). It behaves the same no matter what I do. So I tried to use UTF8 encoding:

DROP DATABASE IF EXISTS dbs;
CREATE DATABASE dbs CHARACTER SET utf8 COLLATE 'utf8_unicode_ci';
USE dbs;

SET NAMES utf8;

DROP TABLE IF EXISTS `tab`;
CREATE TABLE `tab` (`ix` INT default 0,`nam` VARCHAR(50) default '' ) CHARACTER SET utf8 COLLATE 'utf8_unicode_ci';
INSERT INTO `tab` VALUES
 (1,'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ'),
 (2,'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ');
SELECT * FROM `tab`;

#SELECT 'áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ' AS 'aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ';

#SHOW COLLATION;
#SHOW CHARACTER SET;
SHOW VARIABLES LIKE 'char%';

And yes, the script file is converted to UTF8. Now here is the my.ini setting:

[mysql]

 default-character-set=utf8

[mysqld]

 skip-character-set-client-handshake
 character-set-server=utf8
 collation-server=utf8_unicode_ci

This finally worked for the file using source. Here is the result:

+------+--------------------------------------------------------------------------+
| ix   | nam                                                                      |
+------+--------------------------------------------------------------------------+
|    1 | áäčďéíĺľňóôŕřšťú ýžÁ ČĎÉÍĹĽŇÓ ŔŘŠŤÚ ÝŽ                                   |
|    2 | aacdeillnoorrstuuyzAACDEILLNOORRSTUUYZ                                   |
+------+--------------------------------------------------------------------------+
+--------------------------+--------------------------------------------------------------------+
| Variable_name            | Value                                                              |
+--------------------------+--------------------------------------------------------------------+
| character_set_client     | utf8                                                               |
| character_set_connection | utf8                                                               |
| character_set_database   | utf8                                                               |
| character_set_filesystem | binary                                                             |
| character_set_results    | utf8                                                               |
| character_set_server     | utf8                                                               |
| character_set_system     | utf8                                                               |
+--------------------------+--------------------------------------------------------------------+
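The question does not say how the script file was converted to UTF8. One possible way is sketched below; the function name `convert_file`, the file names, and the source encoding `cp1250` are all assumptions for illustration, so substitute whatever encoding the original file really uses. The output is written without a byte order mark.

```python
import os
import tempfile


def convert_file(src, dst, src_encoding="cp1250", dst_encoding="utf-8"):
    """Re-encode a text file line by line (hypothetical helper)."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, newline="") as fout:
        for line in fin:
            fout.write(line)


# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.sql")        # hypothetical input file
    dst = os.path.join(tmp, "test_utf8.sql")   # hypothetical output file
    with open(src, "wb") as f:
        f.write("SELECT 'čšž';\n".encode("cp1250"))
    convert_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted == "SELECT 'čšž';\n".encode("utf-8"))
```

Processing line by line keeps memory use low, which matters for a dump of 50 MB or more.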

Recommended answer

You must specify a CHARACTER SET when creating the table, preferably on the column itself. Otherwise, you get some default from SHOW VARIABLES LIKE 'char%';

SET NAMES establishes the encoding used by the client.

When INSERTing or SELECTing, the encoding is changed from the client's encoding (SET NAMES) to the column's (... VARCHAR ... CHARACTER SET ...).
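The conversion described here can be modelled in a few lines. This is a simplified sketch, not MySQL's actual implementation: `store` is a hypothetical function, Python codec names stand in for MySQL charset names, and the input is assumed to be Windows-1250 bytes. It shows why the charset announced by SET NAMES has to match the bytes the client really sends.

```python
def store(client_bytes: bytes, client_charset: str, column_charset: str) -> bytes:
    """Hypothetical model of an INSERT: interpret the incoming bytes using the
    charset announced by the client, then re-encode them for the column."""
    return client_bytes.decode(client_charset).encode(column_charset)


# The letter š as a Windows-1250 file would contain it (assumed input encoding).
raw = "š".encode("cp1250")

# Client charset declared correctly: the column receives a proper UTF-8 š.
good = store(raw, "cp1250", "utf-8")

# Client charset declared as ISO 8859-2 although the bytes are Windows-1250:
# the byte is interpreted as a different character before it is stored.
bad = store(raw, "iso8859_2", "utf-8")

print(good == "š".encode("utf-8"))
print(bad == good)
```

SELECT runs the same conversion in the opposite direction, from the column's charset back to the client's, so a wrong declaration on either side changes what ends up on screen.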

Do you really need latin2? The world is moving to UTF-8.
