Checking the encoding of text in SQLite

2021-12-28 00:00:00 csv utf-8 sqlite

I'm having a nightmare dealing with non-European text in SQLite. I think the problem is that SQLite isn't encoding the text in UTF-8, so I want to check what the encoding is and, if necessary, change it to UTF-8. I encoded a CSV in UTF-8 and simply imported it into SQLite, but the non-Roman text is garbled.

I would like to know: 1) how to check the encoding, and 2) how to change the encoding if it is not UTF-8. I've been reading about PRAGMA encoding, but I'm not sure how to use it.

I used OpenOffice 3 to create a spreadsheet with half English and half Japanese text. Next I saved the file as a CSV using UTF-8. This part seems to be OK. I also tried it using Google Docs and it worked fine. Next I opened SQLite Browser and did a CSV import. The English text shows up perfectly, but the Japanese text comes out as garbled symbols. I think SQLite is using a different encoding (perhaps UTF-16?).

Recommended answer

You can check the encoding with this pragma:

PRAGMA encoding; 
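
If you would rather run the check from a script than from a SQL prompt, here is a minimal sketch using Python's standard sqlite3 module. The helper name database_encoding is made up for illustration; the only SQLite feature it relies on is the pragma above.

```python
import sqlite3


def database_encoding(path):
    """Return the text encoding SQLite reports for the database at `path`.

    `database_encoding` is a hypothetical helper name, not part of sqlite3.
    """
    conn = sqlite3.connect(path)
    try:
        # PRAGMA encoding yields one row with one text column.
        (encoding,) = conn.execute("PRAGMA encoding").fetchone()
        return encoding
    finally:
        conn.close()


# A fresh in-memory database reports SQLite's default encoding.
print(database_encoding(":memory:"))
```

Pass the path of your own database file instead of ":memory:" to inspect it.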

You cannot change the encoding of an existing database. To create a new database with a specific encoding, open a SQLite connection to a blank file and run this pragma:

PRAGMA encoding = "UTF-8"; 

Then create your database.

If you have an existing database and need a different encoding, you must create a new database with the new encoding, then recreate the schema and import all the data.
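
As a rough sketch of that recreate-and-import step, the following uses Python's sqlite3 module: it sets the encoding on a brand-new file first, then replays the old database's SQL dump into it. The function name copy_with_encoding and the file paths are assumptions for illustration, and for a large database you may prefer a streaming approach over building the whole dump in memory.

```python
import sqlite3

# Encoding names accepted by PRAGMA encoding.
VALID_ENCODINGS = {"UTF-8", "UTF-16", "UTF-16le", "UTF-16be"}


def copy_with_encoding(src_path, dst_path, encoding="UTF-8"):
    """Rebuild the database at src_path as a new file using `encoding`.

    Hypothetical helper; dst_path must not already contain a database.
    """
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # The pragma only takes effect before the database is created,
        # so it has to run before the first CREATE TABLE.
        dst.execute(f'PRAGMA encoding = "{encoding}"')
        # iterdump() yields the SQL statements that rebuild schema and rows.
        dst.executescript("\n".join(src.iterdump()))
    finally:
        src.close()
        dst.close()
```

A call such as copy_with_encoding("old.db", "new.db", "UTF-8") would leave the original file untouched and write the converted copy next to it.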

However, garbled text is almost always a problem with one of the tools being used, not with SQLite itself. Even if SQLite stores text in a different encoding, the only consequence is some extra computation, because SQLite converts between the stored encoding and the encoding requested through the API on every call. If you're using anything other than the C-level APIs, you should never need to care about the encoding: the APIs used by your tool dictate what encoding is used.
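
To illustrate the point, here is a small sketch (table and column names are made up) that deliberately stores text in a UTF-16 database and reads it back through Python's sqlite3 module. The stored encoding differs from what the binding uses, yet the caller only ever sees ordinary strings.

```python
import sqlite3

sample = "日本語のテキスト"

conn = sqlite3.connect(":memory:")
# Choose a non-default storage encoding before any table exists.
conn.execute('PRAGMA encoding = "UTF-16le"')
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES (?)", (sample,))

stored_encoding = conn.execute("PRAGMA encoding").fetchone()[0]
round_tripped = conn.execute("SELECT body FROM notes").fetchone()[0]
conn.close()

# SQLite converts between the stored encoding and what the API asks for.
print(stored_encoding, round_tripped == sample)
```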

Many SQLite tools have had problems mangling text on its way into or out of SQLite, including command-line shells. Try running SQLite from the command line and have it import the file itself instead of going through SQLite Browser.
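
If the command-line shell is not convenient, another way to take the GUI tool out of the picture is a short script that reads the CSV with an explicit encoding. This is only a sketch and not part of the original advice: import_csv, the all-TEXT column layout, and the assumption that the first CSV row is a header are illustrative choices.

```python
import csv
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for use in SQL."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(csv_path, db_path, table):
    """Load a UTF-8 CSV with a header row into a new table of TEXT columns.

    Hypothetical helper; returns the number of data rows inserted.
    """
    # utf-8-sig also strips a byte-order mark if the exporter wrote one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    columns = ", ".join(quote_identifier(h) + " TEXT" for h in header)
    placeholders = ", ".join("?" for _ in header)
    target = quote_identifier(table)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f"CREATE TABLE {target} ({columns})")
            conn.executemany(
                f"INSERT INTO {target} VALUES ({placeholders})", rows
            )
    finally:
        conn.close()
    return len(rows)
```

If the text survives this route but not the GUI import, the GUI tool is the likely culprit rather than SQLite.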
