Bengali text not displayed in Unicode CSV file
I have an Excel file in the Bengali language. To display the Bengali text properly, I need Bengali fonts installed on the PC.
I converted the Excel file to CSV using Office 2010, but the result shows only '?' marks instead of the Bengali characters. I then used Google Docs for the conversion and had the same problem, except that it produced unreadable characters rather than '?' marks. I pasted extracts from that file into an HTML file and tried to view it in my browser, without success.
What should I do to get a CSV file from an .xlsx file in Bengali so that I can import it into a MySQL database?
Edit: The accepted answer to this SO question led me to Google Docs.
Recommended answer
According to the answers to the question "Excel to CSV with UTF8 encoding", Google Docs should save CSV correctly, unlike Excel, which destroys all characters that cannot be represented in the "ANSI" encoding in use. But perhaps this has changed, or something else went wrong, or the analysis of the situation is incorrect.
For properly encoded Bangla (Bengali) text processed in MS Office programs, there should be no need for any special "Bangla fonts", since the Arial Unicode MS font (shipped with Office) contains the Bangla characters. So is the data actually in some nonstandard encoding that relies on a specially encoded font? In that case, it should first be converted to Unicode, though it may be possible to manage it with programs that consistently use that specific font.
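One rough way to check whether the text is really Unicode Bengali is to see how many of its characters fall in the Unicode Bengali block (U+0980 to U+09FF). The sketch below is only a heuristic, and the function name `bengali_ratio` is made up for illustration. Text that depends on a specially encoded font would usually be stored as other code points (often Latin letters) and would score close to zero.

```python
# Unicode Bengali block boundaries.
BENGALI_START, BENGALI_END = 0x0980, 0x09FF


def bengali_ratio(text):
    """Return the fraction of non-whitespace characters in the Bengali block.

    Hypothetical helper for illustration: a value near 1.0 suggests real
    Unicode Bengali, a value near 0.0 suggests a font-dependent encoding
    (or simply non-Bengali text).
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    bengali = sum(1 for ch in chars if BENGALI_START <= ord(ch) <= BENGALI_END)
    return bengali / len(chars)
```

For example, you could copy a few cell values out of the spreadsheet and pass them to `bengali_ratio`. If the result is near zero even though the cells look like Bengali on screen, the data probably needs converting to Unicode first.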
In Excel, when using Save As, you can select "Unicode Text (*.txt)". This saves the data as TSV (tab-separated values) in UTF-16 encoding. You may then need to convert it to use commas instead of tabs as the separator, and/or convert it from UTF-16 to UTF-8. But this only works if the original data is properly encoded.
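The conversion from UTF-16 TSV to UTF-8 CSV can be sketched in Python roughly as follows. This is a minimal sketch, assuming the file was saved by Excel as "Unicode Text" and starts with a byte order mark, so Python's `utf-16` codec can detect the byte order. The function name and the file names are placeholders, not part of any existing tool.

```python
import csv


def tsv_utf16_to_csv_utf8(src_path, dst_path):
    """Rewrite a UTF-16 tab-separated file as a UTF-8 comma-separated file."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # default dialect: comma-separated, minimal quoting
        for row in reader:
            writer.writerow(row)


if __name__ == "__main__":
    # Placeholder file names; adjust to your own export.
    tsv_utf16_to_csv_utf8("export.txt", "export.csv")
```

Using the `csv` module rather than a plain text replace of tabs with commas means that fields which themselves contain commas are quoted in the output. When importing the resulting file into MySQL, the table and the import would also need a UTF-8 character set for the Bengali text to survive.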