Python os.stat and Unicode filenames

2022-01-11 00:00:00 python operating-system unicode

Problem description

In my Django application, a user has uploaded a file with a Unicode character in its name.

When I'm downloading files, I'm calling:

os.path.exists(media)

to test that the file is there. This, in turn, seems to call

st = os.stat(path)

which then blows up with the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xcf' in position 92: ordinal not in range(128)

What can I do about this? Is there an option to path.exists to handle it?

Update: Actually, all I had to do was encode the argument to exists, i.e.

os.path.exists(media.encode('utf-8'))
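
A minimal sketch of that workaround, assuming file names on disk are stored as UTF-8 bytes. The helper name exists_utf8 and the scratch file are made up for illustration; neither is part of os.path or of the original application.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def exists_utf8(path, encoding="utf-8"):
    """Return True if path exists, encoding text paths to bytes first.

    Assumption: file names on disk are stored as UTF-8 bytes.
    """
    if not isinstance(path, bytes):
        # Text path: encode explicitly instead of relying on the locale.
        path = path.encode(encoding)
    return os.path.exists(path)


# Create a file with a non-ASCII name in a scratch directory.
tmp_dir = tempfile.mkdtemp()
full = os.path.join(tmp_dir, u"\u6f22\u5b57.txt")
with open(full.encode("utf-8"), "wb") as handle:
    handle.write(b"testing\n")

print(exists_utf8(full))
```

Passing bytes sidesteps the implicit conversion that raised the UnicodeEncodeError, at the cost of hard-coding the encoding in the application.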

Thanks everyone who answered.

Solution

I'm assuming you're on Unix. If not, please remember to say which OS you're on.

Make sure your locale is set to UTF-8. Most modern Linux systems do this by default, usually by setting the environment variable LANG to "en_US.UTF-8" or the equivalent for another language. Also, make sure your filenames are encoded in UTF-8.

With that set, there's no need to mess with encodings to access files in any language, even in Python 2.x, because Python converts unicode paths using the filesystem encoding it derives from the locale.

[~/test] echo $LANG
en_US.UTF-8
[~/test] echo testing > 漢字
[~/test] python2.6
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.stat("漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> os.stat(u"漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> open("漢字").read()
'testing\n'
>>> open(u"漢字").read()
'testing\n'

If this doesn't work, run "locale"; if the values are "C" instead of en_US.UTF-8, you may not have the locale installed correctly.
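
The same check can be made from inside the Python process, which may help when the application runs under a server whose environment differs from your login shell. This is only a sketch: describe_encoding_setup is a hypothetical helper, and which values you see depends on the Python version and platform.

```python
import locale
import os
import sys


def describe_encoding_setup():
    """Collect the settings that decide how text paths are encoded."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


for key, value in sorted(describe_encoding_setup().items()):
    print("%s = %s" % (key, value))
```

If filesystem_encoding reports an ASCII variant rather than UTF-8, unicode paths containing non-ASCII characters are likely to fail in the way the question describes.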

If you're on Windows, I think Unicode filenames should always just work (at least for the os/posix modules), since the Unicode file API in Windows is supported transparently.
