How do I use filesystem functions in PHP with UTF-8 strings?
I can't use mkdir to create folders with UTF-8 characters:
<?php
$dir_name = "Depósito";
mkdir($dir_name);
?>
When I browse this folder in Windows Explorer, the folder name looks like this:
DepÃ³sito
What can I do?
I'm using PHP 5.
Recommended answer
Just urlencode the string you want to use as a filename. All characters returned from urlencode are valid in filenames (NTFS/HFS/UNIX), and you can then urldecode the filenames back to UTF-8 (or whatever encoding they were in).
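A minimal sketch of that round trip, assuming a writable temp directory. The `utf8_demo_` scratch directory and the variable names are made up for illustration and are not part of the original answer.

```php
<?php
// "Depósito" written as explicit UTF-8 bytes so the example does not
// depend on how this source file is saved.
$dir_name = "Dep\xC3\xB3sito";

// urlencode() yields ASCII only: here "Dep%C3%B3sito".
$encoded = urlencode($dir_name);

// Hypothetical scratch directory, used only for this demonstration.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'utf8_demo_' . uniqid();
mkdir($base);
mkdir($base . DIRECTORY_SEPARATOR . $encoded);

// Read the entries back and decode them to recover the UTF-8 names.
$names = array();
foreach (scandir($base) as $entry) {
    if ($entry === '.' || $entry === '..') {
        continue;
    }
    $names[] = urldecode($entry);
}

// Clean up the scratch directories.
rmdir($base . DIRECTORY_SEPARATOR . $encoded);
rmdir($base);
```

The folder shows up as Dep%C3%B3sito in Windows Explorer, which is ugly but unambiguous, and PHP gets the original name back on every platform.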
Caveats (all apply to the solutions below as well):
- After URL-encoding, the filename must be less than 255 characters (probably bytes).
- UTF-8 has multiple representations for many characters (using combining characters). If you don't normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
- You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames, then use a sorting algorithm aware of UTF-8 (and collations).
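The three caveats could be handled roughly like this. Both helper names are hypothetical, the 255 limit is taken from the caveat above rather than from any particular filesystem's documentation, and the `en_US` locale is just an assumption. Normalizer and Collator come from the intl extension, so the sketch falls back to unnormalized names and plain byte-order sorting when intl is missing.

```php
<?php
// Hypothetical helper: returns the encoded name, or false if it is too long.
function safe_encoded_name($utf8_name, $max_bytes = 255)
{
    // Normalize to NFC when intl is available, so "o" + combining acute
    // and the precomposed "ó" end up as the same filename.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($utf8_name, Normalizer::FORM_C);
        if (is_string($normalized)) {
            $utf8_name = $normalized;
        }
    }
    $encoded = urlencode($utf8_name);
    // The length that matters is the encoded one, counted in bytes.
    return strlen($encoded) < $max_bytes ? $encoded : false;
}

// Hypothetical helper: decode first, then sort the readable names.
function sorted_decoded_names(array $encoded_names, $locale = 'en_US')
{
    $names = array_map('urldecode', $encoded_names);
    if (class_exists('Collator')) {
        $collator = new Collator($locale); // the locale is an assumption
        $collator->sort($names);
    } else {
        sort($names, SORT_STRING); // byte order only, not collation-aware
    }
    return $names;
}
```

Note that every non-ASCII byte becomes three bytes once encoded, so a name well under the limit in UTF-8 can still be rejected.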
The following are less attractive solutions, more complicated and with more caveats.
On Windows, the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:
Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 character will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as Ã³ in Windows Explorer.
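To see where the two characters come from, here is a small sketch that reinterprets the UTF-8 bytes of ó as ISO-8859-1. It assumes the mbstring extension is loaded, and the variable names are illustrative only.

```php
<?php
// "ó" in UTF-8 is the two bytes 0xC3 0xB3.
$utf8 = "\xC3\xB3";
$hex = bin2hex($utf8); // "c3b3"

// Treat each byte as one ISO-8859-1 character and convert the result to
// UTF-8 for display: 0xC3 becomes "Ã" and 0xB3 becomes "³".
$as_seen = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $as_seen, "\n";
```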
Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.
Caveats galore!
- If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
- Windows may use an encoding other than ISO-8859-1 in non-English locales. I'd guess it will usually be one of ISO-8859-#, but this means you'll need to use mb_convert_encoding instead of utf8_decode.
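A rough sketch of the second choice using mb_convert_encoding, so the target encoding can be swapped out. The helper names are hypothetical, the ISO-8859-1 default is the answer's assumption about the system, and mbstring must be loaded.

```php
<?php
// Hypothetical helper: UTF-8 string to the encoding the filesystem expects.
function to_fs_name($utf8_name, $fs_encoding = 'ISO-8859-1')
{
    return mb_convert_encoding($utf8_name, $fs_encoding, 'UTF-8');
}

// Hypothetical helper: a name returned by scandir() etc. back to UTF-8.
function from_fs_name($fs_name, $fs_encoding = 'ISO-8859-1')
{
    return mb_convert_encoding($fs_name, 'UTF-8', $fs_encoding);
}

// Hypothetical helper: true only if the name survives the round trip,
// i.e. every character exists in the target encoding.
function is_representable($utf8_name, $fs_encoding = 'ISO-8859-1')
{
    $round_trip = from_fs_name(to_fs_name($utf8_name, $fs_encoding), $fs_encoding);
    return $round_trip === $utf8_name;
}

$fs_name = to_fs_name("Dep\xC3\xB3sito"); // "ó" becomes the single byte 0xF3
```

Checking is_representable before calling mkdir catches names that would silently lose characters during conversion, such as one containing the euro sign.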
This nightmare is why you should probably just transliterate to create filenames.
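One way transliteration might look, as a sketch rather than a recommendation of a specific method. The helper name is hypothetical. It uses transliterator_transliterate from the intl extension when available, and otherwise just replaces every non-ASCII byte with an underscore.

```php
<?php
// Hypothetical helper: reduce a UTF-8 name to printable ASCII.
function ascii_filename($utf8_name)
{
    $name = $utf8_name;
    if (function_exists('transliterator_transliterate')) {
        // ICU transform: convert to Latin script, then strip accents.
        $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $name);
        if (is_string($result)) {
            $name = $result;
        }
    }
    // Replace any byte still outside printable ASCII with an underscore.
    return preg_replace('/[^\x20-\x7E]/', '_', $name);
}

$dir_name = ascii_filename("Dep\xC3\xB3sito");
```

Unlike urlencode, this is lossy: the original name cannot be recovered from the filename, and two different names can collapse to the same one.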