How do I use filesystem functions in PHP with UTF-8 strings?

2021-12-28 00:00:00 directory utf-8 filesystems php mkdir

I can't use mkdir to create folders with UTF-8 characters:

<?php
$dir_name = "Depósito";
mkdir($dir_name);
?>

When I browse this folder in Windows Explorer, the folder name looks like this:

DepÃ³sito

What can I do?

I'm using PHP 5.

Recommended answer

Just urlencode the string you want to use as a filename. Every character urlencode returns is valid in filenames (NTFS/HFS/UNIX), and you can urldecode the filenames to get them back as UTF-8 (or whatever encoding they were in).
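
As a rough illustration of that round trip, here is a minimal sketch. The helper names make_encoded_dir and list_decoded are made up for this example and are not part of PHP; only urlencode, urldecode, mkdir and scandir are built in.

```php
<?php
// Hypothetical helper: create a directory whose on-disk name is the
// url-encoded form of a UTF-8 string. Returns the path, or false on failure.
function make_encoded_dir($parent, $utf8_name)
{
    // urlencode() emits only ASCII letters, digits and - _ . + %
    $path = $parent . DIRECTORY_SEPARATOR . urlencode($utf8_name);
    return mkdir($path) ? $path : false;
}

// Hypothetical helper: list a directory and decode each entry back to
// the string that was originally passed to urlencode().
function list_decoded($parent)
{
    $names = array();
    foreach (scandir($parent) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the self and parent entries
        }
        $names[] = urldecode($entry);
    }
    return $names;
}
```

With this, make_encoded_dir(__DIR__, "Depósito") creates a folder that Windows Explorer shows as Dep%C3%B3sito, and list_decoded(__DIR__) gives the name back as Depósito.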

Caveats (all apply to the solutions below as well):

  • After url-encoding, the filename must be less than 255 characters (probably bytes).
  • UTF-8 has multiple representations for many characters (using combining characters). If you don't normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
  • You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames then use a sorting algorithm aware of UTF-8 (and collations).
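
The three caveats above could be handled roughly like this. It is only a sketch: encoded_filename and sorted_decoded_names are hypothetical names, the 255-byte limit is an assumption you should check against your filesystem, and Normalizer and Collator come from the optional intl extension, so the code falls back to cruder behaviour when intl is missing.

```php
<?php
// Hypothetical helper: normalize, url-encode and length-check a name.
// $max_bytes = 255 is an assumed limit, not something PHP enforces.
function encoded_filename($utf8_name, $max_bytes = 255)
{
    // NFC turns "o" + combining acute accent into the single code point
    // "ó", so the same visible name always encodes to the same bytes.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($utf8_name, Normalizer::FORM_C);
        if ($normalized !== false) {
            $utf8_name = $normalized;
        }
    }
    $encoded = urlencode($utf8_name);
    // Each non-ASCII byte becomes three bytes (%XX), so names grow quickly.
    return strlen($encoded) > $max_bytes ? false : $encoded;
}

// Hypothetical helper: decode directory entries first, then sort them.
function sorted_decoded_names(array $encoded_names, $locale = 'en_US')
{
    $names = array_map('urldecode', $encoded_names);
    if (class_exists('Collator')) {
        $collator = new Collator($locale); // locale-aware comparison
        $collator->sort($names);
    } else {
        sort($names, SORT_STRING); // fallback: plain byte order
    }
    return $names;
}
```

Use the same encoded_filename call when you create a file and when you reopen it, otherwise two visually identical names can end up as different files.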

The following are less attractive solutions, more complicated and with more caveats.

On Windows, the PHP 5 filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names. This gives you two choices:

  1. Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 char will be stored as multiple single ISO-8859-1 characters. E.g. ó will appear as Ã³ in Windows Explorer.

  2. Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.

Caveats galore!

  • If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
  • Windows may use an encoding other than ISO-8859-1 in non-English locales. I'd guess it will usually be one of ISO-8859-#, but this means you'll need to use mb_convert_encoding instead of utf8_decode.
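
For the second choice, the conversion in both directions might look like the sketch below. to_fs_name and from_fs_name are hypothetical names, and the default 'ISO-8859-1' is an assumption about what your Windows installation really uses. It relies on mb_convert_encoding from the mbstring extension rather than utf8_decode/utf8_encode, so the encoding can be swapped.

```php
<?php
// Hypothetical helper: convert a UTF-8 name to the encoding the
// filesystem functions are assumed to expect.
function to_fs_name($utf8_name, $fs_encoding = 'ISO-8859-1')
{
    return mb_convert_encoding($utf8_name, $fs_encoding, 'UTF-8');
}

// Hypothetical helper: convert a name returned by scandir() and
// friends back to UTF-8.
function from_fs_name($fs_name, $fs_encoding = 'ISO-8859-1')
{
    return mb_convert_encoding($fs_name, 'UTF-8', $fs_encoding);
}
```

You would then call mkdir(to_fs_name($dir_name)) and run every scandir entry through from_fs_name. The same function also shows where the garbled name in the first choice comes from: from_fs_name applied to the two UTF-8 bytes of ó (0xC3 0xB3) yields Ã³, because each byte is read as a separate ISO-8859-1 character.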

This nightmare is why you should probably just transliterate to create filenames.
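
A very small sketch of what transliterating could look like follows. ascii_filename is a hypothetical helper, and the character map is illustrative only: it covers a handful of accented vowels plus ñ and ç, so you would extend it (or use a proper transliteration library) for the characters you actually expect. The source file must be saved as UTF-8 for the map keys to match.

```php
<?php
// Hypothetical helper: reduce a UTF-8 string to an ASCII-only filename.
function ascii_filename($utf8_name)
{
    // Illustrative map only, not a complete transliteration table.
    $map = array(
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U',
        'ñ' => 'n', 'Ñ' => 'N', 'ç' => 'c', 'Ç' => 'C',
    );
    $ascii = strtr($utf8_name, $map);
    // Every byte still outside a conservative ASCII set becomes "_",
    // so an unmapped multi-byte character turns into several underscores.
    return preg_replace('/[^A-Za-z0-9._-]/', '_', $ascii);
}
```

mkdir(ascii_filename("Depósito")) then creates a folder named Deposito, which looks the same in PHP and in Windows Explorer. The trade-off is that the original name cannot be recovered from the filename, so keep it somewhere else if you need it.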
