Should uploaded files be renamed?
I've been reading up on PHP file upload security and a few articles have recommended renaming the files. For example, the OWASP article Unrestricted File Upload says:
It is recommended to use an algorithm to determine the filenames. For instance, a filename can be a MD5 hash of the name of file plus the date of the day.
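A minimal sketch of the quoted OWASP suggestion, taken literally: the MD5 of the original filename concatenated with the current date. The `Y-m-d` date format and the `owaspStyleName()` helper are assumptions for illustration; the quote does not specify either. It makes one property visible: the same filename uploaded on the same day always yields the same hash.

```php
<?php
// Literal reading of the OWASP suggestion: md5(original filename . date).
// The 'Y-m-d' format is an assumption; the quote does not specify one.
function owaspStyleName(string $originalName, ?string $day = null): string
{
    $day = $day ?? date('Y-m-d');
    return md5($originalName . $day);
}

$a = owaspStyleName('Cake Recipe.doc', '2024-01-15');
$b = owaspStyleName('Cake Recipe.doc', '2024-01-15');

// Deterministic: two users uploading "Cake Recipe.doc" on the same day
// would get the same stored name and collide.
var_dump($a === $b); // bool(true)
echo strlen($a), PHP_EOL; // 32
```

Because the output depends only on the name and the day, this scheme alone does not guarantee uniqueness; something time-based or random has to be mixed in.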
If a user uploads a file named Cake Recipe.doc, is there really any reason to rename it to 45706365b7d5b1f35?
If the answer is yes, for whatever reason, then how do you keep track of the original file name and extension?
Recommended answer
To your primary question, whether it is good practice to rename files: the answer is a definite yes, especially if you are creating a kind of file repository where users upload files (and filenames) of their choosing, for several reasons:
- Security -- if you have a poorly written application that allows files to be downloaded by name or through direct access (it's horrid, but it happens), it's much harder for a user, malicious or otherwise, to "guess" the names of files.
- Uniqueness -- the likelihood of two different people uploading files with the same name is very high (e.g. avatar.gif, readme.txt, video.avi). Using a unique identifier significantly decreases the likelihood that two files will share a name.
- Versioning -- it is much easier to keep multiple "versions" of a document using unique names. It also avoids the need for extra code that parses a filename in order to change it. A simple example would be turning document.pdf into document(1).pdf, which gets more complicated once you account for users' ability to create horrible names for things.
- Length -- working with known filename lengths is always better than working with unknown ones. I always know that (my filepath) + (X letters) is a certain length, whereas (my filepath) + (random user filename) is completely unknown.
- OS -- that unknown length can also cause problems when writing extremely random or long filenames to a drive. You have to account for special characters, length limits, and truncated filenames (the user may not receive a working file because the extension was trimmed off).
- Execution -- it's easy for the OS to execute a file named .exe, .php, or (insert other extension). It's much harder when there is no extension.
- URL encoding -- ensuring the name is URL-safe. Cake Recipe.doc is not a URL-safe name, and on some systems (server or browser side) or in some situations it can cause inconsistencies when the name should be a urlencoded value.
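Several of the points above (uniqueness, known length, no extension, URL safety) can be met at once by a generated name. The sketch below swaps in `random_bytes()` (PHP 7+) rather than the MD5-based scheme quoted from OWASP; `generateStoredName()`, the `/var/uploads` directory and the `upload` form field are hypothetical names for illustration only.

```php
<?php
// Generate a fixed-length, URL-safe stored name with no extension.
// Uses random_bytes() instead of an MD5-based scheme (a swapped-in technique).
function generateStoredName(int $bytes = 16): string
{
    // 16 random bytes -> 32 lowercase hex characters: only [0-9a-f],
    // so the name needs no URL encoding and its length is always known.
    return bin2hex(random_bytes($bytes));
}

// Hypothetical upload directory, for illustration only.
$uploadDir = '/var/uploads';
$stored = generateStoredName();

// Total length is predictable: strlen($uploadDir) + 1 + 32.
$path = $uploadDir . '/' . $stored;
echo $path, PHP_EOL;

// In a real handler (hypothetical 'upload' field):
// move_uploaded_file($_FILES['upload']['tmp_name'], $path);
```

Leaving the extension off means the web server has nothing to map to an interpreter; the original name and extension are kept elsewhere, as described next.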
As for storing the information, you would typically do this in a database. That is no different from what you need already, since you need a way to refer back to the file (who uploaded it, what its name is, occasionally where it is stored, the upload time, sometimes the size). You are simply adding the actual stored name of the file alongside the user's name for the file.
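A minimal sketch of that bookkeeping, assuming one record per upload; `buildUploadRecord()` and its field names are hypothetical, and in practice the array would be written as a database row rather than kept in memory. It answers the question's "how do you keep track of the original file name and extension" by storing them next to the generated name.

```php
<?php
// One record per upload: the user's original name (and extension) is kept
// alongside the generated name used on disk. Field names are hypothetical.
function buildUploadRecord(string $originalName, int $userId, int $size): array
{
    return [
        'user_id'       => $userId,
        'original_name' => $originalName,                 // shown to users
        'extension'     => strtolower(pathinfo($originalName, PATHINFO_EXTENSION)),
        'stored_name'   => bin2hex(random_bytes(16)),     // name on disk
        'size'          => $size,
        'uploaded_at'   => gmdate('Y-m-d H:i:s'),
    ];
}

$record = buildUploadRecord('Cake Recipe.doc', 42, 18432);
echo $record['original_name'], ' -> ', $record['stored_name'], PHP_EOL;

// When serving the download, the original name can be sent back, e.g.:
// header('Content-Disposition: attachment; filename="'
//     . addcslashes($record['original_name'], '"\\') . '"');
// (Non-ASCII names need extra handling; this is only a sketch.)
```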
The OWASP recommendation isn't a bad one -- using the filename and a timestamp (not just the date) would be mostly unique. I take it a step further and include the microtime with the timestamp, and often some other unique bit of information, so that duplicate uploads of a small file can't collide within the same timeframe. I also store the date of the upload, which is additional insurance against MD5 clashes; these become more likely in systems that store many files over many years. It is incredibly unlikely that you would generate two identical MD5s from filename and microtime on the same day. An example would be:
$filename = date('Ymd') . '_' . md5($uploaded_filename . microtime());
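The one-liner above can be wrapped into a self-contained script as follows. `answerStyleName()` is a hypothetical helper name, and the `upload` form field is an assumed example; outside a real upload request the script falls back to the question's example filename.

```php
<?php
// Date prefix (Ymd) + '_' + md5(original filename . microtime()),
// as in the answer's example line.
function answerStyleName(string $uploaded_filename): string
{
    return date('Ymd') . '_' . md5($uploaded_filename . microtime());
}

// Normally taken from $_FILES; 'upload' is a hypothetical field name.
$uploaded_filename = $_FILES['upload']['name'] ?? 'Cake Recipe.doc';
$filename = answerStyleName($uploaded_filename);

// Prints the date as 8 digits, an underscore, then 32 hex characters.
echo $filename, PHP_EOL;
```

The date prefix also makes stored files easy to group or prune by upload day, while the MD5 part only needs to be unique within that day.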
My 2 cents.