FILESYSTEM vs SQLITE, while storing up to 10M files

2021-11-17 00:00:00 database filesystems sqlite archive ntfs

I would like to store up to 10M files on a 2TB storage unit. The only properties I need are the filenames and their contents (data).

The maximum file size is 100MB; most files are smaller than 1MB. The ability to remove files is required, and both write and read speed should be a priority. Low storage efficiency is acceptable, and recovery or integrity mechanisms are not needed.

I thought about NTFS, but I don't need most of its features, and since they can't be disabled I consider them an overhead concern. A few of them: creation date, modification date, attributes, journal, and of course permissions.

Since I don't need the native features of a filesystem, would you suggest using SQLITE for this requirement? Or is there an obvious disadvantage I should be aware of? (One would guess that removing files would be a complicated task?)

(SQLITE would be used via the C API.)

My goal is to use a better-suited solution to gain performance. Thanks in advance - Doori Bar

Recommended answer

If your main requirement is performance, go with the native file system. DBMSs are not well suited to handling large BLOBs, so SQLite is not an option for you at all (I don't even know why everybody considers SQLite to be a plug for every hole).

To improve the performance of NTFS (or any other file system you choose), don't put all files into a single folder. Instead, group files into subfolders by the first N characters of their file names, or by extension.
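The grouping idea can be sketched in C as follows. This is a minimal illustration, not a complete storage layer: the function name `shard_path`, the `/` separator, and the `<root>/<prefix>/<name>` layout are assumptions made for the example, and creating the subfolder itself is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Build "<root>/<first n chars of name>/<name>" into out.
 * Hypothetical helper: the layout and the '/' separator are assumptions.
 * Returns 0 on success, -1 if n is 0, the name is shorter than n,
 * or the output buffer is too small. */
int shard_path(const char *root, const char *name, size_t n,
               char *out, size_t out_size)
{
    if (n == 0 || strlen(name) < n)
        return -1;

    /* "%.*s" prints at most n characters of name: the subfolder prefix. */
    int written = snprintf(out, out_size, "%s/%.*s/%s",
                           root, (int)n, name, name);
    if (written < 0 || (size_t)written >= out_size)
        return -1;   /* encoding error or truncated path */

    return 0;
}
```

For example, calling `shard_path("store", "abcdef.bin", 2, buf, sizeof buf)` fills `buf` with `store/ab/abcdef.bin`. With N = 2 and names drawn from 26 letters, 10M files would spread over at most 676 subfolders.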

Other file systems also exist, and some of them may let you disable features you don't use. You can look through the file system comparison on Wikipedia to find candidates.

Correction: I've run some tests (not very extensive, though) that show no performance benefit from grouping files into subdirectories for most types of operations, and NTFS handled 26^4 empty files named AAAA to ZZZZ in a single directory quite efficiently. So you need to check the efficiency of your particular file system.
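A test of this kind could be reproduced with something like the sketch below. It is not the code used for the test above; the function names, the 1024-byte path buffer, and the `/` separator are assumptions, and the code relies on the letters A to Z being contiguous in the character set (true for ASCII). It generates the 26^4 = 456,976 names and creates empty files in an existing directory; timing the call is left to the caller.

```c
#include <stdio.h>

#define NAME_LEN 4
#define ALPHABET 26u
#define TOTAL_NAMES (26u * 26u * 26u * 26u)  /* 26^4 = 456976 */

/* Map idx in [0, 26^4) to a 4-letter name: 0 -> "AAAA", 456975 -> "ZZZZ". */
void index_to_name(unsigned idx, char out[NAME_LEN + 1])
{
    for (int i = NAME_LEN - 1; i >= 0; --i) {
        out[i] = (char)('A' + idx % ALPHABET);  /* least significant letter last */
        idx /= ALPHABET;
    }
    out[NAME_LEN] = '\0';
}

/* Create up to `count` empty files in the existing directory `dir`.
 * Stops at the first failure and returns the number of files created. */
unsigned create_empty_files(const char *dir, unsigned count)
{
    char name[NAME_LEN + 1];
    char path[1024];
    unsigned created = 0;

    for (unsigned idx = 0; idx < count && idx < TOTAL_NAMES; ++idx) {
        index_to_name(idx, name);
        int n = snprintf(path, sizeof path, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= sizeof path)
            break;                      /* path did not fit in the buffer */
        FILE *f = fopen(path, "wb");
        if (f == NULL)
            break;                      /* directory missing or not writable */
        fclose(f);
        ++created;
    }
    return created;
}
```

Running `create_empty_files` once against a flat directory and once against a sharded layout, with the same wall-clock timer around each call, gives a rough comparison for a particular file system. Empty files only exercise directory and metadata handling, so results for real payloads may differ.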
