具有非常大的数据库文件的sqlite的性能特点是什么?

2021-12-05 00:00:00 performance database sqlite

2020 年更新,在问题发布后大约 11 年之后关闭,阻止了更新的答案.

2020 update, about 11 years after the question was posted and later closed, preventing newer answers.

这里写的几乎所有内容都已过时.曾几何时,sqlite 仅限于内存容量或 2 GB 的存储空间(32 位)或其他流行数字......好吧,那是很久以前的事了.

此处列出了官方限制.实际上,只要有可用的存储空间,sqlite 就可以正常工作.它适用于大于内存的数据集,它最初是在内存很薄的时候创建的,从一开始就很重要.

Official limitations are listed here. Practically sqlite is likely to work as long as there is storage available. It works well with dataset larger than memory, it was originally created when memory was thin and it was a very important point from the start.

存储 100 GB 数据绝对没有问题.它可能可以很好地存储 TB,但最终这就是您需要质疑 SQLite 是否是完成这项工作的最佳工具的关键点,并且您可能想要来自完整数据库的功能(远程客户端、并发写入、只读副本、分片), 等等...).

There is absolutely no issue with storing 100 GB of data. It could probably store a TB just fine but eventually that's the point where you need to question whether SQLite is the best tool for the job and you probably want features from a full fledged database (remote clients, concurrent writes, read-only replicas, sharding, etc...).

原文:

我知道,即使支持极大的数据库文件,sqlite 也不能很好地处理它们(曾经在 sqlite 网站上有一条评论指出,如果您需要大于 1GB 的文件大小,您可能需要考虑使用企业rdbms.现在找不到了,可能与旧版的sqlite有关).

I know that sqlite doesn't perform well with extremely large database files even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1GB you may want to consider using an enterprise rdbms. Can't find it anymore, might be related to an older version of sqlite).

但是,出于我的目的,我想在考虑其他解决方案之前先了解情况到底有多糟糕.

However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.

我说的是数 GB 范围内的 sqlite 数据文件,从 2GB 开始.有人对此有经验吗?任何提示/想法?

I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards. Anyone have any experience with this? Any tips/ideas?

推荐答案

所以我用 sqlite 对非常大的文件做了一些测试,并得出了一些结论(至少对于我的特定应用).

So I did some tests with sqlite for very large files, and came to some conclusions (at least for my specific application).

测试涉及带有单个表或多个表的单个 sqlite 文件.每个表大约有 8 列,几乎都是整数和 4 个索引.

The tests involve a single sqlite file with either a single table, or multiple tables. Each table had about 8 columns, almost all integers, and 4 indices.

我们的想法是插入足够的数据,直到 sqlite 文件大约为 50GB.

The idea was to insert enough data until sqlite files were about 50GB.

单桌

我试图将多行插入到只有一张表的 sqlite 文件中.当文件大约为 7GB 时(抱歉我不能具体说明行数),插入花费的时间太长了.我估计我插入所有数据的测试需要 24 小时左右,但即使在 48 小时后它也没有完成.

I tried to insert multiple rows into a sqlite file with just one table. When the file was about 7GB (sorry I can't be specific about row counts) insertions were taking far too long. I had estimated that my test to insert all my data would take 24 hours or so, but it did not complete even after 48 hours.

这使我得出结论,单个非常大的 sqlite 表会出现插入问题,可能还有其他操作.

This leads me to conclude that a single, very large sqlite table will have issues with insertions, and probably other operations as well.

我想这并不奇怪,随着表变大,插入和更新所有索引需要更长的时间.

I guess this is no surprise, as the table gets larger, inserting and updating all the indices take longer.

多表

然后我尝试将数据按时间拆分到多个表中,每天一个表.原始 1 个表的数据被拆分为约 700 个表.

I then tried splitting the data by time over several tables, one table per day. The data for the original 1 table was split to ~700 tables.

这个设置在插入时没有问题,随着时间的推移它不会花费更长的时间,因为每天都会创建一个新表.

This setup had no problems with the insertion, it did not take longer as time progressed, since a new table was created for every day.

真空问题

正如 i_like_caffeine 所指出的,VACUUM 命令是一个问题,sqlite 文件越大.随着越来越多的插入/删除,磁盘上的文件碎片会越来越严重,所以目标是定期进行VACUUM来优化文件并恢复文件空间.

As pointed out by i_like_caffeine, the VACUUM command is a problem the larger the sqlite file is. As more inserts/deletes are done, the fragmentation of the file on disk will get worse, so the goal is to periodically VACUUM to optimize the file and recover file space.

然而,正如文档所指出的,数据库的完整副本是为了做真空,需要很长时间才能完成.因此,数据库越小,此操作完成的速度就越快.

However, as pointed out by documentation, a full copy of the database is made to do a vacuum, taking a very long time to complete. So, the smaller the database, the faster this operation will finish.

结论

对于我的特定应用程序,我可能会将数据拆分为多个 db 文件,每天一个,以获得最佳的真空性能和插入/删除速度.

For my specific application, I'll probably be splitting out data over several db files, one per day, to get the best of both vacuum performance and insertion/delete speed.

这使查询复杂化,但对我来说,能够索引这么多数据是值得的权衡.另一个优点是我可以删除整个 db 文件来删除一天的数据(我的应用程序的常见操作).

This complicates queries, but for me, it's a worthwhile tradeoff to be able to index this much data. An additional advantage is that I can just delete a whole db file to drop a day's worth of data (a common operation for my application).

我可能还需要监视每个文件的表大小,以查看速度何时会成为问题.

I'd probably have to monitor table size per file as well to see when the speed will become a problem.

太糟糕了,除了autovacuum.我无法使用它,因为我的真空目标是对文件进行碎片整理(文件空间不是什么大问题),而自动真空不会这样做.事实上,文档说明它可能会使碎片变得更糟,所以我不得不定期对文件进行全面清理.

It's too bad that there doesn't seem to be an incremental vacuum method other than auto vacuum. I can't use it because my goal for vacuum is to defragment the file (file space isn't a big deal), which auto vacuum does not do. In fact, documentation states it may make fragmentation worse, so I have to resort to periodically doing a full vacuum on the file.

相关文章