每个请求可以多次查询 MongoDB 吗?
来自 RDBMS 背景,我一直有这样的印象:尽量使用一个查询,假设它是有效的",这意味着您向数据库发出的每个请求都代价高昂.对于 MongoDB,这似乎是不可能的,因为您无法连接表.
Coming from an RDBMS background, I was always under the impression "Try as hard as you can to use one query, assuming it's efficient," meaning that it's costly for every request you make to the database. When it comes to MongoDB, it seems like this might not be possible because you can't join tables.
我知道它不应该是关系型的,但他们也在推动它用于博客、论坛等目的,以及我发现 RDBMS 更容易处理的东西.
I understand that it's not supposed to be relational, but they're also pushing it for purposes like blogs, forums, and things I'd find an RDBMS easier to approach with.
我在尝试了解 MongoDB 或 NoSQL 的总体效率时遇到了一些问题.如果我想获取与某些用户相关的所有帖子"(就像他们被分组一样)......使用 MySQL 我可能会做一些连接并得到它.
There are some hang ups I've had trying to understand the efficiency of MongoDB or NoSQL in general. If I wanted to get all "posts" related to certain users (as if they were grouped)... using MySQL I'd probably do some joins and get it with that.
在 MongoDB 中,假设我需要单独的集合,使用大的 $in: ['user1', 'user2', 'user3', 'user4', ...] 是否有效?
In MongoDB, assuming I need the collections separate, would it be efficient to use a large $in: ['user1', 'user2', 'user3', 'user4', ...] ?
这种方法一段时间后会变慢吗?如果我包括 1000 个用户?如果我需要获取与用户 X、Y、Z 相关的帖子列表,使用 MongoDB 来做是否高效和/或快速:
Does that method get slow after a while? If I include 1000 users? And if I needed to get that list of posts related to users X,Y,Z, would it be efficient and/or fast using MongoDB to do:
- 获取用户数组
- 在用户数组中获取帖子
一个请求的 2 个查询.在 NoSQL 中这是不好的做法吗?
2 queries for one request. Is that bad practice in NoSQL?
推荐答案
回答关于$in....的问题.
To answer the Q about $in....
我对以下场景进行了一些性能测试:
I did some performance tests with the following scenario:
~2400 万个文档集合
根据键(索引)查找其中的 100 万个文档
使用 .NET 的 CSharp 驱动程序
~24 million docs in a collection
Lookup 1 million of those documents based on a key (indexed)
Using CSharp driver from .NET
结果:
一次查询1个,单线程:109s
一次查询1个,多线程:48s
使用$in一次查询100K,单线程=20s
使用 $in 一次查询 100K, multi threaded=9s
Results:
Querying 1 at a time, single threaded : 109s
Querying 1 at a time, multi threaded : 48s
Querying 100K at a time using $in, single threaded=20s
Querying 100K at a time using $in, multi threaded=9s
使用较大的 $in(限制为最大查询大小)会显着提高性能.
So noticeably better performance using a large $in (restricted to max query size).
更新:继下面的评论之后,关于 $in 如何处理不同的块大小(查询多线程):
Update: Following on from comments below about how $in performs with different chunk sizes (queries multi-threaded):
一次查询10个(100000个批次)= 8.8s
一次查询100个(10000个批次)=4.32s
一次查询1000个(1000个批次)=4.31s
一次查询10000个(100个批次)=8.4s
一次查询100000个(10个批次)= 9s(根据上面的原始结果)
Querying 10 at a time (100000 batches) = 8.8s
Querying 100 at a time (10000 batches) = 4.32s
Querying 1000 at a time (1000 batches) = 4.31s
Querying 10000 at a time (100 batches) = 8.4s
Querying 100000 at a time (10 batches) = 9s (per original results above)
因此,在 $in 子句中批处理多少值与往返次数之间确实存在一个最佳点
So there does look to be a sweet-spot for how many values to batch up in to an $in clause vs. the number of round trips
相关文章