Performance of the MySQL "IN" operator with a (large?) number of values

2021-11-20 00:00:00 performance sql operators mysql

I have been experimenting with Redis and MongoDB lately, and there often seem to be cases where you would store an array of ids in either MongoDB or Redis. I'll stick with Redis for this question, since I am asking about the MySQL IN operator.

I was wondering how performant it is to list a large number (300-3000) of ids inside the IN operator, which would look something like this:

SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4, ...... 3000)

Imagine something as simple as a products table and a categories table, which you would normally JOIN to get the products in a certain category. In the example above, a Redis key for a given category ( category:4:product_ids ) returns all the product ids in the category with id 4, and I place them inside the IN operator of the SELECT query above.

How performant is this?

Is this an "it depends" situation? Or is there a concrete "this is (un)acceptable", "fast", or "slow" answer? Should I add a LIMIT 25, or doesn't that help?

SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4, ...... 3000) LIMIT 25

Or should I trim the array of product ids returned by Redis down to 25 and put only those 25 ids in the query, rather than passing all 3000 and applying LIMIT 25 inside the query?

SELECT id, name, price FROM products WHERE id IN (1, 2, 3, 4, ...... 25)

Any suggestions/feedback is much appreciated!

Recommended answer

Generally speaking, if the IN list gets too large (for some ill-defined value of "too large" that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need be to hold the numbers.
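
The temporary-table approach can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, and the table name wanted_ids and the helper names are hypothetical. With MySQL the pattern is the same (CREATE TEMPORARY TABLE, bulk insert, JOIN), but the "skip duplicates" statement is spelled INSERT IGNORE and the driver's placeholder is typically %s rather than ?.

```python
import sqlite3


def fetch_products_via_temp_table(conn, product_ids):
    """Load the ids into a temporary table and JOIN against it instead of using IN."""
    cur = conn.cursor()
    # Hypothetical table name; the primary key gives the join an index to use.
    cur.execute(
        "CREATE TEMPORARY TABLE IF NOT EXISTS wanted_ids (id INTEGER PRIMARY KEY)"
    )
    cur.execute("DELETE FROM wanted_ids")  # start from an empty set on every call
    # SQLite syntax; MySQL would use INSERT IGNORE with %s placeholders.
    cur.executemany(
        "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
        [(i,) for i in product_ids],
    )
    cur.execute(
        "SELECT p.id, p.name, p.price "
        "FROM products AS p "
        "JOIN wanted_ids AS w ON w.id = p.id "
        "ORDER BY p.id"
    )
    return cur.fetchall()


def demo():
    """Build a tiny in-memory products table and look up a few ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(i, f"product-{i}", i * 1.5) for i in range(1, 11)],
    )
    # Id 42 is not in the products table, so the join simply drops it.
    return fetch_products_via_temp_table(conn, [2, 5, 9, 42])
```

Whether this actually beats a long IN list depends on the server version, the indexes, and the data, so it is worth measuring both forms with EXPLAIN on the real tables.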

If the numbers form a dense set (no gaps, which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000.

However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all (unless the gaps are relatively few in number, in which case you could use:

WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836

or whatever the gaps are).
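
To judge whether the id set is dense enough for range predicates, one option is to collapse the ids into consecutive runs in application code first. The sketch below is an assumption-laden illustration, not part of the original answer: the helper names ids_to_ranges and ranges_to_where are hypothetical, the %s placeholders follow the style used by common Python MySQL drivers, and it emits an OR of BETWEEN ranges, which selects the same rows as the BETWEEN ... AND id NOT BETWEEN ... form above.

```python
def ids_to_ranges(ids):
    """Collapse integers into sorted, inclusive (start, end) runs of consecutive ids."""
    ranges = []
    for value in sorted(set(ids)):
        if ranges and value == ranges[-1][1] + 1:
            # Extends the current run by one.
            ranges[-1] = (ranges[-1][0], value)
        else:
            # A gap (or the first value) starts a new run.
            ranges.append((value, value))
    return ranges


def ranges_to_where(ranges, column="id"):
    """Build a parameterized OR-of-BETWEEN predicate and its parameter list."""
    if not ranges:
        return "1 = 0", []  # an empty id set matches no rows
    clause = " OR ".join(f"{column} BETWEEN %s AND %s" for _ in ranges)
    params = [bound for pair in ranges for bound in pair]
    return f"({clause})", params


# Same shape as the example above: ids 300..3000 with 742..836 missing.
example_ids = list(range(300, 742)) + list(range(837, 3001))
example_ranges = ids_to_ranges(example_ids)
example_where, example_params = ranges_to_where(example_ranges)
```

If the number of runs comes out close to the number of ids, the set is sparse and the plain list of values (or a join) is likely the better choice.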
