MySQL 不使用带有 WHERE IN 子句的索引?
我正在尝试优化我的 Rails 应用程序中的一些数据库查询,但有几个问题让我感到困惑.它们都在 WHERE
子句中使用 IN
并且都在进行全表扫描,即使适当的索引似乎已经到位.
I'm trying to optimize some of the database queries in my Rails app and I have several that have got me stumped. They are all using an IN
in the WHERE
clause and are all doing full table scans even though an appropriate index appears to be in place.
例如:
SELECT `user_metrics`.* FROM `user_metrics` WHERE (`user_metrics`.user_id IN (N,N,N,N,N,N,N,N,N,N,N,N))
执行全表扫描,EXPLAIN
说:
select_type: simple
type: all
extra: using where
possible_keys: index_user_metrics_on_user_id (which is an index on the user_id column)
key: (none)
key_length: (none)
ref: (none)
rows: 208
在使用 IN
语句时是否不使用索引,或者我是否需要做一些不同的事情?这里的查询是由 Rails 生成的,因此我可以重新审视我的关系是如何定义的,但我想我首先要从数据库级别的潜在修复开始.
Are indexes not used when an IN
statement is used or do I need to do something differently? The queries here are being generated by Rails so I could revisit how my relationships are defined, but I thought I'd start with potential fixes at the DB level first.
推荐答案
参见 MySQL 如何使用索引.
还要验证 MySQL 是否仍然执行 全表在向 user_metrics
表中添加额外的 2000 行左右后扫描.在小表中,按索引访问实际上比表扫描更昂贵(I/O-wise),而且 MySQL 的优化器可能会考虑到这一点.
Also validate whether MySQL still performs a full table scan after you add an additional 2000-or-so rows to your user_metrics
table. In small tables, access-by-index is actually more expensive (I/O-wise) than a table scan, and MySQL's optimizer might take this into account.
与我之前的帖子相反,事实证明 MySQL 也在 使用一个基于成本的优化器,这是个好消息 - 也就是说,如果您认为数据库中的数据量具有代表性,则至少运行一次 ANALYZE
未来的日常使用.
Contrary to my previous post, it turns out that MySQL is also using a cost-based optimizer, which is very good news - that is, provided you run your ANALYZE
at least once when you believe that the volume of data in your database is representative of future day-to-day usage.
在处理基于成本的优化器(Oracle、Postgres 等)时,您需要确保定期运行 ANALYZE
在您的各种表格上,因为它们的大小增加了 10-15% 以上.(默认情况下,Postgres 会自动为你做这件事,而其他 RDBMS 会把这个责任留给 DBA,即你.)通过统计分析,ANALYZE
将帮助优化器更好地了解有多少在各种候选执行计划之间进行选择时,将涉及 I/O(和其他相关资源,例如 CPU,例如排序所需的).未能运行 ANALYZE
可能会导致非常糟糕的、有时是灾难性的规划决策(例如,由于 JOIN 上的错误嵌套循环,毫秒查询有时需要花费数小时代码>s.)
When dealing with cost-based optimizers (Oracle, Postgres, etc.), you need to make sure to periodically run ANALYZE
on your various tables as their size increases by more than 10-15%. (Postgres will do this automatically for you, by default, whereas other RDBMSs will leave this responsibility to a DBA, i.e. you.) Through statistical analysis, ANALYZE
will help the optimizer get a better idea of how much I/O (and other associated resources, such as CPU, needed e.g. for sorting) will be involved when choosing between various candidate execution plans. Failure to run ANALYZE
may result in very poor, sometimes disastrous planning decisions (e.g. millisecond-queries taking, sometimes, hours because of bad nested loops on JOIN
s.)
如果在运行 ANALYZE
后性能仍然不令人满意,那么您通常可以通过使用提示来解决该问题,例如FORCE INDEX
,而在其他情况下,您可能会偶然发现 MySQL 错误(例如这个 较旧的,如果您使用 Rails 的 nested_set
,这可能会让您感到厌烦.
If performance is still unsatisfactory after running ANALYZE
, then you will typically be able to work around the issue by using hints, e.g. FORCE INDEX
, whereas in other cases you might have stumbled over a MySQL bug (e.g. this older one, which could have bitten you were you to use Rails' nested_set
).
现在,由于您使用的是 Rails 应用程序,因此使用提示发出自定义查询而不是继续执行会很麻烦(并且违背了 ActiveRecord
的目的)使用 ActiveRecord
生成的.
Now, since you are in a Rails app, it will be cumbersome (and defeat the purpose of ActiveRecord
) to issue your custom queries with hints instead of continuing to use the ActiveRecord
-generated ones.
我曾提到在我们的 Rails 应用程序中所有 SELECT
查询在切换到 Postgres 后下降到 100 毫秒以下,而一些由 ActiveRecord 生成的复杂连接由于带有内部表扫描的嵌套循环,即使索引可用,代码>有时也会在 MySQL 5.1 中花费 15 秒或更多时间.没有优化器是完美的,您应该了解这些选项.除了查询计划优化之外,其他需要注意的潜在性能问题是锁定.不过,这超出了您的问题范围.
I had mentioned that in our Rails application all SELECT
queries dropped below 100ms after switching to Postgres, whereas some of the complex joins generated by ActiveRecord
would occasionally take as much as 15s or more with MySQL 5.1 because of nested loops with inner table scans, even when indices were available. No optimizer is perfect, and you should be aware of the options. Other potential performance issues to be aware of, besides query plan optimization, are locking. This is outside the scope of your problem though.
相关文章