为什么 IN 条件会比“="慢?在 sql?

2022-01-30 00:00:00 performance sql comparison mysql

检查问题此 SELECT 查询需要 180 秒才能完成(检查问题本身的评论).
IN 只能与一个值进行比较，但时间差异仍然很大.
为什么会这样?

解决方案

总结:这是一个MySQL 中的>已知问题，并在 MySQL 5.6.x 中修复.问题是由于当使用 IN 的子查询被错误地识别为依赖子查询而不是独立子查询时缺少优化.

<小时>

当您对原始查询运行 EXPLAIN 时，它会返回:

<上一页>1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 '在哪里使用'2 'DEPENDENT SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 '使用where'3 'DEPENDENT SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 '使用where'

当您将 IN 更改为 = 时，您会得到:

<上一页>1 'PRIMARY' 'question_law_version' 'ALL' '' '' '' '' 10148 '在哪里使用'2 'SUBQUERY' 'question_law_version' 'ALL' '' '' '' '' 10148 '在哪里使用'3 'SUBQUERY' 'question_law' 'ALL' '' '' '' '' 10040 '使用where'

每个相关子查询在它所在的查询中的每一行运行一次，而子查询只运行一次.当存在可以转换为连接的条件时，MySQL 有时可以优化依赖子查询，但这里并非如此.

现在这当然留下了为什么 MySQL 认为 IN 版本需要是依赖子查询的问题.我制作了一个简化版本的查询来帮助调查这个问题.我创建了两个表foo"和bar"，前者只包含一个 id 列，后者包含一个 id 和一个 foo id(尽管我没有创建外键约束).然后我用 1000 行填充了两个表:

CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);-- 在每个表中填充 1000 行选择编号来自 foo身份在哪里(选择最大(foo_id)从栏);

这个简化的查询和以前有同样的问题——内部选择被视为依赖子查询并且不执行优化，导致内部查询每行运行一次.查询需要将近一秒钟的时间来运行.再次将 IN 更改为 = 可使查询几乎立即运行.

我用来填充表格的代码如下，以防有人希望重现结果.

CREATE TABLE 填充 (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT) 引擎=内存；分隔符 $$创建过程 prc_filler(cnt INT)开始声明_cnt INT；设置_cnt = 1；WHILE _cnt <= cnt DO插入INTO填料选择_cnt；设置_cnt = _cnt + 1；结束；结尾$$分隔符；调用 prc_filler(1000);INSERT foo SELECT id FROM 填充；INSERT bar SELECT id, id FROM 填充；

Check the question This SELECT query takes 180 seconds to finish (check the comments on the question itself).
The IN get to be compared against only one value, but still the time difference is enormous.
Why is it like that?

解决方案

Summary: This is a known problem in MySQL and was fixed in MySQL 5.6.x. The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.

When you run EXPLAIN on the original query it returns this:

1  'PRIMARY'             'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
2  'DEPENDENT SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
3  'DEPENDENT SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where'

When you change IN to = you get this:

1  'PRIMARY'   'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
2  'SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
3  'SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where'

Each dependent subquery is run once per row in the query it is contained in, whereas the subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join but here that is not the case.

Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:

CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);
CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);

-- populate tables with 1000 rows in each

SELECT id
FROM foo
WHERE id IN
(
    SELECT MAX(foo_id)
    FROM bar
);

This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN to = again allows the query to run almost instantly.

The code I used to populate the tables is below, in case anyone wishes to reproduce the results.

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

CALL prc_filler(1000);

INSERT foo SELECT id FROM filler;
INSERT bar SELECT id, id FROM filler;

相关文章