为什么这个简单的连接查询与子查询相比要快得多?
我有两张桌子.order_details
为 100,000 行,outbound
为 10,000 行.
I have two tables. order_details
which is 100,000 rows, and outbound
which is 10,000 rows.
我需要将它们加入一个名为 order_number
的列中,这两个列都是 VARCHAR(50).order_number 在出站表中不是唯一的.
I need to join them on a column called order_number
, which is a VARCHAR(50) on both. order_number is not unique in the outbound table.
CREATE TABLE `outbound` (
`outbound_id` int(12) NOT NULL,
`order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `order_details` (
`order_details_id` int(12) NOT NULL,
`order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
这是我的初始查询,运行时间超过 60 秒:
This is my initial query, and it takes well over 60 seconds to run:
SELECT o.order_number
FROM outbound o
INNER JOIN order_details od
ON o.order_number = od.order_number
此查询得到相同的结果,运行时间不到一秒:
This query gets the same results and takes less than a second to run:
SELECT o.order_number
FROM outbound o
INNER JOIN
(
SELECT order_number
FROM order_details
) od
ON (o.order_number = od.order_number)
这让我很惊讶,因为通常子查询要慢得多.
This is surprising to me because usually sub-queries are significantly slower.
运行 EXPLAIN
(我仍在学习如何理解)表明子查询版本使用 derived2
表,它正在使用索引,并且索引是 auto_key0
.我不够精明,不知道如何解释这一点以理解为什么这会产生重大影响.
Running EXPLAIN
(which I'm still learning how to understand) shows that the sub query version uses a derived2
table, that it is using an index, and that index is auto_key0
. I'm not savvy enough to know how to interpret this to understand why this makes a significant difference.
我正在通过命令行运行这些查询.
I am running these queries over command line.
我正在为 Linux (x86_64) CentOS 运行 MySQL Ver 14.14 Distrib 5.6.35.
I am running MySQL Ver 14.14 Distrib 5.6.35, for Linux (x86_64) CentOS.
总结:
为什么这个简单的连接查询使用子查询明显更快?
推荐答案
我对MySQL的了解非常有限.但这是我的想法:
My knowledge of MySQL is very limited. But these are my thoughts:
您的表没有索引.然后连接必须读取整个第二个表才能比较第一个表的每一行.
Your tables don't have indexes. Then the join has to read the entire second table in order to compare, for each row of the first table.
子查询读取第二张表一次并创建索引,则不需要为第一张表的每一行读取整个第二张表.它只需要检查索引,这要快得多.
The subquery reads the second table once and creates an index, then it doesn't need to read the entire second table for each row of the first table. It only has to check the index, which is much more faster.
要验证我是否正确,请尝试为两个表中的 order_number 列创建索引(CREATE INDEX ...),然后再次运行这两个查询.您的第一个查询应该只需要不到一秒钟而不是一分钟.
To verify if I'm ritght or not, try creating indexes for the column order_number in your two tables (CREATE INDEX ... ), and run again this two queries. Your first query should only take less than a second instead of a minute.
相关文章