SQL Performance: UNION vs. OR

2021-11-20 00:00:00 performance sql union mysql

I just read part of an optimization article and segfaulted on the following statement:

When using SQL, replace statements using OR with a UNION:

select username from users where company = 'bbc' or company = 'itv';

to:

select username from users where company = 'bbc' union select username from users where company = 'itv';

From a quick EXPLAIN:

Using OR:

Using UNION:

Doesn't this mean UNION does double the work?

While I appreciate that UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggests.

Am I wrong?

Recommended answer

Either the article you read used a bad example, or you misinterpreted their point.

select username from users where company = 'bbc' or company = 'itv';

This is equivalent to:

select username from users where company IN ('bbc', 'itv');

MySQL can use an index on company for this query just fine. There's no need to do any UNION.
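As a minimal sketch of how to check this yourself, the statements below create a hypothetical users table with a single index on company and ask MySQL for the plan of the IN form. The column types and the index name idx_company are assumptions for illustration; they do not come from the question.

```sql
-- Hypothetical schema: only the columns used in the question,
-- plus a surrogate key. Run this in a scratch database.
CREATE TABLE users (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL,
  company  VARCHAR(64),
  INDEX idx_company (company)
);

-- Both predicates are on the same column, so one index can serve them.
EXPLAIN
SELECT username FROM users WHERE company IN ('bbc', 'itv');
```

On a table with enough rows, the key column of the EXPLAIN output would be expected to name idx_company, although the optimizer may still prefer a full scan when the table is tiny or the values match most rows.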

The trickier case is where you have an OR condition that involves two different columns.

select username from users where company = 'bbc' or city = 'London';

Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table scan to find rows where city is London. If it uses the index on city, it would have to do a table scan for rows where company is bbc.

The UNION solution is for this type of case.

select username from users where company = 'bbc' union select username from users where city = 'London';

Now each subquery can use the index for its own search, and the results of the subqueries are combined by the UNION.
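The two forms can be compared side by side with EXPLAIN. This is a sketch under assumed names: the table definition and the index names idx_company and idx_city are hypothetical, chosen only to match the columns in the example queries.

```sql
-- Hypothetical schema with one single-column index per searched column.
-- Run this in a scratch database.
CREATE TABLE users (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL,
  company  VARCHAR(64),
  city     VARCHAR(64),
  INDEX idx_company (company),
  INDEX idx_city (city)
);

-- OR across two columns: a single index cannot satisfy both predicates.
-- Depending on version and statistics, MySQL may also choose an
-- index_merge plan here instead of a scan.
EXPLAIN
SELECT username FROM users WHERE company = 'bbc' OR city = 'London';

-- UNION form: each SELECT gets its own row in the plan and can use
-- the index that matches its own predicate.
EXPLAIN
SELECT username FROM users WHERE company = 'bbc'
UNION
SELECT username FROM users WHERE city = 'London';
```

The plan you actually get depends on the data, so treat the comments above as what to look for rather than guaranteed output.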

An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.

My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table scan.
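If the duplicate-elimination step does turn out to matter, one possible variation (not something the article or the proposed edit suggested) is UNION ALL with the overlap excluded in the second branch. The sketch below assumes the users table from the example, with username, company, and city columns.

```sql
-- Assumes a users table with username, company and city columns.
-- UNION ALL skips duplicate elimination, so the second branch must
-- leave out rows the first branch already returned.
SELECT username FROM users WHERE company = 'bbc'
UNION ALL
SELECT username FROM users
WHERE city = 'London'
  AND NOT (company <=> 'bbc');  -- <=> is MySQL's NULL-safe equality,
                                -- so rows with a NULL company are kept
```

This returns each matching row once, like the OR query. It is not identical to the plain UNION, which also collapses two different users who happen to share a username.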

The difference depends on the data in the table and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
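A minimal sketch of such a comparison, using the session-level profiler, is shown below. It assumes the users table from the example, populated with representative data. SHOW PROFILES is deprecated in recent MySQL releases in favor of the Performance Schema, so take this as one way to measure rather than the recommended one.

```sql
-- Assumes a populated users table with username, company and city columns.
SET profiling = 1;

-- Candidate 1: OR across two columns.
SELECT username FROM users WHERE company = 'bbc' OR city = 'London';

-- Candidate 2: UNION of two indexed searches.
SELECT username FROM users WHERE company = 'bbc'
UNION
SELECT username FROM users WHERE city = 'London';

-- Lists the statements run in this session with their durations.
SHOW PROFILES;

SET profiling = 0;
```

Run each candidate several times and against realistic data volumes, since a single timing on a small or cold table says little about production behavior.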
