你能索引子查询吗?

2022-01-23 00:00:00 indexing subquery mysql

我有一个表格和一个查询,如下所示.有关工作示例,请参阅此 SQL Fiddle.

I have a table and a query that looks like below. For a working example, see this SQL Fiddle.

SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN 
(
    SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B
) w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;

使用我的真实数据,这个查询需要 27 秒.但是,如果我首先将 w 创建为临时表和索引 property_B,则总共需要大约 1 秒.

With my real data, this query takes 27 seconds. However, if I first create w as a temporary Table and index property_B, it all together takes ~1 second.

CREATE TEMPORARY TABLE w AS
SELECT o.property_B, SUM(o.score2) AS score FROM o GROUP BY property_B;

ALTER TABLE w ADD INDEX `property_B_idx` (property_B);

SELECT o.property_B, SUM(o.score1), w.score
FROM o
INNER JOIN w ON w.property_B = o.property_B
WHERE o.property_A = 'specific_A'
GROUP BY property_B;

DROP TABLE IF EXISTS w;

有没有一种方法可以结合这两个查询中最好的查询?IE.单个查询具有子查询中索引的速度优势?

Is there a way to combine the best of these two queries? I.e. a single query with the speed advantages of the indexing in the subquery?

在下面 Mehran 的回答之后,我在 MySQL 文档:

After Mehran's answer below, I read this piece of explanation in the MySQL documentation:

从 MySQL 5.6.3 开始,优化器更有效地处理 FROM 子句中的子查询(即派生表):

As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):

...

对于 FROM 子句中的子查询需要物化的情况,优化器可以通过向物化表添加索引来加快对结果的访问.如果这样的索引允许对表进行 ref 访问,它可以大大减少在查询执行期间必须读取的数据量.考虑以下查询:

For cases when materialization is required for a subquery in the FROM clause, the optimizer may speed up access to the result by adding an index to the materialized table. If such an index would permit ref access to the table, it can greatly reduce amount of data that must be read during query execution. Consider the following query:

SELECT * FROM t1
  JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;

如果这样做允许使用 ref 访问以实现最低成本的执行计划,则优化器会在从 derived_t2 的列 f1 上构造一个索引.添加索引后,优化器可以将物化派生表视为具有索引的普通表,并且它同样受益于生成的索引.与没有索引的查询执行成本相比,索引创建的开销可以忽略不计.如果 ref 访问会导致比其他访问方法更高的成本,则不会创建索引并且优化器不会丢失任何内容.

The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan. After adding the index, the optimizer can treat the materialized derived table the same as a usual table with an index, and it benefits similarly from the generated index. The overhead of index creation is negligible compared to the cost of query execution without the index. If ref access would result in higher cost than some other access method, no index is created and the optimizer loses nothing.

推荐答案

首先你要知道,创建临时表绝对是一个可行的方案.但在某些情况下,没有其他选择适用,这在此处不正确!

First of all you need to know that creating a temporary table is absolutely a feasible solution. But in cases no other choice is applicable which is not true here!

在您的情况下,您可以像 FrankPl 指出的那样轻松提升您的查询,因为您的子查询和主查询都按同一字段分组.所以你不需要任何子查询.为了完整起见,我将复制并粘贴 FrankPl 的解决方案:

In your case, you can easily boost your query as FrankPl pointed out because your sub-query and main-query are both grouping by the same field. So you don't need any sub-queries. I'm going to copy and paste FrankPl's solution for the sake of completeness:

SELECT o.property_B, SUM(o.score1), SUM(o.score2)
FROM o
GROUP BY property_B;

但这并不意味着不可能遇到您希望可以索引子查询的情况.在这种情况下,您有两种选择,首先是使用您自己指出的临时表,保存子查询的结果.这个解决方案是有优势的,因为它被 MySQL 支持了很长时间.如果涉及大量数据,这是不可行的.

Yet it doesn't mean it's impossible to come across a scenario in which you wish you could index a sub-query. In which cases you've got two choices, first is using a temporary table as you pointed out yourself, holding the results of the sub-query. This solution is advantageous since it is supported by MySQL for a long time. It's just not feasible if there's a huge amount of data involved.

第二种解决方案是使用MySQL 5.6 或更高版本.在最新版本的 MySQL 中,新算法被合并,因此在子查询中使用的表上定义的索引也可以在子查询之外使用.

The second solution is using MySQL version 5.6 or above. In recent versions of MySQL new algorithms are incorporated so an index defined on a table used within a sub-query can also be used outside of the sub-query.

[更新]

对于问题的编辑版本,我会推荐以下解决方案:

For the edited version of the question I would recommend the following solution:

SELECT o.property_B, SUM(IF(o.property_A = 'specific_A', o.score1, 0)), SUM(o.score2)
FROM o
GROUP BY property_B
HAVING SUM(IF(o.property_A = 'specific_A', o.score1, 0)) > 0;

但是您需要处理 HAVING 部分.您可能需要根据您的实际问题进行更改.

But you need to work on the HAVING part. You might need to change it according to your actual problem.

相关文章