优化 mysql 查询(喜欢/不喜欢)

2022-01-23 00:00:00 optimization sql count subquery mysql

我的网站包含用户可以投票的内容片段(喜欢/不喜欢类似于 reddit upvotes).在选择单个内容时,我运行以下子查询来获取喜欢的数量、不喜欢的数量和当前用户的投票.

My website contains pieces of content on which users can vote (like/dislike similar to reddit upvotes). When selecting an individual piece of content, I run the following subqueries to get the number of likes, the number of dislikes and the current user's vote.

投票存储在单独的表中 {contentId, userId, vote}

The votes are stored in a separate table {contentId, userId, vote}

SELECT

[... BUNCH OF FIELDS ...]

(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::LIKE.') AS likes,
(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::DISLIKE.') AS dislikes,
COALESCE((SELECT vote FROM votes vt WHERE vt.cId = c.contentId AND userId = '.USER_ID.'), '.Constants::NO_VOTE.') AS myVote

FROM content

[... OTHER STUFF ... ]

有没有更好的方法来实现这一点(组合这些子查询或其他方式)?

Is there a better way to achieve this (combine those subqueries or otherwise)?

推荐答案

在性能方面,那些相关的子查询可以吃掉你的午餐.并且由于 MySQL 处理它们的方式,也吞噬了你的午餐盒,对于大型集.这些子查询中的每一个都会针对外部查询中返回的每一行执行.对于大型套装来说,这可能会变得非常昂贵.

In terms of performance, those correlated subqueries can eat your lunch. And devour your lunchbox too, for large sets, because of the way MySQL processes them. Each of those subqueries gets executed for every row returned in the outer query. And that can get very expensive for large sets.

另一种方法是使用内联视图来具体化所有内容的好恶,然后对其执行连接操作.

An alternative approach is to use an inline view to materialize the likes and dislikes for all content, and then do a join operation to that.

但是,这种方法也可能代价高昂,尤其是当您只需要对无数行中的几行内容进行计数"时.通常,外部查询的谓词也可以合并到内联视图中,以限制需要检查和返回的行数.

But, this approach can be expensive too, particularly when you are only needing the vote "counts" for just a few content rows, out of a bazillion rows. Often, there is a predicate from the outer query that can also be incorporated into the inline view, to limit the number of rows that need to be examined and returned.

我们想对该内联视图使用 OUTER 连接,因此它会返回与您的查询等效的结果;当 vote 表中没有匹配的行时,从 content 返回一行.

We want to use an OUTER join to that inline view, so it returns a result equivalent to your query; returning a row from content when there are no matching rows in the vote table.

SELECT [... BUNCH OF FIELDS ...]
     , COALESCE(v.likes,0) AS likes
     , COALESCE(v.dislikes,0) AS dislikes
     , COALESCE(v.myvote,'.Constants::NO_VOTE.') AS myvote
  FROM content c
  LEFT
  JOIN ( SELECT vt.cId
              , SUM(vt.vote = '.Constants::LIKE.') AS likes
              , SUM(vt.vote = '.Constants::DISLIKE.') AS dislikes
              , MAX(IF(vt.userId = '.USER_ID.',vt.vote,NULL)) AS myvote
           FROM votes vt
          GROUP
             BY vt.cId
       ) v
    ON v.cId = c.contentId

       [... OTHER STUFF ... ]

请注意,内联视图查询(别名为 v)将查看 votes 表中的每一行.如果您只需要一个子集,请考虑添加一个适当的谓词(在 WHERE 子句中或作为另一个表的 JOIN).您的查询中的 [... OTHER STUFF ...] 没有任何迹象表明它只是从 content 返回几行,还是您需要所有行因为你是通过likes等来订购的.

Note that the inline view query (aliased as v) is going to look at EVERY single row from the votes table. If you only need a subset, then consider adding an appropriate predicate (either in a WHERE clause or as a JOIN to another table). There's no indication from the [... OTHER STUFF ...] in your query whether it's returning just a few rows from content or if you are needing all of the rows because you are ordering by likes, etc.

对于从 content 表中选择的少量行,使用相关子查询(如在您的查询中)实际上比实现一个巨大的内联视图并对其执行连接操作要快.

For a small number of rows selected from the content table, using the correlated subqueries (like in your query) can actually be faster than materializing a huge inline view and performing a join operation against it.

哦...对于这两个查询,不言而喻,votes 表上具有 cId 前导列的适当索引将提高性能.对于内联视图,您不希望 MySQL 必须对所有这些行执行 filesort 操作来执行 GROUP BY.对于相关子查询,您希望它们使用索引范围扫描,而不是完整扫描.

Oh... and for both queries, it goes without saying that an appropriate index on the votes table with a leading column of cId will benefit performance. For the inline view, you don't want the overhead of MySQL having to perform a filesort operation on all of those rows to do the GROUP BY. And for the correlated subqueries, you want them to use a index range scan, not a full scan.

相关文章