带有子查询的 MySQL 优化查询

2022-01-23 00:00:00 subquery mysql query-optimization

今天我收到了来自我的托管帐户的电子邮件，说我需要调整我的查询:

选择`id`，`nick`，`msg`，`uid`，`show_pic`，`time`,`ip`,`time_updated`,(选择计数(c.msg_id)FROM `the_ans` c其中 c.msg_id = d.id) AS 计数器，(选择 c.msgFROM `the_ans` c哪里 c.msg_id=d.idORDER BY `time` DESC LIMIT 1) as lastmsg从`the_data` dORDER BY `time_updated` DESC LIMIT 26340 ,15

解释:

id select_type table type possible_keys key key_len ref rows Extra1 PRIMARY d ALL 34309 使用文件排序3 相关子查询 c ALL 43659 使用 where；使用文件排序2 相关子查询 c ALL 43659 使用 where

<块引用>

此查询检查 65,396,669,012,829 行，这在共享主机中是不可接受的.

tbh，我不明白他们的解释..查询的实际作用是按时间更新 15 个帖子，对于每个帖子，我都会获取最新评论，统计每个帖子的所有评论.

posts 表 - 'the_data'评论表='the_ans'

我不是 mysql 专家，我不知道如何改进这个查询任何帮助将不胜感激

谢谢

查询

选择`id`，`nick`，`msg`，`uid`，`show_pic`，`time`，`ip`，`time_updated`，(选择计数(c.msg_id)FROM `the_ans` c哪里 c.msg_id = d.id) 作为计数器，(选择 c.msgFROM `the_ans` c哪里 c.msg_id = d.idORDER BY `time` DESC限制 1) 作为最后一个消息FROM `the_data` dORDER BY `time_updated` DESC极限 26340 , 15

这是结果结构

id|尼克 |味精 |uid |show_pick |时间 |ip |time_updated|计数器|lastmsg|||||||||7 |jqman |你好|10074 |0 |2013-21-01 |12 |2013-21-01 |55 | 呸呸呸

解决方案

快速浏览一下解释计划，发现没有合适的索引供 MySQL 使用，所以它求助于全表扫描.

 解释:id select_type table type possible_keys key key_len ref rows Extra-- ------ ----- ---- ------------- --- ------- ---- ----- ----------------------------1 PRIMARY d ALL 34309 使用文件排序3 相关子查询 c ALL 43659 使用 where；使用文件排序2 相关子查询 c ALL 43659 使用 where

要优化现有查询的执行，您需要添加适当的索引.可能的候选人:

开启 `the_data`(`time_updated`)ON `the_ans`(`msg_id`,`time`)

这些索引将显着提高外部查询(可能消除排序操作)和相关子查询的大量执行的性能.

<小时>

除此之外，您还需要更改查询以提高性能.在准备好整个结果集之后，将应用最外层查询上的 LIMIT 子句，这意味着对于表 the_data 中的每一行都执行这两个相关的子查询.就性能而言，这会吃掉你的午餐.

要让这些相关子查询仅针对返回的(最多)15 行运行，您需要在运行这些子查询之前应用该 LIMIT 子句.

这个查询应该返回一个等效的结果集，并且将避免每个相关子查询的 34,000 多次执行，这应该会大大提高性能:

SELECT d.*，(选择计数(c.msg_id)FROM `the_ans` c哪里 c.msg_id = d.id) 作为计数器, ( 选择 c.msgFROM `the_ans` c哪里 c.msg_id = d.idORDER BY `time` DESC限制 1) 作为最后一个消息FROM(选择 e.`id`，例如`尼克`，例如`msg`，例如`uid`，例如`show_pic`，例如`时间`，例如`ip`，例如`time_updated`FROM `the_data` e命令BY e.`time_updated` DESC极限 26340 , 15) dORDER BY d.`time_updated` DESC

(您当前的查询执行每个相关子查询SELECT COUNT(1) FROM the_data"次.使用上面重写的查询，每个子查询将仅执行 15 次.)

Today i received email from my hosting account saying that i need to tweak my query:

SELECT
  `id`, `nick`, `msg`, `uid`, `show_pic`,
  `time`,`ip`,`time_updated`,
  (SELECT COUNT(c.msg_id)
   FROM `the_ans` c
   where c.msg_id = d.id) AS counter,
  (SELECT c.msg
   FROM `the_ans` c
   WHERE c.msg_id=d.id
   ORDER BY `time` DESC LIMIT 1) as lastmsg
FROM
  `the_data` d
ORDER BY `time_updated` DESC LIMIT 26340 ,15

EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY d ALL 34309 Using filesort
3 DEPENDENT SUBQUERY c ALL 43659 Using where; Using filesort
2 DEPENDENT SUBQUERY c ALL 43659 Using where

This query examines 65,396,669,012,829 rows, which is unacceptable in shared hosting.

tbh, i don't understand their explanation.. what the query actually does is to get 15 posts order by time updated, for each post i grab the latest comment, count all comments for each post.

posts table - 'the_data'

comments table = 'the_ans'

i'm not a mysql guru and i don't know how to improve this query any help will be appreciated

thx

the query

SELECT
  `id` , `nick` , `msg` , `uid` , `show_pic` , `time` , `ip` , `time_updated` , (
    SELECT COUNT( c.msg_id )
    FROM `the_ans` c
    WHERE c.msg_id = d.id
   ) AS counter, (
    SELECT c.msg
    FROM `the_ans` c
    WHERE c.msg_id = d.id
    ORDER BY `time` DESC
    LIMIT 1
   ) AS lastmsg
FROM `the_data` d
ORDER BY `time_updated` DESC
LIMIT 26340 , 15

this is the results structure

id| nick  | msg  | uid   | show_pick | time      | ip |time_updated|counter|lastmsg
  |       |      |       |           |           |    |            |       |
7 | jqman | hello| 10074 |   0       |2013-21-01 | 12 |2013-21-01  | 55    |blah bl

解决方案

A quick glance at the explain plan shows that there are no suitable indexes for MySQL to use, so it's resorting to full table scans.

 EXPLAIN: 
 id select_type        table type possible_keys key key_len ref rows  Extra 
 -- ------------------ ----- ---- ------------- --- ------- --- ----- ---------------------------- 
 1  PRIMARY            d     ALL                                34309 Using filesort
 3  DEPENDENT SUBQUERY c     ALL                                43659 Using where; Using filesort 
 2  DEPENDENT SUBQUERY c     ALL                                43659 Using where

To optimize the execution of the existing query, you need to add appropriate indexes. Likely candidates:

ON `the_data`(`time_updated`)
ON `the_ans`(`msg_id`,`time`)

Those indexes will significantly improve the performance of both the outer query (likely eliminating the sort operation), and the numerous executions of the correlated subqueries.

Beyond that, you're going to need to change the query to improve performance. The LIMIT clause on the outermost query is being applied after the entire resultset is prepared, which means those two correlated subqueries are getting executed for every row in table the_data. And that's going to eat your lunch, performance wise.

To have those correlated subqueries run only for the (up to) 15 rows that are being returned, you need to get that LIMIT clause applied before those subqueries get run.

This query should return an equivalent resultset, and will avoid 34,000+ executions of each correlated subquery, which should improve performance considerably:

SELECT d.*
     , ( SELECT COUNT( c.msg_id )
           FROM `the_ans` c
          WHERE c.msg_id = d.id
       ) AS counter
     , ( SELECT c.msg
           FROM `the_ans` c
          WHERE c.msg_id = d.id
          ORDER BY `time` DESC
          LIMIT 1
       ) AS lastmsg
  FROM ( SELECT e.`id` 
              , e.`nick`
              , e.`msg`
              , e.`uid`
              , e.`show_pic`
              , e.`time`
              , e.`ip`
              , e.`time_updated` 
           FROM `the_data` e
          ORDER
             BY e.`time_updated` DESC
          LIMIT 26340 , 15 
       ) d
 ORDER BY d.`time_updated` DESC

(Your current query executes each of those correlated subqueries "SELECT COUNT(1) FROM the_data" times. With the rewritten query above, each of those subqueries will be executed only 15 times.)

相关文章