MySQL 性能优化:按日期时间字段排序

2021-12-19 00:00:00 performance select mysql

我有一个包含大约 100.000 个博客帖子的表格,通过 1:n 关系链接到一个包含 50 个提要的表格.当我使用 select 语句查询两个表时,按发布表的日期时间字段排序,MySQL 总是使用文件排序,导致查询时间非常慢(> 1 秒).这是 postings 表的架构(简化):

+---------------------+--------------+------+-----+----------+----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+---------------------+--------------+------+-----+---------+----------------+|身份证 |整数(11) |否 |PRI |空 |自动增量||feed_id |整数(11) |否 |多 |空 |||crawl_date |日期时间 |否 ||空 |||is_active |tinyint(1) |否 |多 |0 |||链接 |varchar(255) |否 |多 |空 |||作者 |varchar(255) |否 ||空 |||标题 |varchar(255) |否 ||空 |||摘录|文字 |否 ||空 |||long_excerpt |文字 |否 ||空 |||user_offtopic_count |整数(11) |否 |多 |0 ||+---------------------+--------------+------+-----+---------+----------------+

这里是 feed 表:

+-------------+--------------+------+-----+---------+----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+--------------+--------------+------+-------+---------+--+|身份证 |整数(11) |否 |PRI |空 |自动增量||类型 |整数(11) |否 |多 |0 |||标题 |varchar(255) |否 ||空 |||网站 |varchar(255) |否 ||空 |||网址 |varchar(255) |否 ||空 ||+--------------+--------------+------+-------+---------+--+

这是执行时间超过 1 秒的查询.请注意 post_date 字段有一个索引,但 MySQL 没有使用它来对帖子表进行排序:

SELECT`帖子`.`id`,UNIX_TIMESTAMP(postings.post_date) 作为 post_date,`帖子`.`链接`,`postings`.`title`,`帖子`.`作者`,`帖子`.`摘录`,`postings`.`long_excerpt`,`feeds`.`title` AS feed_title,`feeds`.`website` AS feed_website从(`帖子`)加入`饲料`在`feeds`.`id` = `postings`.`feed_id`在哪里`feeds`.`type` = 1 AND`postings`.`user_offtopic_count` <10 与`postings`.`is_active` = 1订购者`postings`.`post_date` desc限制15

这个查询的explain extended命令的结果表明MySQL正在使用文件排序:

+----+-------------+----------+--------+---------------------------------------+-----------+----------+------------------------------+-------+----------------------+|身份证 |选择类型 |表|类型 |可能的密钥|键 |key_len |参考 |行 |额外 |+----+-------------+---------+--------+---------------------------------------+-----------+---------+----------------------------+-------+----------------------------+|1 |简单 |帖子|参考 |feed_id,is_active,user_offtopic_count |is_active |1 |常量 |30996 |使用哪里;使用文件排序 ||1 |简单 |饲料|eq_ref |主要,类型 |主要 |4 |feedian.postings.feed_id |1 |使用 where |+----+-------------+---------+--------+---------------------------------------+-----------+---------+----------------------------+-------+----------------------------+

当我删除 order by 部分时,MySQL 停止使用文件排序.如果您对如何优化此查询以让 MySQL 使用索引对数据进行排序和选择有任何想法,请告诉我.我已经尝试了一些方法,例如根据一些博客帖子的建议,在所有 where/order by 字段上创建组合索引,但这也不起作用.

解决方案

postings (is_active, post_date)(按此顺序)上创建复合索引.

它将用于过滤 is_active 和按 post_date 排序.

MySQL 应该在 EXPLAIN EXTENDED 中显示对该索引的 REF 访问方法.

请注意,您在 user_offtopic_count 上有一个 RANGE 过滤条件,这就是为什么您不能在过滤和按其他字段排序时对该字段使用索引.>

根据您的 user_offtopic_count 的选择性(即有多少行满足 user_offtopic_count <10),在 user_offtopic_count 上创建索引可能更有用 并让 post_dates 排序.

为此,请在 postings (is_active, user_offtopic_count) 上创建一个复合索引,并确保使用对该索引的 RANGE 访问方法.

哪个索引会更快取决于您的数据分布.创建两个索引,FORCE 它们,看看哪个更快:

CREATE INDEX ix_active_offtopic ON 发布 (is_active, user_offtopic_count);在帖子上创建索引 ix_active_date (is_active, post_date);选择`帖子`.`id`,UNIX_TIMESTAMP(postings.post_date) 作为 post_date,`帖子`.`链接`,`postings`.`title`,`帖子`.`作者`,`帖子`.`摘录`,`postings`.`long_excerpt`,`feeds`.`title` AS feed_title,`feeds`.`website` AS feed_website从`postings` FORCE INDEX (ix_active_offtopic)加入`饲料`在`feeds`.`id` = `postings`.`feed_id`在哪里`feeds`.`type` = 1 AND`postings`.`user_offtopic_count` <10 与`postings`.`is_active` = 1订购者`postings`.`post_date` desc限制15/* 这应该显示具有几行的 RANGE 访问并保留 FILESORT */选择`帖子`.`id`,UNIX_TIMESTAMP(postings.post_date) 作为 post_date,`帖子`.`链接`,`postings`.`title`,`帖子`.`作者`,`帖子`.`摘录`,`postings`.`long_excerpt`,`feeds`.`title` AS feed_title,`feeds`.`website` AS feed_website从`postings` FORCE INDEX (ix_active_date)加入`饲料`在`feeds`.`id` = `postings`.`feed_id`在哪里`feeds`.`type` = 1 AND`postings`.`user_offtopic_count` <10 与`postings`.`is_active` = 1订购者`postings`.`post_date` desc限制15/* 这应该显示具有大量行且没有 FILESORT 的 REF 访问 */

I have a table with roughly 100.000 blog postings, linked to a table with 50 feeds via an 1:n relationship. When I query both tables with a select statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (>1 second). Here's the schema of the postings table (simplified):

+---------------------+--------------+------+-----+---------+----------------+
| Field               | Type         | Null | Key | Default | Extra          |
+---------------------+--------------+------+-----+---------+----------------+
| id                  | int(11)      | NO   | PRI | NULL    | auto_increment |
| feed_id             | int(11)      | NO   | MUL | NULL    |                |
| crawl_date          | datetime     | NO   |     | NULL    |                |
| is_active           | tinyint(1)   | NO   | MUL | 0       |                |
| link                | varchar(255) | NO   | MUL | NULL    |                |
| author              | varchar(255) | NO   |     | NULL    |                |
| title               | varchar(255) | NO   |     | NULL    |                |
| excerpt             | text         | NO   |     | NULL    |                |
| long_excerpt        | text         | NO   |     | NULL    |                |
| user_offtopic_count | int(11)      | NO   | MUL | 0       |                |
+---------------------+--------------+------+-----+---------+----------------+

And here's the feed table:

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| type        | int(11)      | NO   | MUL | 0       |                |
| title       | varchar(255) | NO   |     | NULL    |                |
| website     | varchar(255) | NO   |     | NULL    |                |
| url         | varchar(255) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

And here's the query that takes >1 second to execute. Please note that the post_date field has an index, but MySQL isn't using it to sort the postings table:

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    (`postings`)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15  

The result of the explain extended command on this query shows that MySQL is using filesort:

+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
|  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort |
|  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+

When I remove the order by part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query to get MySQL to sort and select the data by using indexes. I have already tried a few things such as creating a combined index on all where/order by fields, as suggested by a few blog postings, but this didn't work either.

解决方案

Create a composite index either on postings (is_active, post_date) (in that order).

It will be used both for filtering on is_active and ordering by post_date.

MySQL should show REF access method over this index in EXPLAIN EXTENDED.

Note that you have a RANGE filtering condition over user_offtopic_count, that's why you cannot use an index over this field both in filtering and in sorting by other field.

Depending on how selective is your user_offtopic_count (i. e. how many rows satisfy user_offtopic_count < 10), it may be more useful to create an index on user_offtopic_count and let the post_dates be sorted.

To do this, create a composite index on postings (is_active, user_offtopic_count) and make sure the RANGE access method over this index is used.

Which index will be faster depends on your data distribuion. Create both indexes, FORCE them and see which is faster:

CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    `postings` FORCE INDEX (ix_active_offtopic)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15

/* This should show RANGE access with few rows and keep the FILESORT */

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    `postings` FORCE INDEX (ix_active_date)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15

/* This should show REF access with lots of rows and no FILESORT */

相关文章