“SELECT COUNT(*)"很慢，即使有 where 子句

2021-11-20 00:00:00 performance optimization mysql

我想弄清楚如何优化 MySQL 中非常慢的查询(我没有设计这个):

I'm trying to figure out how to optimize a very slow query in MySQL (I didn't design this):

SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'; +----------+ | COUNT(*) | +----------+ | 3224022 | +----------+ 1 row in set (1 min 0.16 sec)

将其与完整计数进行比较:

Comparing that to a full count:

select count(*) from change_event; +----------+ | count(*) | +----------+ | 6069102 | +----------+ 1 row in set (4.21 sec)

解释语句在这里对我没有帮助:

The explain statement doesn't help me here:

explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: me type: range possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: NULL rows: 4120213 Extra: Using where; Using index 1 row in set (0.00 sec)

好吧，它仍然认为它需要大约 400 万个条目来计数，但我可以比这更快地计算文件中的行数！我不明白为什么 MySQL 要花这么长时间.

OK, it still thinks it needs roughly 4 million entries to count, but I could count lines in a file faster than that! I don't understand why MySQL is taking this long.

这是表定义:

CREATE TABLE `change_event` ( `change_event_id` bigint(20) NOT NULL default '0', `timestamp` datetime NOT NULL, `change_type` enum('create','update','delete','noop') default NULL, `changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL, `changed_object_id` varchar(255) default NULL, `changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00', `modified` datetime NOT NULL default '1000-01-01 00:00:00', `created` datetime NOT NULL default '1000-01-01 00:00:00', `pid` char(15) default NULL, `episode_pid` char(15) default NULL, `import_id` int(11) NOT NULL, `status` enum('success','failure') NOT NULL, `xml_diff` text, `node_digest` char(32) default NULL, PRIMARY KEY (`change_event_id`), KEY `idx_change_events_changed_object_id` (`changed_object_id`), KEY `idx_change_events_episode_pid` (`episode_pid`), KEY `fk_import_id` (`import_id`), KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`), KEY `idx_change_event_status` (`status`), CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8

版本:

$ mysql --version mysql Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0

有什么明显的我遗漏了吗?(是的，我已经尝试过SELECT COUNT(change_event_id)"，但没有性能差异).

Is there something obvious I'm missing? (Yes, I've already tried "SELECT COUNT(change_event_id)", but there's no performance difference).

推荐答案

InnoDB 使用聚集主键，因此主键与行一起存储在数据页中，而不是在单独的索引页中.为了进行范围扫描，您仍然必须扫描数据页中所有可能宽的行；请注意，此表包含一个 TEXT 列.

InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.

我会尝试两件事:

运行优化表.这将确保数据页按排序顺序物理存储.可以想象，这可以加快对集群主键的范围扫描.
仅在 change_event_id 列上创建一个额外的非主索引.这将在索引页中存储该列的副本，扫描速度要快得多.创建后，检查解释计划以确保它使用新索引.

run optimize table. This will ensure that the data pages are physically stored in sorted order. This could conceivably speed up a range scan on a clustered primary key.

create an additional non-primary index on just the change_event_id column. This will store a copy of that column in index pages which be much faster to scan. After creating it, check the explain plan to make sure it's using the new index.

(如果 change_event_id 列从零开始递增，您可能还希望将其设为 bigint unsigned)

(you also probably want to make the change_event_id column bigint unsigned if it's incrementing from zero)

相关文章