Speeding up row counting in MySQL
Suppose, for illustrative purposes, you are running a library using a simple MySQL "books" table with three columns:
(id, title, status)
- id is the primary key
- title is the title of the book
- status could be an enum describing the book's current state (e.g. AVAILABLE, CHECKEDOUT, PROCESSING, MISSING)
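For concreteness, the table could be sketched roughly as follows. The column types, the title length, AUTO_INCREMENT, and the storage engine are assumptions for illustration; only the three column names and the example status values come from the description above.

```sql
-- Hypothetical schema matching the three columns described above.
-- Types, sizes, and engine are assumptions, not part of the original question.
CREATE TABLE books (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title  VARCHAR(255) NOT NULL,
    status ENUM('AVAILABLE', 'CHECKEDOUT', 'PROCESSING', 'MISSING') NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;
```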
A simple query to report how many books fall into each state is:
SELECT status, COUNT(*) FROM books GROUP BY status
or to specifically find how many books are available:
SELECT COUNT(*) FROM books WHERE status = 'AVAILABLE'
However, once the table grows to millions of rows, these queries take several seconds to complete. Adding an index to the "status" column doesn't appear to make a difference in my experience.
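As a rough sketch, the index and a plan check might look like this. The index name idx_status is hypothetical. If EXPLAIN reports the index under "key" and "Using index" under "Extra", the query is served from the index alone, but it still has to walk every index entry to produce the counts, which would be consistent with the index making little difference.

```sql
-- idx_status is a hypothetical index name chosen for this example.
ALTER TABLE books ADD INDEX idx_status (status);

-- Show the plan the optimizer picks for the grouped count.
EXPLAIN SELECT status, COUNT(*) FROM books GROUP BY status;
```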
Aside from periodically caching the results or explicitly updating summary info in a separate table each time a book changes state (via triggers or some other mechanism), are there any techniques for speeding up these kinds of queries? It seems that the COUNT queries end up looking at every row, and (without knowing more details) I'm a bit surprised that this information can't somehow be determined from the index.
Update
Using the sample table (with an indexed "status" column) with 2 million rows, I benchmarked the GROUP BY query. Using the InnoDB storage engine, the query takes 3.0 - 3.2 seconds on my machine. Using MyISAM, the query takes 0.9 - 1.1 seconds. There was no significant difference between count(*), count(status), or count(1) in either case.
MyISAM is admittedly a bit faster, but I was curious to see if there was a way to make an equivalent query run much faster (e.g. 10-50 ms -- fast enough to be called on every webpage request for a low-traffic site) without the mental overhead of caching and triggers. It sounds like the answer is "there's no way to run the direct query quickly" which is what I expected - I just wanted to make sure I wasn't missing an easy alternative.
Recommended answer
So the question is
are there any techniques for speeding up these kinds of queries?
Well, not really. A column-based storage engine would probably be faster with those SELECT COUNT(*) queries but it would be less performant for pretty much any other query.
Your best bet is to maintain a summary table via triggers. It doesn't have much overhead and the SELECT part will be instantaneous no matter how big the table. Here's some boilerplate code:
DELIMITER //

-- A new book increments the counter for its status.
CREATE TRIGGER ai_books AFTER INSERT ON books
FOR EACH ROW
    UPDATE books_cnt SET total = total + 1 WHERE status = NEW.status
//

-- A deleted book decrements the counter for its status.
CREATE TRIGGER ad_books AFTER DELETE ON books
FOR EACH ROW
    UPDATE books_cnt SET total = total - 1 WHERE status = OLD.status
//

-- A status change moves one count from the old status to the new one.
CREATE TRIGGER au_books AFTER UPDATE ON books
FOR EACH ROW
BEGIN
    IF (OLD.status <> NEW.status) THEN
        UPDATE books_cnt
           SET total = total + IF(status = NEW.status, 1, -1)
         WHERE status IN (OLD.status, NEW.status);
    END IF;
END
//

-- Restore the default statement delimiter.
DELIMITER ;
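The triggers assume a books_cnt table that already holds one row per status. Its definition is not given here, so the following is only a sketch under assumed column types. It is meant to be run with the default ";" delimiter, and the backfill should happen while nothing is writing to books so that the starting totals are accurate.

```sql
-- Hypothetical definition of the summary table the triggers update.
CREATE TABLE books_cnt (
    status ENUM('AVAILABLE', 'CHECKEDOUT', 'PROCESSING', 'MISSING') NOT NULL,
    total  INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (status)
);

-- One row per status, so each trigger's UPDATE always finds a row to change.
INSERT INTO books_cnt (status, total) VALUES
    ('AVAILABLE', 0), ('CHECKEDOUT', 0), ('PROCESSING', 0), ('MISSING', 0);

-- One-time backfill from the existing data.
UPDATE books_cnt c
JOIN (SELECT status, COUNT(*) AS n FROM books GROUP BY status) b
  ON b.status = c.status
SET c.total = b.n;

-- The per-request lookup is now a primary-key read on a four-row table.
SELECT total FROM books_cnt WHERE status = 'AVAILABLE';
```

Seeding a row for every status matters because the triggers only issue UPDATE statements: a status with no row in books_cnt would silently never be counted.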