SQL Server 中的 COUNT(*) 是恒定时间操作吗?如果没有,为什么不呢?
我正在另一篇文章中阅读此讨论 这个问题是其他人提出的.在阅读讨论之前,我一直认为 SQL Server(和其他 DBMS)会在元数据中的某个位置为每个表保留全局行数,但讨论似乎表明并非如此.为什么?Count(*)
(没有任何过滤)是这样一个常见的操作,如果它是 O(1) 会得到巨大的提升.即使不考虑 COUNT(*)
,表中的总行数也是如此基本的信息.他们为什么不做笔记?
I was reading this discussion in another post where this question was raised by someone else. Before reading the discussion, I always thought SQL Server (and other DBMS) keep a global count of rows for each table somewhere in the metadata, but the discussion seems to say it isn't so. Why? Count(*)
(without any filtering) being such a common operation would get huge boost if it is O(1). Even not considering COUNT(*)
, the total number of rows in a table is such a fundamental piece of information. Why don't they keep a note of it?
此外,为什么我们需要加载"整行(如我链接的帖子中所示)只是为了计算它们?索引或 PK 等不应该足以计算它们吗?
In addition, why do we need to "load" entire rows (as indicated in the post I linked) just to count them? Shouldn't indexes or PKs etc. be sufficient to count them?
推荐答案
不,COUNT(*)
不是常数时间操作.COUNT(*)
必须返回符合当前扫描谓词(即 WHERE
子句)的行数,以便单独返回元数据属性无效的.但是即使您没有谓词,COUNT 仍然必须满足当前的事务隔离语义,即.返回行数可见(例如,已提交).因此,COUNT
必须并且将在 SQL Server 中实际扫描和计数行.某些系统允许返回更快的估计"计数.
No, COUNT(*)
is not a constant time operation. A COUNT(*)
must return a count of rows that conform to the current scan predicate (ie. WHERE
clause), so that alone would make the return of a metadata property invalid. But even if you have no predicates, the COUNT still has to satisfy the current transaction isolation semantics, ie. return the count of rows visible (eg. committed). So COUNT
must, and will, in SQL Server, actually scan and count the rows. Some systems allow return of faster 'estimate' counts.
此外,作为旁注,依赖于 rowsviews/sys-partitions-transact-sql" rel="noreferrer">sys.partitions
不可靠.毕竟,如果这个计数能够保证准确,那么我们就不需要 DBCC UPDATEUSAGE(...) WITH COUNT_ROWS
.从历史上看,有几种情况会导致此计数器偏离实际情况(主要是最少记录的插入回滚),我所知道的都已修复,但这仍然存在以下问题:1)从具有错误的早期版本升级的表和 2) 其他尚未发现的错误.
Also, as a side comment, relying on rows
in sys.partitions
is unreliable. After all, if this count would be guaranteed accurate then we would not need DBCC UPDATEUSAGE(...) WITH COUNT_ROWS
. There are several scenarios that historically would cause this counter to drift apart from reality (mostly minimally logged insert rollbacks), all I know of are fixed, but that still leaves the problems of 1) upgraded tables from earlier versions that had the bugs and 2) other, not yet discovered, bugs.
此外,为什么我们需要加载"整行(如我链接的帖子中所示)只是为了计算它们?索引或 PK 等不应该足以计算它们吗?
In addition, why do we need to "load" entire rows (as indicated in the post I linked) just to count them? Shouldn't indexes or PKs etc. be sufficient to count them?
这不是 100% 正确.至少有两种情况不会加载整行":
This is not 100% true. There are at least 2 scenarios that do no 'load entire rows':
- 窄行存储索引仅加载索引"行,该行可能要小得多
- 列存储数据仅加载相关的列段
而且我上面所说的大部分内容都不适用于 Hekaton 表.
And most of what I say above do not apply for Hekaton tables.
相关文章