什么是聚集索引表?
我可能是错的,但互联网上似乎对这些是什么有不同的看法.SQL Server、MySQL、DB2 和 PostgreSQL 对这些表有不同的定义.
I may be wrong, but it seems to be different opinions in the interwebs about what these are. SQL Server, MySQL, DB2, and PostgreSQL show different definitions for these tables.
在阅读了大量来自不同供应商(数据库手册、用户帖子等)的信息后,我能够区分三种类型的感兴趣的表(对于这个问题,有很多很多类型不感兴趣).请多多包涵:
After reading a ton from different vendors (database manuals, user posts, etc.) I was able to distinguish three types of tables of interest (there are many, many more types of no interest for this question). Please bear with me:
堆表:
- 所有行都存储在堆表中(可能是无序的).
- 每一行都有一个内部 ROWID 来标识它.
- 索引是可选的.如果添加,它们会将索引列作为索引键加上 ROWID(最终访问堆中的实际行).
- 注意:这个案例对这个问题没有兴趣,但我在这里添加它是为了与下面的第三种案例有所不同.
- All rows are stored (probably unordered) in the heap table.
- Each row has an internal ROWID that identifies it.
- Indexes are optional. If added, they include the indexed columns as the index key, plus the ROWID (to eventually access the real rows in the heap).
- Note: this case is of no interest for this question, but I added it here to make a difference with the third case below.
纯索引表:<-- 这是聚集索引表吗?
Pure Index Table: <-- Is this a Clustered Index Table?
- 有一个主索引包含键列以及其中的非键列.所有数据都存储在索引中.
- 数据遵循主索引顺序,因此按定义按主索引排序.
- 无需堆表来存储行.所有数据都已在索引中.没有任何 ROWID,因为没有堆表.
- SQL Server 表(通常)默认属于此类别.
- MySQL InnoDB 表似乎也属于这一类,因为它们似乎根本没有堆表.
- There's one main index that includes the key columns, as well as the non-key columns in it. All the data is stored in the index.
- The data follows the main index order, so it's by definition sorted by the main index.
- There's no need for a heap table to store the rows. All data is already in the index. There's no ROWID whatsoever, since there's no heap table.
- SQL Server tables (typically) fall by default in this category.
- MySQL InnoDB tables seem to also fall in this category since they don't seem to have a heap table at all.
索引 + 排序堆表:<-- 这是一个聚集索引表吗?
Index + Sorted Heap Table: <-- Is this a Clustered Index Table?
- 有一个主要的聚集索引".
- 有一个堆表,其中的行按照聚集索引定义的顺序存储.
- 堆表中的每一行都有一个 ROWID.
- 聚集索引不包括非键列,而是一个 ROWID 用于访问堆表中的实际行.
- DB2 似乎能够集群"表.
- PostgreSQL 似乎也将这些表称为集群ing索引"表.
- There's one main "clustered index".
- There's a heap table where the rows are stored in the order defined by the clustered index.
- Each row in the heap table has a ROWID.
- The clustered index does not include non-key columns, but a ROWID to access the real row in the heap table.
- DB2 seems to be able to "Cluster" tables.
- PostgreSQL seems to also call these tables as "Clustering Index" tables.
现在,这些#2 或#3 中的哪一个是聚集索引表"?谁在说真话,谁在撒谎?:D
Now, which ones of these #2 or #3 is a "Clustered Index Table"? Who's telling the truth and who's lying? :D
也就是说,Clustered Index Table"这个术语是每个供应商自由定义的商业术语,还是根据一些官方数据库理论有一个官方唯一的定义?
In other words, is the term "Clustered Index Table" a commercial term that each vendor freely defines as he/she pleases, or is there an official unique definition according to some official database theory?
推荐答案
据我所知,聚集索引"是一种索引,其叶子节点是数据页.这与非聚集索引不同,非聚集索引的叶节点是对存储在数据页上的行的引用.
As far as I know, a "clustered index" is an index where the leaf nodes of the index are the data pages. This is different from a non-clustered index where the leaf nodes are references to rows stored on the data pages.
一张表最多可以有一个聚集索引.在具有聚集索引的表中,数据按索引键排序.
A table can have at most one clustered index. In a table with a clustered index, the data is sorted on by the index keys.
Postgres 不支持聚集索引.它确实有一个名为 "cluster" 的表优化基于索引的数据.但是这个顺序是不维护的,所以和聚集索引是不一样的.
Postgres does not support clustered indexes. It does have a table optimization called "cluster" that sorts the data based on an index. However, this ordering is not maintained, so it is not the same as a clustered index.
相关文章