什么是聚集索引表?

2022-01-14 00:00:00 postgresql database db2 mysql sql-server

我可能是错的，但互联网上似乎对这些是什么有不同的看法.SQL Server、MySQL、DB2 和 PostgreSQL 对这些表有不同的定义.

I may be wrong, but it seems to be different opinions in the interwebs about what these are. SQL Server, MySQL, DB2, and PostgreSQL show different definitions for these tables.

在阅读了大量来自不同供应商(数据库手册、用户帖子等)的信息后，我能够区分三种类型的感兴趣的表(对于这个问题，有很多很多类型不感兴趣).请多多包涵:

After reading a ton from different vendors (database manuals, user posts, etc.) I was able to distinguish three types of tables of interest (there are many, many more types of no interest for this question). Please bear with me:

堆表:

所有行都存储在堆表中(可能是无序的).
每一行都有一个内部 ROWID 来标识它.
索引是可选的.如果添加，它们会将索引列作为索引键加上 ROWID(最终访问堆中的实际行).
注意:这个案例对这个问题没有兴趣，但我在这里添加它是为了与下面的第三种案例有所不同.

All rows are stored (probably unordered) in the heap table.

Each row has an internal ROWID that identifies it.

Indexes are optional. If added, they include the indexed columns as the index key, plus the ROWID (to eventually access the real rows in the heap).

Note: this case is of no interest for this question, but I added it here to make a difference with the third case below.

纯索引表:<-- 这是聚集索引表吗?

Pure Index Table: <-- Is this a Clustered Index Table?

有一个主索引包含键列以及其中的非键列.所有数据都存储在索引中.
数据遵循主索引顺序，因此按定义按主索引排序.
无需堆表来存储行.所有数据都已在索引中.没有任何 ROWID，因为没有堆表.
SQL Server 表(通常)默认属于此类别.
MySQL InnoDB 表似乎也属于这一类，因为它们似乎根本没有堆表.

There's one main index that includes the key columns, as well as the non-key columns in it. All the data is stored in the index.

The data follows the main index order, so it's by definition sorted by the main index.

There's no need for a heap table to store the rows. All data is already in the index. There's no ROWID whatsoever, since there's no heap table.

SQL Server tables (typically) fall by default in this category.

MySQL InnoDB tables seem to also fall in this category since they don't seem to have a heap table at all.

索引 + 排序堆表:<-- 这是一个聚集索引表吗?

Index + Sorted Heap Table: <-- Is this a Clustered Index Table?

有一个主要的聚集索引".
有一个堆表，其中的行按照聚集索引定义的顺序存储.
堆表中的每一行都有一个 ROWID.
聚集索引不包括非键列，而是一个 ROWID 用于访问堆表中的实际行.
DB2 似乎能够集群"表.
PostgreSQL 似乎也将这些表称为集群ing索引"表.

There's one main "clustered index".

There's a heap table where the rows are stored in the order defined by the clustered index.

Each row in the heap table has a ROWID.

The clustered index does not include non-key columns, but a ROWID to access the real row in the heap table.

DB2 seems to be able to "Cluster" tables.

PostgreSQL seems to also call these tables as "Clustering Index" tables.

现在，这些#2 或#3 中的哪一个是聚集索引表"?谁在说真话，谁在撒谎?:D

Now, which ones of these #2 or #3 is a "Clustered Index Table"? Who's telling the truth and who's lying? :D

也就是说，Clustered Index Table"这个术语是每个供应商自由定义的商业术语，还是根据一些官方数据库理论有一个官方唯一的定义?

In other words, is the term "Clustered Index Table" a commercial term that each vendor freely defines as he/she pleases, or is there an official unique definition according to some official database theory?

推荐答案

据我所知，聚集索引"是一种索引，其叶子节点是数据页.这与非聚集索引不同，非聚集索引的叶节点是对存储在数据页上的行的引用.

As far as I know, a "clustered index" is an index where the leaf nodes of the index are the data pages. This is different from a non-clustered index where the leaf nodes are references to rows stored on the data pages.

一张表最多可以有一个聚集索引.在具有聚集索引的表中，数据按索引键排序.

A table can have at most one clustered index. In a table with a clustered index, the data is sorted on by the index keys.

Postgres 不支持聚集索引.它确实有一个名为 "cluster" 的表优化基于索引的数据.但是这个顺序是不维护的，所以和聚集索引是不一样的.

Postgres does not support clustered indexes. It does have a table optimization called "cluster" that sorts the data based on an index. However, this ordering is not maintained, so it is not the same as a clustered index.

相关文章