什么是聚集索引表?

2022-01-14 00:00:00 postgresql database db2 mysql sql-server

我可能是错的,但互联网上似乎对这些是什么有不同的看法.SQL Server、MySQL、DB2 和 PostgreSQL 对这些表有不同的定义.

I may be wrong, but it seems to be different opinions in the interwebs about what these are. SQL Server, MySQL, DB2, and PostgreSQL show different definitions for these tables.

在阅读了大量来自不同供应商(数据库手册、用户帖子等)的信息后,我能够区分三种类型的感兴趣的表(对于这个问题,有很多很多类型不感兴趣).请多多包涵:

After reading a ton from different vendors (database manuals, user posts, etc.) I was able to distinguish three types of tables of interest (there are many, many more types of no interest for this question). Please bear with me:

  1. 堆表:

  • 所有行都存储在堆表中(可能是无序的).
  • 每一行都有一个内部 ROWID 来标识它.
  • 索引是可选的.如果添加,它们会将索引列作为索引键加上 ROWID(最终访问堆中的实际行).
  • 注意:这个案例对这个问题没有兴趣,但我在这里添加它是为了与下面的第三种案例有所不同.
  • All rows are stored (probably unordered) in the heap table.
  • Each row has an internal ROWID that identifies it.
  • Indexes are optional. If added, they include the indexed columns as the index key, plus the ROWID (to eventually access the real rows in the heap).
  • Note: this case is of no interest for this question, but I added it here to make a difference with the third case below.

纯索引表:<-- 这是聚集索引表吗?

Pure Index Table: <-- Is this a Clustered Index Table?

  • 有一个主索引包含键列以及其中的非键列.所有数据都存储在索引中.
  • 数据遵循主索引顺序,因此按定义按主索引排序.
  • 无需堆表来存储行.所有数据都已在索引中.没有任何 ROWID,因为没有堆表.
  • SQL Server 表(通常)默认属于此类别.
  • MySQL InnoDB 表似乎也属于这一类,因为它们似乎根本没有堆表.
  • There's one main index that includes the key columns, as well as the non-key columns in it. All the data is stored in the index.
  • The data follows the main index order, so it's by definition sorted by the main index.
  • There's no need for a heap table to store the rows. All data is already in the index. There's no ROWID whatsoever, since there's no heap table.
  • SQL Server tables (typically) fall by default in this category.
  • MySQL InnoDB tables seem to also fall in this category since they don't seem to have a heap table at all.

索引 + 排序堆表:<-- 这是一个聚集索引表吗?

Index + Sorted Heap Table: <-- Is this a Clustered Index Table?

  • 有一个主要的聚集索引".
  • 有一个堆表,其中的行按照聚集索引定义的顺序存储.
  • 堆表中的每一行都有一个 ROWID.
  • 聚集索引不包括非键列,而是一个 ROWID 用于访问堆表中的实际行.
  • DB2 似乎能够集群"表.
  • PostgreSQL 似乎也将这些表称为集群ing索引"表.
  • There's one main "clustered index".
  • There's a heap table where the rows are stored in the order defined by the clustered index.
  • Each row in the heap table has a ROWID.
  • The clustered index does not include non-key columns, but a ROWID to access the real row in the heap table.
  • DB2 seems to be able to "Cluster" tables.
  • PostgreSQL seems to also call these tables as "Clustering Index" tables.

现在,这些#2 或#3 中的哪一个是聚集索引表"?谁在说真话,谁在撒谎?:D

Now, which ones of these #2 or #3 is a "Clustered Index Table"? Who's telling the truth and who's lying? :D

也就是说,Clustered Index Table"这个术语是每个供应商自由定义的商业术语,还是根据一些官方数据库理论有一个官方唯一的定义?

In other words, is the term "Clustered Index Table" a commercial term that each vendor freely defines as he/she pleases, or is there an official unique definition according to some official database theory?

推荐答案

据我所知,聚集索引"是一种索引,其叶子节点是数据页.这与非聚集索引不同,非聚集索引的叶节点是对存储在数据页上的行的引用.

As far as I know, a "clustered index" is an index where the leaf nodes of the index are the data pages. This is different from a non-clustered index where the leaf nodes are references to rows stored on the data pages.

一张表最多可以有一个聚集索引.在具有聚集索引的表中,数据按索引键排序.

A table can have at most one clustered index. In a table with a clustered index, the data is sorted on by the index keys.

Postgres 不支持聚集索引.它确实有一个名为 "cluster" 的表优化基于索引的数据.但是这个顺序是不维护的,所以和聚集索引是不一样的.

Postgres does not support clustered indexes. It does have a table optimization called "cluster" that sorts the data based on an index. However, this ordering is not maintained, so it is not the same as a clustered index.

相关文章