包含列的索引,有什么区别?
我从来没有真正理解这两个索引之间的区别,谁能解释一下区别是什么(性能方面,索引结构在数据库中的外观,存储方面等)?
I've never really understood the difference between these two indexes, can someone please explain what the difference is (performance-wise, how the index structure will look like in db, storage-wise etc)?
包含索引
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
'普通'索引
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode, AddressLine1, AddressLine2, City, StateProvinceID);
推荐答案
索引的内部存储采用 B-Tree 结构,由索引页"(根页和所有中间页)和索引数据页"(仅叶页).
The internal storage of indexes uses a B-Tree structure and consists of "index pages" (the root and all intermediate pages) and "index data pages" (the leaf pages only).
注意不要将索引数据页"与存储大部分实际数据列的数据页"(聚集索引的叶页)混淆.
Note do not confuse "index data pages" with the "data pages" (leaf pages of clustered indexes) which store most of the columns of actual data.
- 只有索引列存储在索引页上.
- 通过在
INCLUDE
部分放置一些列,每个索引键存储在每个页面上的数据更少. - 意味着需要更少的页面来保存索引键.(更轻松地将这些常用页面缓存在内存中更长时间.)
- 树中的级别可能更少.(在这种情况下,性能优势会更大,因为每个树级别遍历都是另一次磁盘访问.)
- Only the index columns are stored on the index pages.
- By placing some columns in the
INCLUDE
section, less data per index key is stored on each page. - Meaning fewer pages are needed to hold the index keys. (Making it easier to cache these frequently used pages in memory for longer.)
- And possibly fewer levels in the tree. (In such a case performance benefits can be much bigger because every tree level traversal is another disk access.)
- 如果索引具有
INCLUDE
列,则该数据在查询需要时立即可用. - 如果查询需要在索引键或
INCLUDE
列中不可用的列,则需要对聚集索引中的正确行(或堆,如果没有聚集索引)进行额外的书签查找"已定义索引). - If the index has
INCLUDE
columns, that data is immediately available should the query need it. - If the query requires columns not available in either the index keys or the
INCLUDE
columns, then an additional "bookmark lookup" is required to the correct row in the clustered index (or heap if no clustered index defined). - 如果您的索引的键和查询中的过滤器选择性不够,那么该索引将被忽略(无论您的
INCLUDE
列中有什么内容). - 您创建的每个索引都有 INSERT 和 UPDATE 语句的开销;对于更大"的索引更是如此.(更大的也适用于
INCLUDE
列.) - 因此,虽然理论上您可以创建大量包含列的大索引以匹配访问路径的所有排列:这会适得其反.
- If the keys of your index and filters in your query are not selective enough, then the index will be ignored (regardless of what's in your
INCLUDE
columns). - Every index you create has overhead for INSERT and UPDATE statements; more so for "bigger" indexes. (Bigger applies to
INCLUDE
columns as well.) - So while you could in theory create a multitude of big indexes with include columns to match all the permutations of access paths: it would be very counter-productive.
- 扩展索引的键以包含索引/过滤器中不需要的列是一种常见的索引调整技巧".(称为覆盖索引.)
- 这些列通常在输出列中需要,或者作为连接到其他表的参考列.
- 这将避免臭名昭著的书签查找",但缺点是使索引比严格需要的更宽".
- 事实上,索引中较早的列通常已经确定了唯一行,这意味着如果不是为了避免书签查找",额外包含的列将是完全多余的" 好处.
INCLUDE
列基本上可以更有效地获得相同的好处.- It was a common index tuning 'trick' to expand the keys of an index to include columns that weren't needed in the index/filter. (Known as a covering index.)
- These columns were commonly required in output columns or as reference columns for joins to other tables.
- This would avoid the infamous "bookmark lookups", but had the disadvantage of making the index 'wider' than strictly necessary.
- In fact very often the earlier columns in the index would already identify a unique row meaning the extra included columns would be completely redundant if not for the "avoiding bookmark lookups" benefit.
INCLUDE
columns basically allow the same benefit more efficiently.
当使用索引时,索引键用于通过索引页导航到正确的索引数据页.
When an index is used, the index key is used to navigate through the index pages to the correct index data page.
一些注意事项,希望能解决您的一些困惑:
Some things to note that hopefully addresses some of your confusion:
值得注意的是,在 INCLUDE
列被添加为一项功能之前:
It's worth noting that before INCLUDE
columns were added as a feature:
注意 需要指出的一点很重要.如果您养成总是将查询编写为 SELECT * ...
的懒惰习惯,那么您通常从索引中的 INCLUDE
列中获得零收益.通过返回所有列,您基本上可以确保在任何情况下都需要进行书签查找.
NB Something very important to point out. You generally get zero benefit out of
INCLUDE
columns in your indexes if you're in the lazy habit of always writing your queries asSELECT * ...
. By returning all columns you're basically ensuring a bookmark lookup is required in any case.
相关文章