为什么&什么时候应该使用 SPARSE COLUMN?(SQL SERVER 2008)

在浏览了一些关于 SQL Server 2008 的新特性稀疏列"的教程后,我发现如果列值为 0 或 NULL,它不会占用任何空间,但是当有一个值时,它需要 4 倍常规(非稀疏)列容纳的空间.

After going thru some tutorials on SQL Server 2008's new feature "SPARSE COLUMN", I have found that it doesn't take any space if the column value is 0 or NULL but when there is a value, it takes 4 times the space a regular(non sparse) column holds.

如果我的理解是正确的,那为什么我在数据库设计的时候会这样做呢?如果我使用它,那么我会是什么情况?

If my understanding is correct, then why I will go for that at the time of database design? And if I use that, then at what situation will I be?

同样出于好奇,当一个列被定义为稀疏列时,如何没有保留空间(我的意思是说,它的内部实现是什么?)

Also out of curiosity, how does no space get reserved when a column is defined as sparse column (I mean to say, what is the internal implementation for that?)

推荐答案

稀疏列不使用 4 倍的空间量 来存储值,它使用(固定的)每个非空值 4 个额外字节.(正如您已经说过的,NULL 占用 0 空间.)

A sparse column doesn't use 4x the amount of space to store a value, it uses a (fixed) 4 extra bytes per non-null value. (As you've already stated, a NULL takes 0 space.)

  • 因此,存储在 bit 列中的非空值将是 1 位 + 4 字节 = 4.125 字节.但如果其中 99% 为 NULL,则仍然是净节省.

  • So a non-null value stored in a bit column would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.

存储在 GUID (UniqueIdentifier) 列中的非空值是 16 字节 + 4 字节 = 20 字节.因此,如果其中只有 50% 为 NULL,那仍然是净节省.

A non-null value stored in a GUID (UniqueIdentifier) column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.

因此,预期节省"在很大程度上取决于我们所讨论的列的种类,以及您对空值与非空值比率的估计.可变宽度列 (varchars) 可能更难以准确预测.

So the "expected savings" depends strongly on what kind of column we're talking about, and your estimate of what ratio will be null vs non-null. Variable width columns (varchars) are probably a little more difficult to predict accurately.

这个在线图书页面有一个表格,显示什么不同数据类型的百分比需要为空才能获得收益.

This Books Online Page has a table showing what percentage of different data types would need to be null for you to end up with a benefit.

那么什么时候应该使用稀疏列?当您期望很大比例的行具有 NULL 值时.想到的一些例子:

So when should you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:

  • 订单表中的订单退货日期"列.您希望只有很小一部分销售额会导致退货.
  • 地址表中的第 4 个地址"行.大多数邮寄地址,即使您需要部门名称和转交",也可能不需要 4 行.
  • 客户表中的后缀"列.相当低百分比的人拥有Jr".或在他们的名字后面加上III"或Esquire".
  • A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
  • A "4th Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
  • A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.

相关文章