SQL Server GUID 排序算法.为什么?
UniqueIdentifier 的问题
我们有一个现有的数据库,它广泛地(不幸的是!)使用唯一标识符作为主键和某些表的一些可为空的列.我们遇到过这样一种情况,在这些表上运行的一些报告根据这些 uniqueidentifier 进行排序,因为表中没有其他列可以进行有意义的排序(这不是讽刺!).目的是排序,以便按插入顺序显示项目,但未使用 NewSequentialId()
插入它们 - 因此浪费时间.
We have an existing database which uses uniqueidentifiers extensively (unfortunately!) both as primary keys and some nullable columns of some tables. We came across a situation where some reports that run on these tables sort on these uniqueidentifiers because there is no other column in the table that would give a meaningful sort (isn't that ironic!). The intent was to sort so that it shows the items in the order they were inserted but they were not inserted using NewSequentialId()
- hence a waste of time.
关于排序算法的事实
无论如何,考虑到 SQL Server 根据从结束的第 5 个字节组(6 个字节)开始并移向第 1 个字节组(4 个字节)的字节组对 uniqueidentifiers 进行排序,从第 3 个字节组(2 个字节)开始反转顺序从右到左,从左到右,
Anyway, considering SQL Server sorts uniqueidentifiers based on byte groups starting from the ending 5th byte group (6 bytes) and moving towards the 1st byte group (4 bytes) reversing the order on the 3rd byte-group (2 bytes) from right-left to left-right,
我的问题
我很想知道在现实生活中是否存在这种类型有帮助的情况.
I was curious to know if there is any real life situation that this kind of sort helps at all.
SQL Server 如何在内部存储唯一标识符,这可能会提供有关为什么它有这种奇怪的排序算法?
How does SQL Server store the uniqueidentifier internally which might provide insight on why it has this whacky sort algorithm?
参考:
Alberto Ferrari 发现了 SQL Server GUID 排序
示例
当您在具有以下数据的 uniqueidentifier 列上使用 Order By 时,Uniqueidentifier 的排序如下所示.
Uniqueidentifiers are sorted as shown below when you use a Order By on a uniqueidentifier column having the below data.
请注意,以下数据是升序排列的,最高排序偏好是从第 5 个字节组到第 1 个字节组(向后).
Please note that the below data is sorted ascendingly and highest sort preference is from the 5th byte group towards the 1st byte group (backwards).
-- 1st byte group of 4 bytes sorted in the reverse (left-to-right) order below --
01000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
00010000-0000-0000-0000-000000000000
00100000-0000-0000-0000-000000000000
00000100-0000-0000-0000-000000000000
00001000-0000-0000-0000-000000000000
00000001-0000-0000-0000-000000000000
00000010-0000-0000-0000-000000000000
-- 2nd byte group of 2 bytes sorted in the reverse (left-to-right) order below --
00000000-0100-0000-0000-000000000000
00000000-1000-0000-0000-000000000000
00000000-0001-0000-0000-000000000000
00000000-0010-0000-0000-000000000000
-- 3rd byte group of 2 bytes sorted in the reverse (left-to-right) order below --
00000000-0000-0100-0000-000000000000
00000000-0000-1000-0000-000000000000
00000000-0000-0001-0000-000000000000
00000000-0000-0010-0000-000000000000
-- 4th byte group of 2 bytes sorted in the straight (right-to-left) order below --
00000000-0000-0000-0001-000000000000
00000000-0000-0000-0010-000000000000
00000000-0000-0000-0100-000000000000
00000000-0000-0000-1000-000000000000
-- 5th byte group of 6 bytes sorted in the straight (right-to-left) order below --
00000000-0000-0000-0000-000000000001
00000000-0000-0000-0000-000000000010
00000000-0000-0000-0000-000000000100
00000000-0000-0000-0000-000000001000
00000000-0000-0000-0000-000000010000
00000000-0000-0000-0000-000000100000
00000000-0000-0000-0000-000001000000
00000000-0000-0000-0000-000010000000
00000000-0000-0000-0000-000100000000
00000000-0000-0000-0000-001000000000
00000000-0000-0000-0000-010000000000
00000000-0000-0000-0000-100000000000
代码:
Alberto 的代码扩展为表示排序是基于字节而不是单个位.
Alberto's code extended to denote that sorting is on the bytes and not on the individual bits.
With Test_UIDs As (-- 0 1 2 3 4 5 6 7 8 9 A B C D E F
Select ID = 1, UID = cast ('00000000-0000-0000-0000-100000000000' as uniqueidentifier)
Union Select ID = 2, UID = cast ('00000000-0000-0000-0000-010000000000' as uniqueidentifier)
Union Select ID = 3, UID = cast ('00000000-0000-0000-0000-001000000000' as uniqueidentifier)
Union Select ID = 4, UID = cast ('00000000-0000-0000-0000-000100000000' as uniqueidentifier)
Union Select ID = 5, UID = cast ('00000000-0000-0000-0000-000010000000' as uniqueidentifier)
Union Select ID = 6, UID = cast ('00000000-0000-0000-0000-000001000000' as uniqueidentifier)
Union Select ID = 7, UID = cast ('00000000-0000-0000-0000-000000100000' as uniqueidentifier)
Union Select ID = 8, UID = cast ('00000000-0000-0000-0000-000000010000' as uniqueidentifier)
Union Select ID = 9, UID = cast ('00000000-0000-0000-0000-000000001000' as uniqueidentifier)
Union Select ID = 10, UID = cast ('00000000-0000-0000-0000-000000000100' as uniqueidentifier)
Union Select ID = 11, UID = cast ('00000000-0000-0000-0000-000000000010' as uniqueidentifier)
Union Select ID = 12, UID = cast ('00000000-0000-0000-0000-000000000001' as uniqueidentifier)
Union Select ID = 13, UID = cast ('00000000-0000-0000-0001-000000000000' as uniqueidentifier)
Union Select ID = 14, UID = cast ('00000000-0000-0000-0010-000000000000' as uniqueidentifier)
Union Select ID = 15, UID = cast ('00000000-0000-0000-0100-000000000000' as uniqueidentifier)
Union Select ID = 16, UID = cast ('00000000-0000-0000-1000-000000000000' as uniqueidentifier)
Union Select ID = 17, UID = cast ('00000000-0000-0001-0000-000000000000' as uniqueidentifier)
Union Select ID = 18, UID = cast ('00000000-0000-0010-0000-000000000000' as uniqueidentifier)
Union Select ID = 19, UID = cast ('00000000-0000-0100-0000-000000000000' as uniqueidentifier)
Union Select ID = 20, UID = cast ('00000000-0000-1000-0000-000000000000' as uniqueidentifier)
Union Select ID = 21, UID = cast ('00000000-0001-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 22, UID = cast ('00000000-0010-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 23, UID = cast ('00000000-0100-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 24, UID = cast ('00000000-1000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 25, UID = cast ('00000001-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 26, UID = cast ('00000010-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 27, UID = cast ('00000100-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 28, UID = cast ('00001000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 29, UID = cast ('00010000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 30, UID = cast ('00100000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 31, UID = cast ('01000000-0000-0000-0000-000000000000' as uniqueidentifier)
Union Select ID = 32, UID = cast ('10000000-0000-0000-0000-000000000000' as uniqueidentifier)
)
Select * From Test_UIDs Order By UID, ID
推荐答案
SQL Server 人员在此处记录了该算法:如何在 SQL Server 中比较 GUID2005年?我在这里引用(因为这是一篇可能在几年内永远消失的旧文章)
The algorithm is documented by the SQL Server guys here: How are GUIDs compared in SQL Server 2005? I Quote here here (since it's an old article that may be gone forever in a few years)
一般来说,相等比较对唯一标识符值.但是,如果您发现自己需要一般排序,那么您可能正在查看错误的数据类型,应该而是考虑各种整数类型.
In general, equality comparisons make a lot of sense with uniqueidentifier values. However, if you find yourself needing general ordering, then you might be looking at the wrong data type and should consider various integer types instead.
如果经过仔细考虑,您决定使用唯一标识符进行订购列,您可能会对返回的内容感到惊讶.
If, after careful thought, you decide to order on a uniqueidentifier column, you might be surprised by what you get back.
给定这两个 uniqueidentifier 值:
Given these two uniqueidentifier values:
@g1='55666BEE-B3A0-4BF5-81A7-86FF976E763F'@g2 ='8DD5BCA5-6ABE-4F73-B4B7-393AE6BBB849'
@g1= '55666BEE-B3A0-4BF5-81A7-86FF976E763F' @g2 = '8DD5BCA5-6ABE-4F73-B4B7-393AE6BBB849'
很多人认为@g1小于@g2,因为'55666BEE'是肯定小于'8DD5BCA5'.但是,这不是 SQL Server 的方式2005 年比较了 uniqueidentifier 值.
Many people think that @g1 is less than @g2, since '55666BEE' is certainly smaller than '8DD5BCA5'. However, this is not how SQL Server 2005 compares uniqueidentifier values.
通过查看字节组"进行比较.从右到左,和在字节组"内从左到右.字节组是被分隔的通过-"字符.从技术上讲,我们查看字节 {10 到 15}首先,然后是 {8-9},然后是 {6-7},然后是 {4-5},最后是 {0 到 3}.
The comparison is made by looking at byte "groups" right-to-left, and left-to-right within a byte "group". A byte group is what is delimited by the '-' character. More technically, we look at bytes {10 to 15} first, then {8-9}, then {6-7}, then {4-5}, and lastly {0 to 3}.
在这个特定的例子中,我们将从比较86FF976E763F"开始带有393AE6BBB849".我们立即看到@g2 确实更大比@g1.
In this specific example, we would start by comparing '86FF976E763F' with '393AE6BBB849'. Immediately we see that @g2 is indeed greater than @g1.
请注意,在 .NET 语言中,Guid 值具有不同的默认排序顺序比在 SQL Server 中.如果您发现需要订购数组或使用 SQL Server 比较语义的 Guid 列表,您可以使用SqlGuid 的数组或列表,它在一个符合 SQL Server 语义的方式.
Note that in .NET languages, Guid values have a different default sort order than in SQL Server. If you find the need to order an array or list of Guid using SQL Server comparison semantics, you can use an array or list of SqlGuid instead, which implements IComparable in a way which is consistent with SQL Server semantics.
此外,排序遵循字节组字节序(参见此处:全球唯一标识符).组 10-15 和 8-9 存储为 big endian(对应于维基百科文章中的 Data4),因此将它们作为 big endian 进行比较.其他组使用小端比较.
Plus, the sort follows byte groups endianness (see here: Globally unique identifier). The groups 10-15 and 8-9 are stored as big endian (corresponding to the Data4 in the wikipedia article), so they are compared as big endian. Other groups are compared using little endian.
相关文章