Default row order in SELECT query - SQL Server 2008 vs SQL 2012

Our team recently upgraded our databases from SQL Server 2008 to SQL Server 2012. One breaking change we noticed was in the default order of rows returned by the SELECT statement, i.e. when an explicit ORDER BY clause is not specified.

As per MSDN, SQL Server 2012 does not guarantee the order of the rows returned unless an ORDER BY clause is specified.

We have 2500+ stored procedures across 5 databases that have SELECT statements without an ORDER BY clause and it will be a sizeable effort to add the ORDER BY clause manually to match the behavior in SQL Server 2008. Is there a setting or faster way of doing this?

The other option, which hasn't been explored, is to downgrade to SQL Server 2008. How difficult would this be?

Recommended answer

You need to go back and add ORDER BY clauses to your code, because without them the order is never guaranteed. You were "lucky" in the past that you always got the same order, but that was not because SQL Server 2008 guaranteed it in any way. It most likely had to do with your indexes or how the data was being stored on disk.
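There is no setting that restores the old behavior, but the audit itself can be narrowed down. The query below is a rough sketch, not a complete solution: it lists stored procedures whose source text contains SELECT but never mentions ORDER BY, using the `sys.procedures` and `sys.sql_modules` catalog views. It is a plain text match, so it will miss procedures where only some SELECT statements are ordered, it can be fooled by comments or by unusual whitespace between ORDER and BY, and it skips encrypted modules (their `definition` is NULL).

```sql
-- Heuristic inventory: procedures that contain SELECT but no ORDER BY.
-- Run once in each of the databases being audited.
SELECT
    OBJECT_SCHEMA_NAME(p.object_id) AS schema_name,
    p.name                          AS procedure_name
FROM sys.procedures AS p
JOIN sys.sql_modules AS m
    ON m.object_id = p.object_id
WHERE m.definition LIKE N'%SELECT%'          -- text match only, not a parse
  AND m.definition NOT LIKE N'%ORDER BY%'    -- misses "ORDER" + newline + "BY"
ORDER BY schema_name, procedure_name;
```

The result is only a starting list for manual review. Each procedure still needs a human decision about which columns actually define the order its callers expect.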

If you moved to a new host when you upgraded, the difference in hardware configuration alone could have changed the way your queries execute. Not to mention that the new server would have recalculated statistics on the tables, and the SQL Server 2012 query optimizer probably does things a bit differently than the one in SQL Server 2008.

It is a fallacy that you can rely on the order of a result set in SQL without explicitly stating the order you want it in. SQL results NEVER have an order you can rely on without using an ORDER BY clause. SQL is built around set theory. Query results are basically sets (or multi-sets).
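As a small illustration of that point, here is a sketch using a made-up `@Orders` table variable (the table, columns, and values are hypothetical). It contrasts a query with no ORDER BY, a query ordered on a non-unique column, and a query whose ORDER BY ends in a unique tie-breaker, which is the only one of the three with a fully determined row order.

```sql
-- Hypothetical table used only for illustration.
DECLARE @Orders TABLE (
    OrderID    int  NOT NULL PRIMARY KEY,
    CustomerID int  NOT NULL,
    OrderDate  date NOT NULL
);

INSERT INTO @Orders (OrderID, CustomerID, OrderDate)
VALUES (3, 10, '20120301'),
       (1, 20, '20120301'),
       (2, 10, '20120215');

-- No ORDER BY: any row order is a valid result.
SELECT OrderID, CustomerID, OrderDate
FROM @Orders;

-- ORDER BY on a non-unique column: orders 1 and 3 share an OrderDate,
-- so they may still come back in either order.
SELECT OrderID, CustomerID, OrderDate
FROM @Orders
ORDER BY OrderDate;

-- ORDER BY ending in a unique column: the order is fully determined
-- (OrderID 2, then 1, then 3).
SELECT OrderID, CustomerID, OrderDate
FROM @Orders
ORDER BY OrderDate, OrderID;
```

When adding ORDER BY clauses to existing procedures, ending the sort list with a key column avoids trading one source of nondeterminism for another.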

Itzik Ben-Gan gives a good description of set theory in relation to SQL in his book Microsoft SQL Server 2012 T-SQL Fundamentals:

Set theory, which originated with the mathematician Georg Cantor, is one of the mathematical branches on which the relational model is based. Cantor's definition of a set follows:

By a "set" we mean any collection M into a whole of definite, distinct objects m (which are called the "elements" of M) of our perception or of our thought. - Joseph W. Dauben and Georg Cantor (Princeton University Press, 1990)

After a thorough explanation of the terms in the definition, Itzik goes on to say:

What Cantor's definition of a set leaves out is probably as important as what it includes. Notice that the definition doesn't mention any order among the set elements. The order in which set elements are listed is not important. The formal notation for listing set elements uses curly brackets: {a, b, c}. Because order has no relevance you can express the same set as {b, a, c} or {b, c, a}. Jumping ahead to the set of attributes (called columns in SQL) that make up the header of a relation (called a table in SQL), an element is supposed to be identified by name - not ordinal position. Similarly, consider the set of tuples (called rows by SQL) that make up the body of the relation; an element is identified by its key values - not by position. Many programmers have a hard time adapting to the idea that, with respect to querying tables, there is no order among the rows. In other words, a query against a table can return rows in any order unless you explicitly request that the data be sorted in a specific way, perhaps for presentation purposes.

But regardless of the academic definition of a set, even the implementation in SQL Server has never guaranteed any order in the results. This MSDN blog post from 2005 by a member of the query optimizer team states that you should not rely on the order from intermediate operations at all.

The reordering rules can and will violate this assumption (and do so when it is inconvenient to you, the developer ;). Please understand that when we reorder operations to find a more efficient plan, we can cause the ordering behavior to change for intermediate nodes in the tree. If you’ve put an operation in the tree that assumes a particular intermediate ordering, it can break.
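One common way code ends up assuming an intermediate ordering is a view defined with `TOP (100) PERCENT ... ORDER BY`. The sketch below uses hypothetical objects (`dbo.DemoCustomers`, `dbo.vDemoCustomersSorted`) and is not taken from the blog post. In a view, ORDER BY only determines which rows TOP returns, so it does not order the rows returned when the view is queried; only an ORDER BY on the outermost query does.

```sql
-- Hypothetical objects for illustration only.
CREATE TABLE dbo.DemoCustomers (
    CustomerID int          NOT NULL PRIMARY KEY,
    LastName   nvarchar(50) NOT NULL
);
GO  -- batch separator understood by SSMS and sqlcmd

-- Anti-pattern: this ORDER BY only qualifies the TOP clause;
-- it is not a promise about the order of rows read from the view.
CREATE VIEW dbo.vDemoCustomersSorted
AS
SELECT TOP (100) PERCENT CustomerID, LastName
FROM dbo.DemoCustomers
ORDER BY LastName;
GO

-- Not guaranteed to come back sorted by LastName.
SELECT CustomerID, LastName
FROM dbo.vDemoCustomersSorted;

-- Guaranteed: the ORDER BY sits on the outermost query.
SELECT CustomerID, LastName
FROM dbo.vDemoCustomersSorted
ORDER BY LastName, CustomerID;
```

Views, derived tables, and subqueries written this way are worth checking during the audit, because the text search for a missing ORDER BY will not flag them.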

This blog post by Conor Cunningham (Architect, SQL Server Core Engine), "No Seatbelt - Expecting Order without ORDER BY", is about SQL Server 2008. He has a table with 20k rows and a single index that appears to always return rows in the same order. Adding an ORDER BY to the query doesn't even change the execution plan, so adding one doesn't make the query more expensive when the optimizer realizes no sort is needed. But once he adds another 20k rows to the table, the query plan suddenly changes: it now uses parallelism, and the results are no longer ordered!

The hard part here is that there is no reasonable way for any external user to know when a plan will change. The space of all plans is huge and hurts your head to ponder. SQL Server's optimizer will change plans, even for simple queries, if enough of the parameters change. You may get lucky and not have a plan change, or you can just not think about this problem and add an ORDER BY.
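To check the cost claim on your own system, you can compare the estimated plans of a query with and without ORDER BY. The script below is a loose sketch in the spirit of that experiment, not the script from the blog post: the table name, columns, and predicate are assumptions. When the optimizer can take the requested order from the index, the second plan would be expected to show no separate Sort operator.

```sql
-- Hypothetical demo table; names and row counts are illustrative.
CREATE TABLE dbo.SeatbeltDemo (
    Id      int       NOT NULL,
    Payload char(100) NOT NULL DEFAULT 'x'
);
CREATE CLUSTERED INDEX IX_SeatbeltDemo_Id ON dbo.SeatbeltDemo (Id);

-- Load 20,000 rows by numbering the rows of a catalog-view cross join.
INSERT INTO dbo.SeatbeltDemo (Id)
SELECT TOP (20000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
GO

-- Show estimated plans instead of executing the statements.
SET SHOWPLAN_TEXT ON;
GO
SELECT Id, Payload FROM dbo.SeatbeltDemo WHERE Id > 100;
SELECT Id, Payload FROM dbo.SeatbeltDemo WHERE Id > 100 ORDER BY Id;
GO
SET SHOWPLAN_TEXT OFF;
GO
```

Whether and when the plan switches to parallelism depends on row counts, statistics, hardware, and server settings, so a demo like this may not reproduce the unordered results on a given machine. That unpredictability is the point of the quote above.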

If you need more convincing, just read these posts:

  • Without ORDER BY, there is no default sort order. - Alexander Kuznetsov
  • Order in the court! - Thomas Kyte
  • Order of a Result Set in SQL - Timothy Wiseman
