Which DBMSs allow ordering by an attribute that is not present in the SELECT clause?

2022-01-14 00:00:00 db2 mysql oracle sql-server sql-order-by

Let's assume I have a table called Cars with 2 columns: CarName, BrandName

Now I want to execute this query:

select CarName from Cars order by BrandName

As you can see, I'd like to return a list that is sorted by a column that is not present in the SELECT part of the query.

The basic (unoptimized) logical order in which the clauses of an SQL query are processed is: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.

The problem that arises is that BrandName isn't part of what is left after the SELECT has been executed.

I've searched for this in books, on Google and on Stack Overflow, but so far I've only found several SO comments like "I know of database systems that don't allow it, but I don't remember which one".

So my questions are:
1) What do the SQL-92 and SQL:1999 standards say about this?
2) Which databases allow this query and which don't?

(Background: A couple of students asked this, and I want to give them the best answer possible)

- Successfully tested on Microsoft SQL Server 2012

Recommended answer

Your query is perfectly legal syntax, you can order by columns that are not present in the select.

Working demo using MySQL
Working demo using SQL Server
Working demo using PostgreSQL
Working demo using SQLite
Working demo using Oracle
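
As an additional self-contained check, the query from the question can be run against SQLite through Python's built-in sqlite3 module with an in-memory database. This is a minimal sketch; the car and brand values are made up for illustration and inserted deliberately out of brand order.

```python
import sqlite3

# In-memory database; the Cars table mirrors the one in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (CarName TEXT, BrandName TEXT)")

# Hypothetical sample rows, not in brand order.
conn.executemany(
    "INSERT INTO Cars (CarName, BrandName) VALUES (?, ?)",
    [("Golf", "Volkswagen"), ("Civic", "Honda"), ("A4", "Audi")],
)

# BrandName is the sort key but is not part of the select list.
names = [row[0] for row in conn.execute(
    "SELECT CarName FROM Cars ORDER BY BrandName"
)]
print(names)  # ['A4', 'Civic', 'Golf']
```

Only CarName comes back, yet the rows arrive in Audi, Honda, Volkswagen order.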

If you need the full specification of what is legal in an ORDER BY, the SQL:2003 standard contains a long list of rules about what the ORDER BY should and shouldn't contain (02-Foundation, page 415, section 7.13 <query expression>, sub-part 28). This confirms that your query is legal syntax.
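
A simplified way to reconcile this with the FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY sequence from the question is to treat the sort key as a hidden extra column. It is evaluated against the full source row, used for sorting, and dropped only when the final projection is applied. The Python below is a toy model of that idea under this assumption, not a description of how any particular engine executes the query. The function name `select_order_by` and the sample rows are hypothetical.

```python
# Hypothetical sample rows; each dict is one row of the Cars table.
cars = [
    {"CarName": "Golf", "BrandName": "Volkswagen"},
    {"CarName": "Civic", "BrandName": "Honda"},
    {"CarName": "A4", "BrandName": "Audi"},
]

def select_order_by(rows, select_cols, sort_key):
    """Toy model of: SELECT <select_cols> FROM rows ORDER BY <sort_key>.

    The sort key is read from the full source row, so it does not have to
    appear in select_cols. Projection happens only after sorting, and that
    is the step that drops the sort column from the output.
    """
    sorted_rows = sorted(rows, key=lambda r: r[sort_key])
    return [tuple(r[c] for c in select_cols) for r in sorted_rows]

result = select_order_by(cars, ["CarName"], "BrandName")
print(result)  # [('A4',), ('Civic',), ('Golf',)]
```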

I think your confusion may arise from selecting and/or ordering by columns not present in the GROUP BY, or from ordering by columns not in the SELECT list when using DISTINCT.

Both have the same fundamental problem, and MySQL is the only one to my knowledge that allows either.

The problem is this: when you use GROUP BY or DISTINCT, any column not contained in it is not needed for the result, so it doesn't matter whether it has several different values across the rows that get collapsed together. Imagine this simple data set:

ID  | Column1 | Column2 |
----|---------+---------|
1   | A       | X       |
2   | A       | Z       |
3   | B       | Y       |

If you write:

SELECT DISTINCT Column1 FROM T;

You will get:

Column1
---------
A
B

If you then add ORDER BY Column2, which of the two Column2 values would you use to order A by, X or Z? There is no deterministic way to choose a single Column2 value for A.
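
To make the ambiguity concrete, the short Python sketch below tries every possible choice of representative Column2 value for each distinct Column1 value in the sample table T and records the order that sorting by that representative would produce. It is a model of the reasoning above, not what any DBMS does internally. Both orders come out, and each is an equally valid answer.

```python
from itertools import product

# Rows of the sample table T as (ID, Column1, Column2).
T = [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")]

# For each distinct Column1 value, every Column2 value it appears with.
candidates = {}
for _id, col1, col2 in T:
    candidates.setdefault(col1, []).append(col2)

# Try every way of picking one Column2 per distinct Column1 and record the
# order that "SELECT DISTINCT Column1 ... ORDER BY Column2" would then give.
keys = sorted(candidates)  # ['A', 'B']
orderings = set()
for choice in product(*(candidates[k] for k in keys)):
    picked = dict(zip(keys, choice))
    orderings.add(tuple(sorted(keys, key=lambda k: picked[k])))

print(sorted(orderings))  # [('A', 'B'), ('B', 'A')]
```

Picking X for A puts A first, while picking Z puts B first.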

The same applies to selecting columns not in the GROUP BY. To simplify things, just imagine the first two rows of the previous table:

ID  | Column1 | Column2 |
----|---------+---------|
1   | A       | X       |
2   | A       | Z       |

In MySQL you can write

SELECT ID, Column1, Column2 FROM T GROUP BY Column1;

This actually breaks the SQL standard, but MySQL accepts it (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from MySQL 5.7.5). The trouble is that it is non-deterministic: the result:

ID  | Column1 | Column2 |
----|---------+---------|
1   | A       | X       |

is no more correct than:

ID  | Column1 | Column2 |
----|---------+---------|
2   | A       | Z       |

So what you are saying is "give me one row for each distinct value of Column1", which both result sets satisfy, so how do you know which one you will get? Well, you don't. It seems to be a fairly popular misconception that you can add an ORDER BY clause to influence the result, for example that the following query:

SELECT ID, Column1, Column2 FROM T GROUP BY Column1 ORDER BY ID DESC;

would ensure that you get the following result:

ID  | Column1 | Column2 |
----|---------+---------|
2   | A       | Z       |

because of the ORDER BY ID DESC. However, this is not true (as demonstrated here).

The MySQL documentation states:

The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

So even though you have an ORDER BY, it is not applied until one row per group has been selected, and which row that is remains non-deterministic.
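
If the intent behind ORDER BY ID DESC was "the row with the highest ID for each Column1", that has to be stated in the query itself. One common standard-SQL approach (not part of the original answer) is to compute MAX(ID) per group and join back to fetch exactly that row. The sketch below runs it against the sample table T in an in-memory SQLite database through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO T (ID, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "A", "X"), (2, "A", "Z"), (3, "B", "Y")],
)

# Find the highest ID per Column1 first, then join back to fetch that exact
# row, so the returned row no longer depends on the engine's choice.
query = """
    SELECT t.ID, t.Column1, t.Column2
    FROM T AS t
    JOIN (SELECT Column1, MAX(ID) AS MaxID
          FROM T
          GROUP BY Column1) AS m
      ON t.Column1 = m.Column1 AND t.ID = m.MaxID
    ORDER BY t.Column1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 'A', 'Z'), (3, 'B', 'Y')]
```

Every column in the outer SELECT now comes from one specific, fully identified row, so the result is the same on any conforming DBMS.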

The SQL standard does allow columns in the select list that are not contained in the GROUP BY or in an aggregate function; however, these columns must be functionally dependent on the grouping columns. From the SQL:2003 standard (5WD-02-Foundation-2003-09, page 346), http://www.wiscorp.com/sql_2003_standard.zip:

15) If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list> , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.

For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table. The following query therefore conforms to the SQL standard; it would run in MySQL but fail in many DBMSs at present (at the time of writing, PostgreSQL is the closest DBMS I know of to implementing the standard correctly; example here):

SELECT ID, Column1, Column2 FROM T GROUP BY ID;

Since ID is unique for each row, there can only be one value of Column1 and one value of Column2 for each ID, so there is no ambiguity about what to return for each row.
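
The functional-dependency condition can be illustrated with a small Python check. Note the assumption: the standard works with functional dependencies that are known from the schema and the query (for example, a PRIMARY KEY). The hypothetical helper `holds_in_data` below only tests whether the current rows happen to be consistent with such a dependency. It still shows why grouping by ID leaves no choice to make for Column2, while grouping by Column1 does.

```python
# Rows of the sample table T as dicts.
T = [
    {"ID": 1, "Column1": "A", "Column2": "X"},
    {"ID": 2, "Column1": "A", "Column2": "Z"},
    {"ID": 3, "Column1": "B", "Column2": "Y"},
]

def holds_in_data(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to exactly one value
    of `dependent` in these rows (a data check, not a schema guarantee)."""
    seen = {}
    for r in rows:
        key, val = r[determinant], r[dependent]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, val) != val:
            return False
    return True

print(holds_in_data(T, "ID", "Column2"))       # True: grouping by ID is unambiguous
print(holds_in_data(T, "Column1", "Column2"))  # False: A maps to both X and Z
```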
