MySQL 是否通过允许选择不属于 group by 子句的列来打破标准?

2021-11-20 00:00:00 mysql

我习惯了 Microsoft 技术,包括 SQL Server.今天我遇到了一个问答,其中有以下段落引用自 MySQL 文档:

<块引用>

标准 SQL 会拒绝您的查询,因为您不能 SELECT不属于 GROUP BY 子句的一部分的非聚合字段聚合查询.MySQL 扩展了 GROUP BY 的使用,以便选择列表可以引用未在 GROUP BY 中命名的非聚合列条款.这意味着前面的查询在 MySQL 中是合法的.你可以使用此功能通过避免不必要的操作来获得更好的性能列排序和分组.但是,这主要在以下情况下有用未在 GROUP BY 中命名的每个非聚合列中的所有值都是每组都一样.服务器可以自由选择任何值每个组,因此除非它们相同,否则选择的值是不确定.

MySQL 允许这样做是否违反了标准?如何?允许这样做的结果是什么?

解决方案

标准 SQL 会拒绝您的查询,因为您不能在聚合查询中选择不属于 GROUP BY 子句的非聚合字段

这是正确的,直到 1992 年.

但它显然是错误的,从 2003 年及以后.

来自 SQL-2003 标准,6IWD6-02-Foundation-2011-01.pdf,来自 http://www.wiscorp.com/,第 7.12 段(查询规范),第 398 页:

<块引用>

  1. 如果 T 是一个分组表,那么让 G 是 T 的分组列的集合.在每个 ((值表达式)) 中包含在 ((select list)) 中,每个引用 T 列的列引用都应该引用某个列 C功能上依赖 G 或应包含在 a ((set function specification)) 的聚合参数中其聚合查询为 QS


现在 MYSQL 通过允许不仅功能上依赖的列在分组列上但允许所有列.这会给不了解分组工作原理的用户带来一些问题,并在他们不期望的地方得到不确定的结果.

但是您说 MySQL 添加了一个与 SQL 标准相冲突的功能是对的(尽管您似乎认为这是错误的原因).这并不完全准确,因为他们添加了 SQL 标准功能,但不是以最好的方式(更像是简单的方式),但确实与最新标准相冲突.

为了回答你的问题,这个 MySQL 特性(扩展)的原因是我想符合最新的 SQL 标准(2003+).为什么他们选择以这种方式实施(不完全合规),我们只能推测.

正如@Quassnoi 和@Johan 用示例回答的那样,这主要是性能和可维护性问题.但是不能轻易将 RDBMS 更改为足够聪明(天网除外)来识别功能相关的列,因此 MySQL 开发人员做出了选择:

<块引用>

我们 (MySQL) 为您(MySQL 用户)提供 SQL-2003 标准中的此功能.它提高了某些 GROUP BY 查询的速度,但有一个问题.您必须小心(而不是 SQL 引擎),因此 SELECTHAVING 列表中的列在功能上依赖于 GROUP BY 列.否则,您可能会得到不确定的结果.

<块引用>

如果你想禁用它,你可以将sql_mode设置为ONLY_FULL_GROUP_BY.

一切都在 MySQL 文档:扩展到GROUP BY (5.5) - 虽然不是在上面的措辞中,而是在你的引用中(他们甚至忘记提到它与标准 SQL-2003 的偏差而不是标准的 SQL-92).我认为这种选择在所有软件中都很常见,包括其他 RDBMS.它们是出于性能、向后兼容性和许多其他原因而设计的.Oracle 有著名的 '' is the same as NULL 例如,SQL-Server 也可能有一些.

还有 Peter Bouman 的这篇博文,为 MySQL 开发人员的选择辩护:揭穿 GROUP BY 神话.

在 2011 年,正如 @Mark Byers 在评论中告知我们(在相关问题中)在 DBA.SE),PostgreSQL 9.1 添加了一个新功能(发布日期:2011 年 9 月)专为此目的而设计.比 MySQL 的实现更严格,更接近标准.

后来,在 2015 年 MySQL 宣布在 5.7 版本中,行为得到改进以符合标准并实际识别函数依赖项,(甚至比 Postgres 实现更好).文档:MySQL 处理 GROUP BY (5.7) 和 Peter Bouman 的另一篇博文:MySQL 5.7.5:GROUP BY 尊重函数依赖!

I am used to Microsoft technologies including SQL Server. Today I ran across a Q&A where the following passage from the MySQL documentation was quoted:

Standard SQL would reject your query because you can not SELECT non-aggregate fields that are not part of the GROUP BY clause in an aggregate query. MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.

Is MySQL breaking the standard by allowing this? How? What is the result of allowing this?

解决方案

Standard SQL would reject your query because you can not SELECT non-aggregate fields that are not part of the GROUP BY clause in an aggregate query

This is correct, up to 1992.

But it is plainly wrong, from 2003 and beyond.

From SQL-2003 standard, 6IWD6-02-Foundation-2011-01.pdf, from http://www.wiscorp.com/, paragraph-7.12 (query specification), page 398:

  1. If T is a grouped table, then let G be the set of grouping columns of T. In each ((value expression)) contained in ((select list)) , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a ((set function specification)) whose aggregation query is QS


Now MYSQL, has implemented this feature by allowing not only columns that are functionally dependent on the grouping columns but allowing all columns. This is causing some problems with users that do not understand how grouping works and get indeterminate results where they don't expect.

But you are right to say that MySQL has added a feature that conflicts with SQL-standards (although you seem to think that for the wrong reason). It's not entirely accurate as they have added a SQL-standard feature but not in the best way (more like the easy way) but it does conflict with the latest standards.

To answer your question, the reason for this MySQL feature (extension) is I suppose to be accordance with latest SQL-standards (2003+). Why they chose to implement it this way (not fully compliant), we can only speculate.

As @Quassnoi and @Johan answered with examples, it's mainly a performance and maintainability issue. But one can't easily change the RDBMS to be clever enough (Skynet excluded) to recognize functionally dependent columns, so MySQL developers made a choice:

We (MySQL) give you (MySQL users) this feature which is in SQL-2003 standards. It improves speed in certain GROUP BY queries but there's a catch. You have to be careful (and not the SQL engine) so columns in the SELECT and HAVING lists are functionally dependent on the GROUP BY columns. If not, you may get indeterminate results.

If you want to disable it, you can set sql_mode to ONLY_FULL_GROUP_BY.

It's all in the MySQL docs: Extensions to GROUP BY (5.5) - although not in the above wording but as in your quote (they even forgot to mention that it's a deviation from standard SQL-2003 while not standard SQL-92). This kind of choices is common I think in all software, other RDBMS included. They are made for performance, backward compatibility and a lot of other reasons. Oracle has the famous '' is the same as NULL for example and SQL-Server has probably some, too.

There is also this blog post by Peter Bouman, where MySQL developers' choice is defended: Debunking GROUP BY myths.

In 2011, as @Mark Byers informed us in a comment (in a related question at DBA.SE), PostgreSQL 9.1 added a new feature (release date: September 2011) designed for this purpose. It is more restrictive than MySQL's implementation and closer to the standard.

Later, in 2015 MySQL announced that in 5.7 version, the behaviour is improved to conform with the standard and actually recognize functional dependencies, (even better than the Postgres implementation). The documentation: MySQL Handling of GROUP BY (5.7) and another blog post by Peter Bouman: MySQL 5.7.5: GROUP BY respects functional dependencies!

相关文章