GROUP BY behavior when no aggregate functions are present in the SELECT clause

2021-11-20 00:00:00 sql mysql

I have a table emp with following structure and data:

name   dept    salary
-----  -----   -----
Jack   a       2
Jill   a       1
Tom    b       2
Fred   b       1

When I execute the following SQL:

SELECT * FROM emp GROUP BY dept

I get the following result:

name   dept    salary
-----  -----   -----
Jill   a       1
Fred   b       1

On what basis did the server decide to return Jill and Fred and exclude Jack and Tom?

I am running this query in MySQL.

Note 1: I know the query doesn't make sense on its own. I am trying to debug a problem with a GROUP BY scenario and want to understand the default behavior for that purpose.

Note 2: I am used to writing the SELECT clause the same as the GROUP BY clause (minus the aggregate fields). When I came across the behavior described above, I started wondering whether I can rely on it for scenarios such as selecting the rows from the emp table where the salary is the lowest/highest in the dept. For example, SQL statements like this work on MySQL:

SELECT A.*, MIN(A.salary) AS min_salary FROM emp AS A GROUP BY A.dept

I didn't find any material describing why such SQL works and, more importantly, whether I can rely on this behavior consistently. If it is reliable, then I can avoid queries like:

SELECT A.* FROM emp AS A WHERE A.salary = ( 
            SELECT MAX(B.salary) FROM emp B WHERE B.dept = A.dept)
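The correlated-subquery form the question hopes to avoid is in fact the deterministic one. A minimal sketch follows; it uses Python's built-in sqlite3 module as a stand-in engine (an assumption for illustration, not MySQL itself), loads the sample table from the question, and runs the same query. The helper name top_paid_per_dept is hypothetical. Note that ties are kept: every row matching the department maximum is returned.

```python
import sqlite3

# Sample rows from the question: (name, dept, salary).
ROWS = [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)]


def top_paid_per_dept(rows):
    """Hypothetical helper: run the correlated-subquery form on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    # A row survives only if its salary equals the maximum of its own dept,
    # so the result depends on the data alone, not on the engine's choices.
    query = """
        SELECT A.name, A.dept, A.salary
        FROM emp AS A
        WHERE A.salary = (SELECT MAX(B.salary) FROM emp AS B WHERE B.dept = A.dept)
        ORDER BY A.dept, A.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(top_paid_per_dept(ROWS))  # [('Jack', 'a', 2), ('Tom', 'b', 2)]
# Ties are kept: a second top earner in dept a shows up as an extra row.
print(top_paid_per_dept(ROWS + [("Ann", "a", 2)]))
```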

Solution

Read the MySQL documentation on this particular point.

In a nutshell, MySQL allows omitting some columns from the GROUP BY for performance purposes; however, this works only if the omitted columns all have the same value within a group. Otherwise, the values returned by the query are indeed indeterminate, as others in this thread correctly guessed. To be clear, adding an ORDER BY clause does not reintroduce any form of deterministic behavior.
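One deterministic rewrite of the "lowest salary per department" query keeps only well-defined values in the grouped subquery (the grouped column and an aggregate) and joins back to emp to fetch the full rows. The sketch below runs it through Python's sqlite3 module as a stand-in engine; the SQL itself is plain standard SQL and should run unchanged on MySQL. The helper name lowest_paid_per_dept is hypothetical. For context, MySQL 5.7.5 and later enable the ONLY_FULL_GROUP_BY SQL mode by default, under which the original SELECT * ... GROUP BY dept is rejected instead of being answered with arbitrary values.

```python
import sqlite3

# Sample rows from the question: (name, dept, salary).
ROWS = [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)]


def lowest_paid_per_dept(rows):
    """Hypothetical helper: join each row to its department's MIN(salary)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    # The derived table m holds only the grouped column and an aggregate,
    # both fully determined; the join then selects the matching emp rows.
    query = """
        SELECT e.name, e.dept, e.salary
        FROM emp AS e
        JOIN (SELECT dept, MIN(salary) AS min_salary
              FROM emp
              GROUP BY dept) AS m
          ON e.dept = m.dept AND e.salary = m.min_salary
        ORDER BY e.dept, e.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(lowest_paid_per_dept(ROWS))  # [('Jill', 'a', 1), ('Fred', 'b', 1)]
```

Like the correlated subquery, this returns every tied row; picking exactly one row per group needs an explicit tie-break.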

Although not at the core of the issue, this example shows why using * rather than explicitly listing the desired columns is often a bad idea.
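If exactly one row per department is wanted, with ties broken by an explicit rule instead of by whatever the server happens to return, a window function does it while listing the columns explicitly rather than using *. This assumes MySQL 8.0 or later (window functions are not available in 5.x); the sketch again uses Python's sqlite3 module as a stand-in, which needs SQLite 3.25 or newer. Breaking ties by name, and the helper name one_row_per_dept, are arbitrary choices made for illustration.

```python
import sqlite3

# Sample rows from the question: (name, dept, salary).
ROWS = [("Jack", "a", 2), ("Jill", "a", 1), ("Tom", "b", 2), ("Fred", "b", 1)]


def one_row_per_dept(rows):
    """Hypothetical helper: lowest-paid row per dept, ties broken by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    # ROW_NUMBER() numbers the rows of each dept in a fully specified order,
    # so rn = 1 picks the same row every time the query runs.
    query = """
        SELECT name, dept, salary
        FROM (
            SELECT name, dept, salary,
                   ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary, name) AS rn
            FROM emp
        ) AS ranked
        WHERE rn = 1
        ORDER BY dept
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(one_row_per_dept(ROWS))  # [('Jill', 'a', 1), ('Fred', 'b', 1)]
# With a tie on the lowest salary in dept a, the name order decides.
print(one_row_per_dept(ROWS + [("Amy", "a", 1)]))
```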

Excerpt from the MySQL 5.0 documentation:

When using this feature, all rows in each group should have the same values
for the columns that are omitted from the GROUP BY part. The server is free
to return any value from the group, so the results are indeterminate unless
all values are the same. 
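The documentation's precondition (every row in a group has the same value for each omitted column) can be checked directly with COUNT(DISTINCT ...) per group. In the sketch below, dept_name is a hypothetical column added for contrast; name and salary come from the question's table, and the helper is_constant_per_dept is likewise hypothetical. Python's sqlite3 module again stands in for MySQL. Only a column that reports one distinct value in every group is safe to leave out of the GROUP BY; for the question's data, neither name nor salary qualifies, which is why getting Jill and Fred rather than Jack and Tom is not something to rely on.

```python
import sqlite3

# Question's data plus a hypothetical dept_name column that, by construction,
# has a single value per dept: (name, dept, dept_name, salary).
ROWS = [
    ("Jack", "a", "Alpha", 2),
    ("Jill", "a", "Alpha", 1),
    ("Tom", "b", "Beta", 2),
    ("Fred", "b", "Beta", 1),
]
ALLOWED = {"name", "dept_name", "salary"}


def is_constant_per_dept(conn, column):
    """True if every dept group holds exactly one distinct value of column."""
    # Identifiers cannot be bound as ? parameters, so only known column
    # names are interpolated into the SQL text. NULLs are not counted.
    if column not in ALLOWED:
        raise ValueError(f"unexpected column: {column}")
    query = f"SELECT dept, COUNT(DISTINCT {column}) FROM emp GROUP BY dept"
    return all(count == 1 for _dept, count in conn.execute(query))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", ROWS)

for col in ("name", "salary", "dept_name"):
    print(col, is_constant_per_dept(conn, col))
```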
