Eliminating duplicate rows in a PostgreSQL SELECT statement

2022-01-10 00:00:00 postgresql sql duplicates select mysql

Here is my query:

SELECT autor.entwickler,anwendung.name
  FROM autor 
  left join anwendung
    on anwendung.name = autor.anwendung;

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4
 Benutzer 2 | Anwendung 4
(6 rows)

I want to keep one row for each distinct value in the field name, and discard the others, like this:

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4

In MySQL I would just do:

SELECT autor.entwickler,anwendung.name
  FROM autor 
  left join anwendung
    on anwendung.name = autor.anwendung
 GROUP BY anwendung.name;

But PostgreSQL gives me this error:

ERROR:  column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...

I fully understand the error and assume that the MySQL implementation conforms to the SQL standard less closely than the PostgreSQL one. But how can I get the desired result?

Recommended answer

PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results depend on the order in which the table is scanned, the plan used, etc. As far as I know, that's how the standard says it should work, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements that appear in the SELECT list but not in GROUP BY.

In PostgreSQL, you should use DISTINCT ON for this kind of query.

You'd want to write something like:

SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
  FROM autor
  left join anwendung
    on anwendung.name = autor.anwendung;

(Syntax corrected based on follow-up comment)

This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for GROUP BY, but in reverse: it says that the values in the DISTINCT ON clause must be unique, and any value is acceptable for the columns not specified.
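For comparison, here is a rough sketch of how the question's MySQL query might be written with ANY_VALUE(...). It assumes MySQL 5.7 or later with ONLY_FULL_GROUP_BY enabled; it is MySQL syntax only and will not run on PostgreSQL.

```sql
-- MySQL 5.7+ only: ANY_VALUE() tells the server that any entwickler
-- from each group is acceptable, so the non-grouped column is allowed.
SELECT ANY_VALUE(autor.entwickler) AS entwickler,
       anwendung.name
  FROM autor
  LEFT JOIN anwendung
    ON anwendung.name = autor.anwendung
 GROUP BY anwendung.name;
```

As with the older permissive GROUP BY behaviour, which entwickler comes back for each name is not defined.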

Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.
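A minimal sketch of the same query with an ORDER BY, using the tables from the question. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions; the choice of autor.entwickler as the tie-breaker is only an assumption about which row you want to keep.

```sql
-- Keep one row per anwendung.name; within each name, the row that
-- sorts first by autor.entwickler is the one that is kept.
SELECT DISTINCT ON (anwendung.name)
       anwendung.name,
       autor.entwickler
  FROM autor
  LEFT JOIN anwendung
    ON anwendung.name = autor.anwendung
 ORDER BY anwendung.name, autor.entwickler;
```

With the sample rows from the question, this should keep Benutzer 1 for Anwendung 1, 3 and 4, and Benutzer 2 for Anwendung 2.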

It's also been noted that using an aggregate like min() or max() would work. While this is true, and will lead to reliable and predictable results (unlike using DISTINCT ON without an ORDER BY, or an ambiguous GROUP BY), it has a performance cost due to the need for extra sorting or aggregation, and it only works for data types that can be ordered.
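The aggregate alternative could look roughly like this, again using the tables from the question. Picking min() rather than max() is an arbitrary choice for illustration.

```sql
-- One row per anwendung.name; min() deterministically picks the
-- entwickler that sorts first within each group.
SELECT min(autor.entwickler) AS entwickler,
       anwendung.name
  FROM autor
  LEFT JOIN anwendung
    ON anwendung.name = autor.anwendung
 GROUP BY anwendung.name
 ORDER BY anwendung.name;
```

Note that with several aggregated columns, each min() is computed independently, so the values in one output row may come from different input rows. DISTINCT ON always returns values from a single row.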
