Converting SELECT DISTINCT ON queries from PostgreSQL to MySQL
I've been using PostgreSQL and am now migrating to MySQL.
In my queries I use PostgreSQL's SELECT DISTINCT ON (col1, col2, col3). Is there a counterpart to this statement in MySQL?
Recommended answer
There is no exact MySQL equivalent for a PostgreSQL query that uses SELECT DISTINCT ON.
PostgreSQL SELECT DISTINCT ON
In PostgreSQL, the following query groups rows whose expressions (col1, col2, col3) match, discards the duplicates, and keeps only the "first" (col4, col5) row of each group:
SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename
So if your table looks like this:
col1 | col2 | col3 | col4 | col5
--------------------------------
1 | 2 | 3 | 777 | 888
1 | 2 | 3 | 888 | 999
3 | 3 | 3 | 555 | 555
our query will keep just one row for (1,2,3) and one row for (3,3,3). The resulting rows will be:
col4 | col5
-----------
777 | 888
555 | 555
Please note that the "first row" of each group is unpredictable unless we specify an ORDER BY. Without one, our first row might just as well be (888, 999):
SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename
ORDER BY col1, col2, col3, col4
(The DISTINCT ON expressions must match the leftmost ORDER BY expressions, but the ORDER BY can contain additional expressions.)
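To make the "first row per group" rule concrete, here is a minimal Python sketch of the semantics, not of how PostgreSQL implements it. The helper name distinct_on and the tuple layout of the rows are assumptions made for illustration. It sorts by the DISTINCT ON columns followed by the tie-breaker, then keeps the first row of each group.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the table above: (col1, col2, col3, col4, col5).
# They are deliberately listed out of order to show that the sort matters.
rows = [
    (1, 2, 3, 888, 999),
    (1, 2, 3, 777, 888),
    (3, 3, 3, 555, 555),
]

def distinct_on(rows, key, order):
    """Emulate DISTINCT ON (key) ... ORDER BY key, order.

    Sort by the key columns followed by the tie-breaking columns,
    then keep only the first row of each key group.
    """
    ordered = sorted(rows, key=lambda r: key(r) + order(r))
    return [next(group) for _, group in groupby(ordered, key=key)]

key = itemgetter(0, 1, 2)      # col1, col2, col3
order = lambda r: (r[3],)      # col4 as the tie-breaker

# Project col4, col5 from the surviving rows.
result = [(r[3], r[4]) for r in distinct_on(rows, key, order)]
print(result)  # [(777, 888), (555, 555)]
```

Dropping the tie-breaker (sorting on the key columns alone) would leave the choice between (777, 888) and (888, 999) to the input order, which mirrors the unpredictability described above.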
MySQL's extension to GROUP BY
MySQL extends GROUP BY so that we can select nonaggregated columns that are not named in the GROUP BY clause. Whenever we select such a column, the server is free to pick any value from each group for it, so the resulting values are indeterminate. Note that this extension only applies when the ONLY_FULL_GROUP_BY SQL mode is disabled; that mode is enabled by default since MySQL 5.7.5, in which case such queries are rejected.
So this PostgreSQL query:
SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename
can be considered equivalent to this MySQL query:
SELECT col4, col5
FROM tablename
GROUP BY col1, col2, col3
Both PostgreSQL and MySQL will return the "first row" for each (col1, col2, col3), and in both cases the row returned is unpredictable because we didn't specify an ORDER BY clause.
Many people would be tempted to convert this PostgreSQL query with an ORDER BY:
SELECT DISTINCT ON (col1, col2, col3) col4, col5
FROM tablename
ORDER BY col1, col2, col3, col4
into this:
SELECT col4, col5
FROM (
SELECT col1, col2, col3, col4, col5
FROM tablename
ORDER BY col1, col2, col3, col4
) s
GROUP BY col1, col2, col3
The idea here is to apply an ORDER BY in a subquery so that, when MySQL groups by col1, col2, col3, it keeps the first value it encounters for col4 and col5. The idea is appealing, but it is wrong: MySQL is free to choose any value for col4 and col5, and we cannot know which values it encounters first, since that depends on the optimizer. So I would correct it to this:
SELECT t1.col4, t1.col5
FROM tablename t1
INNER JOIN (
    SELECT col1, col2, col3, MIN(col4) AS m_col4
    FROM tablename
    GROUP BY col1, col2, col3
) s
    ON t1.col1 = s.col1
    AND t1.col2 = s.col2
    AND t1.col3 = s.col3
    AND t1.col4 = s.m_col4
GROUP BY t1.col1, t1.col2, t1.col3, t1.col4
But this is starting to get more complicated.
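On MySQL 8.0 or later, a window function is another possible workaround. It is not part of the answer above, only a hedged alternative: ROW_NUMBER() numbers the rows of each (col1, col2, col3) group in col4 order, and the outer query keeps row number 1. The sketch below runs the query against an in-memory SQLite database as a stand-in, on the assumption that SQLite 3.25+ and MySQL 8.0+ accept the same syntax for this query. The alias rn is an arbitrary name.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0+ (assumption:
# both accept this ROW_NUMBER() syntax; SQLite needs version 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 3, 888, 999), (1, 2, 3, 777, 888), (3, 3, 3, 555, 555)],
)

# Number the rows inside each (col1, col2, col3) group by ascending col4,
# then keep only the first row of every group.
QUERY = """
SELECT col4, col5
FROM (
    SELECT col1, col2, col3, col4, col5,
           ROW_NUMBER() OVER (
               PARTITION BY col1, col2, col3
               ORDER BY col4
           ) AS rn
    FROM tablename
) s
WHERE rn = 1
ORDER BY col1, col2, col3
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(777, 888), (555, 555)]
```

If two rows in a group share the same col4, ROW_NUMBER() still returns exactly one of them, chosen arbitrarily, which matches DISTINCT ON. Adding more columns to the window's ORDER BY makes the choice deterministic.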
Conclusion
As a general rule, there is no exact way to convert a PostgreSQL query that uses SELECT DISTINCT ON into a MySQL query, but there are many workarounds. Depending on the query itself, the result may be as simple as the original or may become considerably more complicated.