MySQL - 控制组返回哪一行

2021-11-20 00:00:00 group-by mysql sql-order-by

我有一个这样的数据库表:

I have a database table like this:

id    version_id    field1    field2
1     1             texta      text1
1     2             textb      text2
2     1             textc      text3
2     2             textd      text4
2     3             texte      text5

如果您没有解决,它包含一行的多个版本,然后是一些文本数据.

If you didn't work it out, it contains a number of versions of a row, and then some text data.

我想查询它并返回每个 id 编号最高的版本.(所以只有上面的第二行和最后一行).

I want to query it and return the version with the highest number for each id. (so the second and last rows only in the above).

我尝试在按 version_id DESC 排序的同时使用 group by - 但它似乎在分组后排序,所以这不起作用.

I've tried using group by whilst ordering by version_id DESC - but it seems to order after its grouped, so this doesn't work.

有人有什么想法吗?我不敢相信这是做不到的!

Anyone got any ideas? I can't believe it can't be done!

更新:

想出这个,它有效,但使用子查询:

Come up with this, which works, but uses a subquery:

SELECT *
FROM (SELECT * FROM table ORDER BY version_id DESC) t1
GROUP BY t1.id

推荐答案

这叫做选择列的分组最大值.这里有几种不同的 mysql 方法.

It's called selecting the group-wise maximum of a column. Here are several different approaches for mysql.

这是我的做法:

SELECT *
FROM (SELECT id, max(version_id) as version_id FROM table GROUP BY id) t1
INNER JOIN table t2 on t2.id=t1.id and t1.version_id=t2.version_id

虽然mysql会在内存中为子查询创建一个临时表,但这样会比较有效.我假设您已经有了这个表的 (id, version_id) 索引.

This will be relatively efficient, though mysql will create a temporary table in memory for the subquery. I assume you already have an index on (id, version_id) for this table.

对于此类问题,您或多或少必须使用子查询是 SQL 的一个缺陷(半连接是另一个例子).

It's a deficiency in SQL that you more or less have to use a subquery for this type of problem (semi-joins are another example).

子查询在 mysql 中没有得到很好的优化,但不相关的子查询并没有那么糟糕,只要它们不是太大以至于它们会被写入磁盘而不是内存.鉴于在此查询中只有两个整数,子查询可能在此之前很久就有数百万行,但您的第一个查询中的 select * 子查询可能会更快地遇到此问题.

Subqueries are not well optimized in mysql but uncorrelated subqueries aren't so bad as long as they aren't so enormous that they will get written to disk rather than memory. Given that in this query only has two ints the subquery could be millions of rows long before that happened but the select * subquery in your first query could suffer from this problem much sooner.

相关文章