How to SELECT the newest four items per category?

2021-11-20 00:00:00 sql mysql greatest-n-per-group

I have a database of items. Each item is categorized with a category ID from a category table. I am trying to create a page that lists every category, and underneath each category I want to show the 4 newest items in that category.

For example:

Pet Supplies

img1
img2
img3
img4

Pet Food

img1
img2
img3
img4

I know that I could easily solve this problem by querying the database for each category like so:

SELECT id FROM category

Then iterating over that data and querying the database for each category to grab the newest items:

SELECT image FROM item WHERE category_id = :category_id
ORDER BY date_listed DESC LIMIT 4
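
For reference, the one-query-per-category approach described above might look like this in application code. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, the column layout is inferred from the two queries above, and the sample rows and the helper name newest_per_category are made up for illustration.

```python
import sqlite3

def newest_per_category(conn, limit=4):
    """Run one query for the category list, then one query per category."""
    result = {}
    category_ids = [row[0] for row in conn.execute("SELECT id FROM category ORDER BY id")]
    for category_id in category_ids:
        rows = conn.execute(
            "SELECT image FROM item WHERE category_id = ? "
            "ORDER BY date_listed DESC LIMIT ?",
            (category_id, limit),
        ).fetchall()
        result[category_id] = [r[0] for r in rows]
    return result

# Hypothetical sample data: category 1 has six items, category 2 has three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER, "
    "image TEXT, date_listed TEXT)"
)
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "Pet Supplies"), (2, "Pet Food")])
items = [(i, 1, f"supplies{i}.jpg", f"2021-01-{i:02d}") for i in range(1, 7)]
items += [(i, 2, f"food{i}.jpg", f"2021-02-{i:02d}") for i in range(7, 10)]
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", items)

demo_result = newest_per_category(conn)
```

With 33 categories this issues 34 queries per page load, which is the cost the question is trying to avoid.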

What I'm trying to figure out is if I can just use 1 query and grab all of that data. I have 33 categories so I thought perhaps it would help reduce the number of calls to the database.

Does anyone know if this is possible? Or are 33 calls not that big a deal, so that I should just do it the easy way?

Recommended answer

This is the greatest-n-per-group problem, and it's a very common SQL question.

Here's how I solve it with outer joins:

SELECT i1.*
FROM item i1
LEFT OUTER JOIN item i2
  ON (i1.category_id = i2.category_id AND i1.item_id < i2.item_id)
GROUP BY i1.item_id
HAVING COUNT(*) < 4
ORDER BY i1.category_id, i1.date_listed;

I'm assuming the primary key of the item table is item_id, and that it's a monotonically increasing pseudokey. That is, a greater value in item_id corresponds to a newer row in item.

Here's how it works: for each item, there are some number of other items that are newer. For example, there are three items newer than the fourth newest item. There are zero items newer than the very newest item. So we want to compare each item (i1) to the set of items (i2) that are newer and have the same category as i1. If the number of those newer items is less than four, i1 is one of those we include. Otherwise, don't include it.

The beauty of this solution is that it works no matter how many categories you have, and continues working if you change the categories. It also works even if the number of items in some categories is fewer than four.
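
To make the counting concrete, here is a small sketch that runs the same self-join against made-up data. It uses Python's built-in sqlite3 module rather than MySQL, keeps only the two columns the join needs, and adds a newer_count column (not in the query above) purely to show how many newer items each surviving row has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER)")
# Hypothetical data: category 1 has six items (ids 1-6), category 2 only three (ids 7-9).
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, 1) for i in range(1, 7)] + [(i, 2) for i in range(7, 10)],
)

# COUNT(i2.item_id) counts only real matches, so it is the number of newer items.
# COUNT(*) also counts the single NULL-padded row the outer join produces for the
# newest item, so it is 1 there; the "< 4" filter still keeps exactly the rows
# that have at most three newer items.
rows = conn.execute("""
    SELECT i1.category_id, i1.item_id, COUNT(i2.item_id) AS newer_count
    FROM item i1
    LEFT OUTER JOIN item i2
      ON (i1.category_id = i2.category_id AND i1.item_id < i2.item_id)
    GROUP BY i1.item_id
    HAVING COUNT(*) < 4
    ORDER BY i1.category_id, i1.item_id DESC
""").fetchall()

for category_id, item_id, newer_count in rows:
    print(category_id, item_id, newer_count)
```

Items 1 and 2 in category 1 have five and four newer items respectively, so they are filtered out, while all three items in category 2 are kept.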

Another solution that works but relies on the MySQL user-variables feature:

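SELECT *
FROM (
    -- @r is the row number within the current category, @g the previous row's category
    SELECT i.*, @r := IF(@g = category_id, @r+1, 1) AS rownum, @g := category_id
    FROM (SELECT @g:=null, @r:=0) AS _init
    CROSS JOIN item i
    -- newest first, so that rownum 1..4 are the four newest items
    ORDER BY i.category_id, i.date_listed DESC
) AS t
WHERE t.rownum <= 4;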
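
The user-variable query is essentially a running counter that resets whenever the category changes. The following plain-Python sketch imitates that logic on made-up tuples; it is an illustration of the idea, not of how MySQL evaluates the query, and the function name and row layout are hypothetical.

```python
def top_n_per_group(rows, n=4):
    """rows: iterable of (category_id, date_listed, image) tuples."""
    # Sort newest first, then by category; the second sort is stable, so the
    # date order is preserved inside each category.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ordered.sort(key=lambda r: r[0])

    kept = []
    group, rownum = None, 0  # play the roles of @g and @r
    for row in ordered:
        rownum = rownum + 1 if row[0] == group else 1
        group = row[0]
        if rownum <= n:
            kept.append(row)
    return kept

# Hypothetical sample rows: six items in category 1, three in category 2.
sample = [(1, f"2021-01-{i:02d}", f"supplies{i}.jpg") for i in range(1, 7)]
sample += [(2, f"2021-02-{i:02d}", f"food{i}.jpg") for i in range(7, 10)]

newest = top_n_per_group(sample)
newest_images = [image for _, _, image in newest]
```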

MySQL 8.0.3 introduced support for SQL standard window functions. Now we can solve this sort of problem the way other RDBMS do:

WITH numbered_item AS (
  -- number the rows newest first within each category
  SELECT *, ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY item_id DESC) AS rownum
  FROM item
)
SELECT * FROM numbered_item WHERE rownum <= 4;
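
The window-function approach can be tried without a MySQL server. The sketch below runs an equivalent query through Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). The rows are numbered newest first (ORDER BY item_id DESC), and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER)")
# Hypothetical data: category 1 has six items (ids 1-6), category 2 only three (ids 7-9).
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, 1) for i in range(1, 7)] + [(i, 2) for i in range(7, 10)],
)

window_rows = conn.execute("""
    WITH numbered_item AS (
      SELECT item_id, category_id,
             ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY item_id DESC) AS rownum
      FROM item
    )
    SELECT category_id, item_id, rownum
    FROM numbered_item
    WHERE rownum <= 4
    ORDER BY category_id, rownum
""").fetchall()

for category_id, item_id, rownum in window_rows:
    print(category_id, item_id, rownum)
```

Each category restarts its numbering at 1, so the filter on rownum keeps at most four rows per category regardless of how many categories exist.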
