查找与 Min/Max 关联的行,没有内部循环

2021-09-10 00:00:00 sql tsql sql-server

我有一个关于 T-SQL 和 SQL Server 的问题.

I have a question related to T-SQL and SQL Server.

假设我有一个包含 2 列的 Orders 表:

Let's say I have a table Orders with 2 columns:

  • ProductId int
  • CustomerId int
  • 日期日期时间

我想要每个产品的第一个订单的日期,所以我执行这种类型的查询:

I want the date of the first order for every product, so I perform this type of query:

SELECT ProductId, MIN(Date) AS FirstOrder 
FROM Orders
GROUP BY ProductId

我在 ProductId 上有一个索引,包括列 CustomerIdDate 以加快查询速度 (IX_Orders>).查询计划看起来像是对 IX_Orders 的非聚集索引扫描,然后是流聚合(由于索引没有排序).

I have an index on ProductId, including the columns CustomerId and Date to speed up the query (IX_Orders). The query plan looks like a non-clustered index scan on IX_Orders, followed by a stream aggregate (no sort thanks to the index).

现在我的问题是我还想检索与每个产品的第一个订单相关联的 CustomerId(产品 26 于 25 日星期二由客户 12 首次订购).棘手的部分是我不希望在执行计划中有任何内部循环,因为这意味着表中每个 ProductId 的额外读取,这是非常低效的.

Now my problem is that I also want to retrieve the CustomerId associated with the first order for each product (Product 26 was first ordered on Tuesday 25, by customer 12). The tricky part is that I don't want any inner loop in the execution plan, because it would mean an additional read per ProductId in the table, which is highly inefficient.

这应该可以使用相同的非聚集索引扫描,然后是流聚合,但是我似乎找不到可以做到这一点的查询.有什么想法吗?

This should just be possible using the same non-clustered index scan, followed by stream aggregates, however I can't seem to find a query that would do that. Any idea?

谢谢

推荐答案

这将处理具有重复日期的产品:

this will handle products that have duplicate dates:

DECLARE @Orders table (ProductId int
                      ,CustomerId int
                      ,Date datetime
                      )

INSERT INTO @Orders VALUES (1,1,'20090701')
INSERT INTO @Orders VALUES (2,1,'20090703')
INSERT INTO @Orders VALUES (3,1,'20090702')
INSERT INTO @Orders VALUES (1,2,'20090704')
INSERT INTO @Orders VALUES (4,2,'20090701')
INSERT INTO @Orders VALUES (1,3,'20090706')
INSERT INTO @Orders VALUES (2,3,'20090704')
INSERT INTO @Orders VALUES (4,3,'20090702')
INSERT INTO @Orders VALUES (5,5,'20090703')  --duplicate dates for product #5
INSERT INTO @Orders VALUES (5,1,'20090703')  --duplicate dates for product #5
INSERT INTO @Orders VALUES (5,5,'20090703')  --duplicate dates for product #5

;WITH MinOrders AS
(SELECT
     o.ProductId, o.CustomerId, o.Date
         ,row_number() over(partition by o.ProductId order by o.ProductId,o.CustomerId) AS RankValue
     FROM @Orders o
     INNER JOIN (SELECT
                     ProductId
                         ,MIN(Date) MinDate 
                     FROM @Orders 
                     GROUP BY ProductId
                ) dt ON o.ProductId=dt.ProductId AND o.Date=dt.MinDate
 )
SELECT
    m.ProductId, m.CustomerId, m.Date
    FROM MinOrders  m
    WHERE m.RankValue=1
    ORDER BY m.ProductId, m.CustomerId

这将返回相同的结果,只需使用与上述代码相同的声明和插入:

this will return the same results, just use the same declare and inserts as the above code:

;WITH MinOrders AS
(SELECT
     o.ProductId, o.CustomerId, o.Date
         ,row_number() over(partition by o.ProductId order by o.ProductId,o.CustomerId) AS RankValue
     FROM @Orders o
 )
SELECT
    m.ProductId, m.CustomerId, m.Date
    FROM MinOrders  m
    WHERE m.RankValue=1
    ORDER BY m.ProductId, m.CustomerId

您可以尝试每个版本,看看哪个版本运行得更快...

You can try out each version to see which will run faster...

相关文章