SQL Server: lead/lag analytic function across groups (and not within groups)

2022-01-03 00:00:00 sliding-window sql-server sql-server-2012 lag lead

Sorry for the long post, but I have provided copy & paste sample data and a possible solution approach below. The relevant part of the question is in the upper part of the post (above the horizontal rule).

I have the following table:

Dt          customer_id  buy_time     money_spent
-------------------------------------------------
2000-01-04  100          11:00:00.00  2
2000-01-05  100          16:00:00.00  1
2000-01-10  100          13:00:00.00  4
2000-01-10  100          14:00:00.00  3
2000-01-04  200          09:00:00.00  10
2000-01-06  200          10:00:00.00  11
2000-01-06  200          11:00:00.00  5
2000-01-10  200          08:00:00.00  20

and I want a query that returns this result set:

Dt          Dt_next     customer_id  buy_time     money_spent
-------------------------------------------------------------
2000-01-04  2000-01-05  100          11:00:00.00  2
2000-01-05  2000-01-10  100          16:00:00.00  1
2000-01-10  NULL        100          13:00:00.00  4
2000-01-10  NULL        100          14:00:00.00  3
2000-01-04  2000-01-06  200          09:00:00.00  10
2000-01-06  2000-01-10  200          10:00:00.00  11
2000-01-06  2000-01-10  200          11:00:00.00  5
2000-01-10  NULL        200          08:00:00.00  20

That is: for each customer (customer_id) and each day (Dt), I want the next day on which the same customer visited (Dt_next).

I already have a query that returns the latter result set (data and query are included below the horizontal rule). However, it involves a left outer join and two dense_rank window functions. This approach seems a bit clumsy to me, and I think there should be a better solution. Any pointers to alternative solutions are highly appreciated. Thank you!

BTW: I am using SQL Server 11 (SQL Server 2012), and the table has well over 1 million rows.

My query:

select
    customer_table.Dt
    ,customer_table_lead.Dt as Dt_next
    ,customer_table.customer_id
    ,customer_table.buy_time
    ,customer_table.money_spent
from
(
    select
        #customer_data.*
        ,dense_rank() over (partition by customer_id order by customer_id asc, Dt asc) as Dt_int
    from #customer_data
) as customer_table
left outer join
(
    select distinct
        #customer_data.Dt
        ,#customer_data.customer_id
        ,dense_rank() over (partition by customer_id order by customer_id asc, Dt asc)-1 as Dt_int
    from #customer_data
) as customer_table_lead
on
(
    customer_table.Dt_int=customer_table_lead.Dt_int
    and customer_table.customer_id=customer_table_lead.customer_id
)

Sample data:

create table #customer_data (
    Dt date not null,
    customer_id int not null,
    buy_time time(2) not null,
    money_spent float not null
);

insert into #customer_data values ('2000-01-04',100,'11:00:00',2);
insert into #customer_data values ('2000-01-05',100,'16:00:00',1);
insert into #customer_data values ('2000-01-10',100,'13:00:00',4);
insert into #customer_data values ('2000-01-10',100,'14:00:00',3);
insert into #customer_data values ('2000-01-04',200,'09:00:00',10);
insert into #customer_data values ('2000-01-06',200,'10:00:00',11);
insert into #customer_data values ('2000-01-06',200,'11:00:00',5);
insert into #customer_data values ('2000-01-10',200,'08:00:00',20);

Recommended answer

Try this query:

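select
    cd.Dt
    , t.Dt_next
    , cd.customer_id
    , cd.buy_time
    , cd.money_spent
from
(
    -- For each customer, look ahead to the next distinct visit date
    select
        Dt
        , LEAD(Dt) OVER (PARTITION BY customer_id ORDER BY Dt) AS Dt_next
        , customer_id
    from
    (
        -- One row per customer and day, so LEAD never returns the same day
        select distinct Dt, customer_id
        from #customer_data
    ) t
) t
-- Attach the next visit date to every purchase made on that day
inner join #customer_data cd
    on t.customer_id = cd.customer_id
    and t.Dt = cd.Dt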
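The key step in the query above appears to be the inner derived table: it collapses the data to one row per customer and day before LEAD is applied, so LEAD steps across days rather than across individual purchases. As a minimal sketch, assuming the #customer_data sample table from the question has been created, that step can be run on its own (the alias d and the final order by are added here only for illustration):

```sql
-- Distinct (customer, day) pairs with the next visit day of the same customer.
-- The alias "d" and the ORDER BY are illustrative additions, not part of the answer.
select
    d.customer_id
    , d.Dt
    , LEAD(d.Dt) OVER (PARTITION BY d.customer_id ORDER BY d.Dt) AS Dt_next
from
(
    select distinct Dt, customer_id
    from #customer_data
) d
order by d.customer_id, d.Dt;

-- Result for the sample data:
-- customer_id  Dt          Dt_next
-- 100          2000-01-04  2000-01-05
-- 100          2000-01-05  2000-01-10
-- 100          2000-01-10  NULL
-- 200          2000-01-04  2000-01-06
-- 200          2000-01-06  2000-01-10
-- 200          2000-01-10  NULL
```

Joining these six rows back to #customer_data on customer_id and Dt then repeats the same Dt_next for every purchase made on a given day, which is what the requested result set shows.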

Why does the money_spent column have type float? Floating-point values can cause rounding problems in calculations. Convert it to a decimal type.
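To illustrate the concern, here is a small sketch comparing float and decimal arithmetic, followed by one possible way to change the column. The precision and scale decimal(19,4) are an assumption chosen for the example, not something the answer specifies; pick values that fit your data.

```sql
-- float is binary floating point, so 0.1, 0.2 and 0.3 are only approximated:
-- float_diff comes out as a tiny non-zero value, decimal_diff is exactly zero.
select
    cast(0.1 as float) + cast(0.2 as float) - cast(0.3 as float) as float_diff
    , cast(0.1 as decimal(19,4)) + cast(0.2 as decimal(19,4))
        - cast(0.3 as decimal(19,4)) as decimal_diff;

-- Possible conversion of the sample table (decimal(19,4) is an assumed choice).
alter table #customer_data
    alter column money_spent decimal(19,4) not null;
```

On a table with well over a million rows, an alter column like this rewrites the stored values, so it may be worth trying on a copy first.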
