在 SQL Server 中计算运行总计

2021-12-01 00:00:00 sql cumulative-sum tsql sql-server

想象下表(称为TestTable):

id     somedate    somevalue
--     --------    ---------
45     01/Jan/09   3
23     08/Jan/09   5
12     02/Feb/09   0
77     14/Feb/09   7
39     20/Feb/09   34
33     02/Mar/09   6

我想要一个按日期顺序返回运行总计的查询，例如:

I would like a query that returns a running total in date order, like:

id somedate somevalue runningtotal -- -------- --------- ------------ 45 01/Jan/09 3 3 23 08/Jan/09 5 8 12 02/Feb/09 0 8 77 14/Feb/09 7 15 39 20/Feb/09 34 49 33 02/Mar/09 6 55

我知道有在 SQL Server 2000/2005/2008 中执行此操作的各种方法.

我对这种使用聚合集语句技巧的方法特别感兴趣:

I am particularly interested in this sort of method that uses the aggregating-set-statement trick:

INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal) SELECT id, somedate, somevalue, null FROM TestTable ORDER BY somedate DECLARE @RunningTotal int SET @RunningTotal = 0 UPDATE @AnotherTbl SET @RunningTotal = runningtotal = @RunningTotal + somevalue FROM @AnotherTbl

...这是非常有效的，但我听说这方面存在一些问题，因为您不一定能保证 UPDATE 语句会以正确的顺序处理行.也许我们可以得到一些关于这个问题的明确答案.

... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.

但也许人们可以提出其他建议?

But maybe there are other ways that people can suggest?

现在使用 SqlFiddle 与设置和更新技巧"示例上面

edit: Now with a SqlFiddle with the setup and the 'update trick' example above

推荐答案

更新，如果您运行的是 SQL Server 2012，请参阅:https://stackoverflow.com/a/10309947

Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947

问题是Over子句的SQL Server实现是有些限制.

The problem is that the SQL Server implementation of the Over clause is somewhat limited.

Oracle(和 ANSI-SQL)允许您执行以下操作:

Oracle (and ANSI-SQL) allow you to do things like:

SELECT somedate, somevalue, SUM(somevalue) OVER(ORDER BY somedate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM Table

SQL Server 没有为您提供针对此问题的干净解决方案.我的直觉告诉我，这是游标速度最快的罕见情况之一，尽管我必须对大结果进行一些基准测试.

SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.

更新技巧很方便，但我觉得它相当脆弱.似乎如果您正在更新一个完整的表，那么它将按主键的顺序进行.因此，如果您将日期设置为主键升序，则可能 是安全的.但是您依赖于未记录的 SQL Server 实现细节(如果查询最终由两个进程执行，我想知道会发生什么，请参阅:MAXDOP):

The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):

完整的工作示例:

drop table #t create table #t ( ord int primary key, total int, running_total int) insert #t(ord,total) values (2,20) -- notice the malicious re-ordering insert #t(ord,total) values (1,10) insert #t(ord,total) values (3,10) insert #t(ord,total) values (4,1) declare @total int set @total = 0 update #t set running_total = @total, @total = @total + total select * from #t order by ord ord total running_total ----------- ----------- ------------- 1 10 10 2 20 30 3 10 40 4 1 41

你要求一个基准，这是内幕.

You asked for a benchmark this is the lowdown.

最快的 SAFE 方法是 Cursor，它比交叉连接的相关子查询快一个数量级.

The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.

绝对最快的方法是 UPDATE 技巧.我唯一担心的是，我不确定在所有情况下更新都会以线性方式进行.查询中没有任何内容明确说明.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.

最重要的是，对于生产代码，我会使用光标.

Bottom line, for production code I would go with the cursor.

测试数据:

create table #t ( ord int primary key, total int, running_total int) set nocount on declare @i int set @i = 0 begin tran while @i < 10000 begin insert #t (ord, total) values (@i, rand() * 100) set @i = @i +1 end commit

测试 1:

SELECT ord,total, (SELECT SUM(total) FROM #t b WHERE b.ord <= a.ord) AS b FROM #t a -- CPU 11731, Reads 154934, Duration 11135

测试 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal FROM #t a CROSS JOIN #t b WHERE (b.ord <= a.ord) GROUP BY a.ord,a.total ORDER BY a.ord -- CPU 16053, Reads 154935, Duration 4647

测试 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int) DECLARE forward_cursor CURSOR FAST_FORWARD FOR SELECT ord, total FROM #t ORDER BY ord OPEN forward_cursor DECLARE @running_total int, @ord int, @total int SET @running_total = 0 FETCH NEXT FROM forward_cursor INTO @ord, @total WHILE (@@FETCH_STATUS = 0) BEGIN SET @running_total = @running_total + @total INSERT @TotalTable VALUES(@ord, @total, @running_total) FETCH NEXT FROM forward_cursor INTO @ord, @total END CLOSE forward_cursor DEALLOCATE forward_cursor SELECT * FROM @TotalTable -- CPU 359, Reads 30392, Duration 496

测试 4:

declare @total int set @total = 0 update #t set running_total = @total, @total = @total + total select * from #t -- CPU 0, Reads 58, Duration 139

相关文章