在 SQL Server 中计算运行总计
想象下表(称为TestTable
):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
我想要一个按日期顺序返回运行总计的查询,例如:
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
我知道有 在 SQL Server 2000/2005/2008 中执行此操作的各种方法.
我对这种使用聚合集语句技巧的方法特别感兴趣:
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
...这是非常有效的,但我听说这方面存在一些问题,因为您不一定能保证 UPDATE
语句会以正确的顺序处理行.也许我们可以得到一些关于这个问题的明确答案.
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE
statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
但也许人们可以提出其他建议?
But maybe there are other ways that people can suggest?
现在使用 SqlFiddle 与设置和更新技巧"示例上面
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
推荐答案
更新,如果您运行的是 SQL Server 2012,请参阅:https://stackoverflow.com/a/10309947
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
问题是Over子句的SQL Server实现是有些限制.
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle(和 ANSI-SQL)允许您执行以下操作:
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server 没有为您提供针对此问题的干净解决方案.我的直觉告诉我,这是游标速度最快的罕见情况之一,尽管我必须对大结果进行一些基准测试.
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
更新技巧很方便,但我觉得它相当脆弱.似乎如果您正在更新一个完整的表,那么它将按主键的顺序进行.因此,如果您将日期设置为主键升序,则可能
是安全的.但是您依赖于未记录的 SQL Server 实现细节(如果查询最终由两个进程执行,我想知道会发生什么,请参阅:MAXDOP):
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably
be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):
完整的工作示例:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
你要求一个基准,这是内幕.
You asked for a benchmark this is the lowdown.
最快的 SAFE 方法是 Cursor,它比交叉连接的相关子查询快一个数量级.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
绝对最快的方法是 UPDATE 技巧.我唯一担心的是,我不确定在所有情况下更新都会以线性方式进行.查询中没有任何内容明确说明.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
最重要的是,对于生产代码,我会使用光标.
Bottom line, for production code I would go with the cursor.
测试数据:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit
测试 1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
测试 2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
测试 3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0
FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM @TotalTable
-- CPU 359, Reads 30392, Duration 496
测试 4:
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
相关文章