Finding the number of concurrent users in SQL records
I have a table with the following structure:
UserID StartedOn EndedOn
1 2009-7-12T14:01 2009-7-12T15:01
2 2009-7-12T14:30 2009-7-12T14:45
3 2009-7-12T14:47 2009-7-12T15:30
4 2009-7-12T13:01 2009-7-12T17:01
5 2009-7-12T14:15 2009-7-12T18:01
6 2009-7-12T11:01 2009-7-12T19:01
1 2009-7-12T16:07 2009-7-12T19:01
I need to find the maximum number of concurrent users that were online. In the table above the result would be 5, because the users in set1={1,2,4,5,6} were all online at the same time, and so were the users in set2={1,3,4,5,6}.
Do you have an idea how one could calculate this using T-SQL only?
Recommended answer
Clearly the number of concurrent users only changes when a user either starts or ends a period, so it is enough to determine the number of concurrent users at those moments. Because the count can only go up when a period starts, the maximum is always reached at some start time, which is why the query below checks start times only. So, reusing the test data provided by Remus (thank you, Remus):
DECLARE @Table TABLE
(
    UserId    int,
    StartedOn datetime,
    EndedOn   datetime
);

INSERT INTO @Table (UserId, StartedOn, EndedOn)
SELECT 1, '2009-7-12 14:01', '2009-7-12 15:01'
UNION ALL SELECT 2, '2009-7-12 14:30', '2009-7-12 14:45'
UNION ALL SELECT 3, '2009-7-12 14:47', '2009-7-12 15:30'
UNION ALL SELECT 4, '2009-7-12 13:01', '2009-7-12 17:01'
UNION ALL SELECT 5, '2009-7-12 14:15', '2009-7-12 18:01'
UNION ALL SELECT 6, '2009-7-12 11:01', '2009-7-12 19:01'
UNION ALL SELECT 1, '2009-7-12 16:07', '2009-7-12 19:01';

-- For every distinct start time, count the sessions active at that instant
-- (StartedOn <= ChangeTime < EndedOn), then take the largest count.
SELECT MAX(ConcurrentUsers)
FROM (
    SELECT COUNT(*) AS ConcurrentUsers
    FROM @Table AS Sessions
    JOIN (
        SELECT DISTINCT StartedOn AS ChangeTime FROM @Table
    ) AS ChangeTimes
        ON ChangeTime >= Sessions.StartedOn AND ChangeTime < Sessions.EndedOn
    GROUP BY ChangeTime
) AS ConcurrencyAtChangeTimes;
-------
5
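To see where the 5 comes from, it can help to run the inner query on its own and look at the count for each start time. The following is only a sketch that repeats the test data so it can run as a single batch; the unambiguous ISO date literals and the ORDER BY are additions for illustration, not part of the query above.

```sql
DECLARE @Table TABLE
(
    UserId    int,
    StartedOn datetime,
    EndedOn   datetime
);

INSERT INTO @Table (UserId, StartedOn, EndedOn)
SELECT 1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'
UNION ALL SELECT 2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'
UNION ALL SELECT 3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'
UNION ALL SELECT 4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'
UNION ALL SELECT 5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'
UNION ALL SELECT 6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'
UNION ALL SELECT 1, '2009-07-12T16:07:00', '2009-07-12T19:01:00';

-- One row per distinct start time, with the number of sessions
-- that are active at that instant (StartedOn <= ChangeTime < EndedOn).
SELECT ChangeTimes.ChangeTime,
       COUNT(*) AS ConcurrentUsers
FROM @Table AS Sessions
JOIN (
    SELECT DISTINCT StartedOn AS ChangeTime FROM @Table
) AS ChangeTimes
    ON ChangeTimes.ChangeTime >= Sessions.StartedOn
   AND ChangeTimes.ChangeTime <  Sessions.EndedOn
GROUP BY ChangeTimes.ChangeTime
ORDER BY ChangeTimes.ChangeTime;

-- With this sample data the count climbs 1, 2, 3, 4, 5 for the start times
-- 11:01 through 14:30, stays at 5 for 14:47, and is 4 for 16:07.
```

The two rows with a count of 5 (14:30 and 14:47) correspond to set1 and set2 from the question.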
BTW, using DISTINCT per se is not a mistake; only abusing DISTINCT is. DISTINCT is just a tool, and using it in this context is perfectly correct.
I was answering the OP's question: "how one could calculate this using T-SQL only". Note that the question does not mention performance.
If the question were "what is the fastest way to determine maximum concurrency if the data is stored in SQL Server", I would provide a different answer, something like this:
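Consider the following alternatives:
- write a cursor
- write a CLR cursor
- write a loop on the client
- use an RDBMS with decent cursors, such as Oracle or PostgreSQL
- for top performance, design your table differently, so that you can retrieve the answer in one index seek. This is what I do in my own system when I need to deliver the best performance.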
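As a rough illustration of the first alternative, a cursor can walk the start and end events in time order while keeping a running count. This is only a sketch under stated assumptions, not a tuned implementation: the names @Sessions, EventCursor, @Delta, @Current and @Max are made up for the example, and ends are sorted before starts at equal timestamps so that a period counts as active on [StartedOn, EndedOn), matching the query earlier in this answer.

```sql
DECLARE @Sessions TABLE
(
    UserId    int,
    StartedOn datetime,
    EndedOn   datetime
);

INSERT INTO @Sessions (UserId, StartedOn, EndedOn)
SELECT 1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'
UNION ALL SELECT 2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'
UNION ALL SELECT 3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'
UNION ALL SELECT 4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'
UNION ALL SELECT 5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'
UNION ALL SELECT 6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'
UNION ALL SELECT 1, '2009-07-12T16:07:00', '2009-07-12T19:01:00';

DECLARE @Delta int, @Current int, @Max int;
SET @Current = 0;  -- sessions active at the current event
SET @Max = 0;      -- highest value of @Current seen so far

-- Each session contributes a +1 event at its start and a -1 event at its end.
DECLARE EventCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Delta
    FROM (
        SELECT StartedOn AS EventTime,  1 AS Delta FROM @Sessions
        UNION ALL
        SELECT EndedOn   AS EventTime, -1 AS Delta FROM @Sessions
    ) AS Events
    ORDER BY EventTime, Delta;  -- at equal times, ends (-1) come before starts (+1)

OPEN EventCursor;
FETCH NEXT FROM EventCursor INTO @Delta;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Current = @Current + @Delta;
    IF @Current > @Max SET @Max = @Current;
    FETCH NEXT FROM EventCursor INTO @Delta;
END

CLOSE EventCursor;
DEALLOCATE EventCursor;

SELECT @Max AS MaxConcurrentUsers;  -- 5 for the sample data
```

Unlike the join-based query, this makes a single ordered pass over the events, so its cost is dominated by the sort rather than by comparing every start time with every session.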
If the question were "what is the fastest way to determine maximum concurrency using a T-SQL query", I would probably not answer at all. The reason: if I needed really good performance, I would not solve this problem in a T-SQL query.