Finding the number of concurrent users in SQL records

2021-09-10 00:00:00 tsql sql-server

I have a table with the following structure:

UserID   StartedOn          EndedOn
1        2009-7-12T14:01    2009-7-12T15:01 
2        2009-7-12T14:30    2009-7-12T14:45
3        2009-7-12T14:47    2009-7-12T15:30
4        2009-7-12T13:01    2009-7-12T17:01
5        2009-7-12T14:15    2009-7-12T18:01
6        2009-7-12T11:01    2009-7-12T19:01
1        2009-7-12T16:07    2009-7-12T19:01

I need to find the maximum number of concurrent users that were online. In the above table the result would be 5, because the user sets set1={1,2,4,5,6} and set2={1,3,4,5,6} were each online during the same period.

Do you have an idea how one could calculate this using T-SQL only?

Recommended answer

Clearly the number of concurrent users only changes when a user either starts or ends a period, so it is enough to determine the number of concurrent users at those start and end times. Since the count can only rise at a start, the query below checks the distinct start times only. So, reusing test data provided by Remus (thank you Remus):

DECLARE @Table TABLE (
    UserId int,
    StartedOn datetime,
    EndedOn datetime
);

INSERT INTO @Table (UserId, StartedOn, EndedOn)
SELECT 1, '2009-7-12 14:01', '2009-7-12 15:01' UNION ALL
SELECT 2, '2009-7-12 14:30', '2009-7-12 14:45' UNION ALL
SELECT 3, '2009-7-12 14:47', '2009-7-12 15:30' UNION ALL
SELECT 4, '2009-7-12 13:01', '2009-7-12 17:01' UNION ALL
SELECT 5, '2009-7-12 14:15', '2009-7-12 18:01' UNION ALL
SELECT 6, '2009-7-12 11:01', '2009-7-12 19:01' UNION ALL
SELECT 1, '2009-7-12 16:07', '2009-7-12 19:01';

-- Count the sessions active at each distinct start time, then take the maximum.
SELECT MAX(ConcurrentUsers)
FROM (
    SELECT COUNT(*) AS ConcurrentUsers
    FROM @Table AS Sessions
    JOIN (SELECT DISTINCT StartedOn AS ChangeTime FROM @Table) AS ChangeTimes
        ON ChangeTime >= StartedOn AND ChangeTime < EndedOn
    GROUP BY ChangeTime
) AS ConcurrencyAtChangeTimes;

-- Result: 5

BTW, using DISTINCT per se is not a mistake; only abusing DISTINCT is. DISTINCT is just a tool, and using it in this context is perfectly correct.

I was answering the OP's question: "how one could calculate this using T-SQL only". Note that the question does not mention performance.

If the question were this: "what is the fastest way to determine maximum concurrency if the data is stored in SQL Server", I would provide a different answer, something like this:

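Consider the following alternatives:

Write a cursor
Write a CLR cursor
Write a loop on the client
Use an RDBMS with decent cursors, such as Oracle or PostgreSQL

For best performance, design your tables differently, so that you can retrieve the answer in one index seek. This is what I do in my system when I need to deliver the best performance.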
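As a rough illustration of the "loop on the client" alternative, here is a minimal Python sketch, assuming the rows have already been fetched from the server as (StartedOn, EndedOn) pairs. The names `max_concurrent` and `ts` are made up for this example and are not part of the original answer. The sketch sorts the start and end events and keeps a running count, which is one common way to do it and not necessarily what the answerer had in mind.

```python
from datetime import datetime


def max_concurrent(sessions):
    """Return the peak number of overlapping (start, end) sessions.

    Sessions are treated as half-open intervals [start, end), matching the
    ChangeTime >= StartedOn AND ChangeTime < EndedOn condition in the query.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session begins
        events.append((end, -1))    # a session ends
    # Sort by time; at equal times the -1 events come first, so a session
    # ending at t is not counted together with one starting at t.
    events.sort()

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string (hypothetical helper for the demo)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Same test data as in the T-SQL example; UserId is not needed for the count.
sessions = [
    (ts("2009-07-12 14:01"), ts("2009-07-12 15:01")),
    (ts("2009-07-12 14:30"), ts("2009-07-12 14:45")),
    (ts("2009-07-12 14:47"), ts("2009-07-12 15:30")),
    (ts("2009-07-12 13:01"), ts("2009-07-12 17:01")),
    (ts("2009-07-12 14:15"), ts("2009-07-12 18:01")),
    (ts("2009-07-12 11:01"), ts("2009-07-12 19:01")),
    (ts("2009-07-12 16:07"), ts("2009-07-12 19:01")),
]

print(max_concurrent(sessions))  # 5
```

The sort dominates the cost, so this runs in O(n log n) over n sessions, whereas the self-join compares every start time against every session.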

If the question were "what is the fastest way to determine maximum concurrency using a T-SQL query", I would probably not answer at all. The reason: if I needed really good performance, I would not solve this problem in a T-SQL query.
