Find the number of concurrent users in SQL records

2021-09-10 00:00:00 tsql sql-server

I have a table with the following structure:

UserID   StartedOn          EndedOn
1        2009-7-12T14:01    2009-7-12T15:01 
2        2009-7-12T14:30    2009-7-12T14:45
3        2009-7-12T14:47    2009-7-12T15:30
4        2009-7-12T13:01    2009-7-12T17:01
5        2009-7-12T14:15    2009-7-12T18:01
6        2009-7-12T11:01    2009-7-12T19:01
1        2009-7-12T16:07    2009-7-12T19:01

I need to find the maximum number of concurrent users that were online. In the table above, the result would be 5: all users in set1={1,2,4,5,6} were online at the same time (for example at 14:30), and so were all users in set2={1,3,4,5,6} (for example at 14:47).

Do you have an idea how one could calculate this using T-SQL only?

Recommended answer

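Clearly, the number of concurrent users only changes when a user either starts or ends a period, so it is enough to determine the number of concurrent users at those instants. Because the count can only go up when a period starts, the maximum is always reached at some start time, which is why the query below only looks at start times. So, reusing the test data provided by Remus (thank you, Remus):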

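DECLARE @table TABLE
(
  UserId int,
  StartedOn datetime,
  EndedOn datetime
);

-- ISO 8601 literals (yyyy-mm-ddThh:mi:ss) are read the same way
-- regardless of the session's DATEFORMAT and language settings.
INSERT INTO @table (UserId, StartedOn, EndedOn)
SELECT 1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'
UNION ALL SELECT 2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'
UNION ALL SELECT 3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'
UNION ALL SELECT 4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'
UNION ALL SELECT 5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'
UNION ALL SELECT 6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'
UNION ALL SELECT 1, '2009-07-12T16:07:00', '2009-07-12T19:01:00';

-- For every distinct start time, count the sessions that cover it
-- (half-open interval [StartedOn, EndedOn)), then keep the largest count.
SELECT MAX(ConcurrentUsers) AS MaxConcurrentUsers
FROM (
  SELECT COUNT(*) AS ConcurrentUsers
  FROM @table AS Sessions
  JOIN (SELECT DISTINCT StartedOn AS ChangeTime FROM @table) AS ChangeTimes
    ON ChangeTimes.ChangeTime >= Sessions.StartedOn
   AND ChangeTimes.ChangeTime < Sessions.EndedOn
  GROUP BY ChangeTimes.ChangeTime
) AS ConcurrencyAtChangeTimes;

MaxConcurrentUsers
------------------
5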
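The query evaluates concurrency only at the distinct start times. As a sanity check, here is a small Python sketch (not part of the original answer) that applies the same [StartedOn, EndedOn) rule to the question's data twice: once at the start times, as the query does, and once at every minute as a brute-force reference. The one-minute grid is an assumption that only works because every sample timestamp falls on a whole minute.

```python
from datetime import datetime, timedelta

# Sample data from the question as (UserId, StartedOn, EndedOn).
ROWS = [
    (1, "2009-07-12T14:01", "2009-07-12T15:01"),
    (2, "2009-07-12T14:30", "2009-07-12T14:45"),
    (3, "2009-07-12T14:47", "2009-07-12T15:30"),
    (4, "2009-07-12T13:01", "2009-07-12T17:01"),
    (5, "2009-07-12T14:15", "2009-07-12T18:01"),
    (6, "2009-07-12T11:01", "2009-07-12T19:01"),
    (1, "2009-07-12T16:07", "2009-07-12T19:01"),
]
SESSIONS = [
    (uid, datetime.fromisoformat(s), datetime.fromisoformat(e)) for uid, s, e in ROWS
]


def online_at(t):
    """Number of sessions covering instant t, using [StartedOn, EndedOn)."""
    return sum(1 for _, start, end in SESSIONS if start <= t < end)


# What the query does: evaluate only at the distinct start times.
at_starts = max(online_at(start) for start in {start for _, start, _ in SESSIONS})

# Brute-force reference: evaluate at every minute from the first start to the last end.
first = min(start for _, start, _ in SESSIONS)
last = max(end for _, _, end in SESSIONS)
minutes = int((last - first) / timedelta(minutes=1))
brute_force = max(online_at(first + timedelta(minutes=m)) for m in range(minutes + 1))

print(at_starts, brute_force)  # 5 5
```

Both approaches agree on 5, but the start-time version needs one evaluation per distinct start rather than one per minute.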

By the way, using DISTINCT is not a mistake per se; only abusing DISTINCT is. DISTINCT is just a tool, and using it in this context is perfectly correct, because each change time must appear only once in ChangeTimes.
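To see what goes wrong without DISTINCT, here is a small Python sketch. It assumes the standard-library `sqlite3` module with an in-memory SQLite database in place of SQL Server. It runs the answer's query with and without DISTINCT on hypothetical data in which two users log in at the same moment. Without DISTINCT, that start time appears twice in ChangeTimes, so each covering session is joined to it twice and the count for that instant doubles.

```python
import sqlite3


def max_concurrency(conn, distinct):
    """Run the answer's query, with or without DISTINCT on the change times."""
    keyword = "DISTINCT " if distinct else ""  # only ever one of two fixed strings
    query = f"""
        SELECT MAX(ConcurrentUsers) FROM (
            SELECT COUNT(*) AS ConcurrentUsers
            FROM Sessions
            JOIN (SELECT {keyword}StartedOn AS ChangeTime FROM Sessions) AS ChangeTimes
              ON ChangeTime >= StartedOn AND ChangeTime < EndedOn
            GROUP BY ChangeTime
        ) AS ConcurrencyAtChangeTimes
    """
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sessions (UserId INTEGER, StartedOn TEXT, EndedOn TEXT)")
# Hypothetical data: users 1 and 2 log in at exactly the same moment.
# Zero-padded ISO text compares correctly as plain strings in SQLite.
conn.executemany(
    "INSERT INTO Sessions VALUES (?, ?, ?)",
    [
        (1, "2009-07-12 14:00", "2009-07-12 15:00"),
        (2, "2009-07-12 14:00", "2009-07-12 15:00"),
    ],
)

print(max_concurrency(conn, distinct=True))   # 2 (correct)
print(max_concurrency(conn, distinct=False))  # 4 (double-counted)
```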

I was answering the OP's question: "how one could calculate this using T-SQL only". Note that the question does not mention performance.

If the question had been "what is the fastest way to determine maximum concurrency if the data is stored in SQL Server", I would have given a different answer, something like this:

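Consider the following alternatives:

  1. Write a cursor.
  2. Write a CLR cursor.
  3. Write a loop on the client.
  4. Use an RDBMS with good cursors, such as Oracle or PostgreSQL.
  5. For the best possible performance, design your tables differently, so that you can retrieve the answer in a single index seek. This is what I do in my system when I need to deliver the best performance.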
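A cursor or client-side loop typically makes one ordered pass over the start and end events. Here is a minimal Python sketch of alternative 3, assuming the sessions have already been fetched from the database as (StartedOn, EndedOn) pairs. It uses a sweep line: each session becomes a +1 event at its start and a -1 event at its end, the events are sorted, and a running total is kept. At equal timestamps, end events are processed before start events, which matches the query's half-open [StartedOn, EndedOn) rule.

```python
from datetime import datetime


def max_concurrent(sessions):
    """Maximum number of overlapping [start, end) sessions, via a sweep line."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a user comes online
        events.append((end, -1))    # a user goes offline
    # Sort by time; at equal times (t, -1) sorts before (t, 1), so a session
    # ending exactly when another starts is not counted as overlapping it.
    events.sort()
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


# Sample data from the question, as (StartedOn, EndedOn) pairs.
ROWS = [
    ("2009-07-12T14:01", "2009-07-12T15:01"),
    ("2009-07-12T14:30", "2009-07-12T14:45"),
    ("2009-07-12T14:47", "2009-07-12T15:30"),
    ("2009-07-12T13:01", "2009-07-12T17:01"),
    ("2009-07-12T14:15", "2009-07-12T18:01"),
    ("2009-07-12T11:01", "2009-07-12T19:01"),
    ("2009-07-12T16:07", "2009-07-12T19:01"),
]
sessions = [(datetime.fromisoformat(s), datetime.fromisoformat(e)) for s, e in ROWS]
print(max_concurrent(sessions))  # 5
```

The sort makes this O(n log n) with a single pass afterwards. By contrast, the set-based query compares every session with every distinct start time.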

If the question was "what is the fastest way to determine maximum concurrency using a T-SQL query", I would probably not answer at all. The reason: if I needed really good performance, I would not solve this problem in a T-SQL query.
