如何添加“空"没有数据的每分钟记录到我在 SQL 服务器中的查询

2021-09-10 00:00:00 tsql sql-server

我的查询包含一个日期、一个时间(基本上是时间戳)和一个计算每小时补偿的字段.

My query consists of a date, a time (which basically is the timestamp) and a field that calculates the comp Per hour.

date        time      comp/H
---------- ----- ----------------------
2019-09-10 07:01 13640,416015625
2019-09-10 07:02 8970,3193359375
2019-09-10 07:03 6105,4990234375
2019-09-10 07:04 7189,77880859375
2019-09-10 07:08 2266,73657226563
2019-09-10 07:57 163,527984619141

我想填补时间戳之间的空白,并为没有分配任何数据的每一分钟添加一条新记录(例如,为 07:05、07:06、07:07 添加记录).我会为这些记录的 comp/h 字段分配一个 0 值,但我不知道如何做到这一点.

i would like to fill the gaps between the timestamps, and add a new record for each minute that didn't have any data assigned to it (for example, add record for 07:05, 07:06, 07:07) . I would assign a 0 value for the comp/h field for those records but i have no idea how to do this.

最终目标是制作上述数据的折线图,在其中可以直观地看到停机时间.(因此空记录"的值为 0)

Eventual Goal is to make a line graph of the data above, in which one could visually could see downtime. (hence the 0 values for the "empty records")

原始查询:

select cast(p_timestamp as date) as 'datum', CONVERT(VARCHAR(5), p_timestamp, 108) as 'time', avg(((AantalPCBperPaneel*(AantalCP+AantalQP))/deltasec)* 3600) as 'comp/h'
from Testview3
where p_timestamp > '2019-09-01' 
group by CONVERT(VARCHAR(5), p_timestamp, 108), cast(p_timestamp as date)
order by cast(p_timestamp as date) asc , CONVERT(VARCHAR(5), p_timestamp, 108) asc

推荐答案

您可以尝试以下代码:

填充模型场景

SET DATEFORMAT ymd;

DECLARE @mockTable TABLE([date] DATE,[time] TIME,[comp/H] DECIMAL(20,5));
INSERT INTO @mockTable VALUES
 ('2019-09-10','07:01',13640.416015625)
,('2019-09-10','07:02',8970.3193359375)
,('2019-09-10','07:03',6105.4990234375)
,('2019-09-10','07:04',7189.77880859375)
,('2019-09-10','07:08',2266.73657226563)
,('2019-09-10','07:57',163.527984619141);

--过滤到一天(只是为了保持这个简单...)

--Filter it to one day (only to keep this simple...)

DECLARE @TheDate DATE='20190910';

--查询

WITH CountMinutes(Nmbr) AS
(
    SELECT TOP((SELECT DATEDIFF(MINUTE,MIN([time]),MAX([time])) 
                FROM @mockTable 
                WHERE [date]=@TheDate)+1) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1  
    FROM master..spt_values
)
SELECT @TheDate AS [date]
      ,CAST(DATEADD(MINUTE,mc.Nmbr,(SELECT MIN(t.[time]) FROM @mockTable t WHERE t.[date]=@TheDate)) AS TIME) AS [time]
      ,t2.[comp/H]
FROM CountMinutes mc 
LEFT JOIN @mockTable t2 ON t2.[date]=@TheDate AND t2.[time]=CAST(DATEADD(MINUTE,mc.Nmbr,(SELECT MIN(t.[time]) FROM @mockTable t WHERE t.[date]=@TheDate)) AS TIME);

简单的想法:

我们需要一个计数表,只是一个运行数字的列表.我使用 master..spt_values,它只不过是一个包含很多行的预填充表.您可以选择具有足够行数以覆盖该范围的任何现有表.我们不需要行的值,只需要集合的计数器.您还可以阅读计数表以及如何在VALUES()CROSS JOIN 的组合中创建它们.这里的神奇之处在于计算出的 TOP() 子句和 ROW_NUMBER() 的组合.

We need a tally table, just a list of running numbers. I use master..spt_values, which is nothing more than a pre-filled table with a lot of rows. You can pick any existing table with enough rows to cover the range. We do not need the row's values, only the counter for a set. You can also read about tally tables and how to create them in a combination of VALUES() and CROSS JOIN. The magic here is the combination of the computed TOP() clause and ROW_NUMBER().

因此 CTE 将返回反映分钟计数的数字列表.

So the CTE will return a list of numbers reflecting the count of minutes.

选择将使用此列表和 DATEADD() 创建一个无间隙时间值列表.现在我们必须LEFT JOIN你的集合来查看数据,哪里有数据...

The select will use this list and DATEADD() to create a gap-less list of time values. Now we have to LEFT JOIN your set to see data, where there is data...

在康斯坦丁苏尔科夫的回答下面的评论中,我说过,使用循环的计数器函数会非常慢.康斯坦丁让我衡量这个:

In a comment below Konstantin Surkov's answer I stated, that a counter function using a loop would be very slow. And Konstantin asked me to measure this:

这里我将比较三种方法

  • 康斯坦丁斯 LOOP-TVF
  • 一个简单的动态计数
  • 基于表格的方法

尝试一下:

USE master;
GO
CREATE DATABASE testCounter;
GO
USE testCounter;
GO

--Konstantins 使用 WHILE 的多语句 TVF

--Konstantins multi-statement TVF using a WHILE

create function rangeKonstantin(@from int, @to int) returns @table table(val int) as
begin
    while @from <= @to begin
        insert @table values(@from)
        set @from = @from + 1;
    end;
    return;
end;
GO

--使用 tally-on-the-fly 和 ROW_NUMBER()

create function rangeShnugo(@from int,@to int) returns table as
return
with cte1 AS(SELECT Nr FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(Nr))
    ,cte2 AS(SELECT c1.Nr FROM cte1 c1 CROSS JOIN cte1 c2)
    ,cte3 AS(SELECT c1.Nr FROM cte2 c1 CROSS JOIN cte2 c2)
    ,cte4 AS(SELECT c1.Nr FROM cte3 c1 CROSS JOIN cte3 c2)
select TOP(@to-@from+1) ROW_NUMBER() OvER(ORDER BY(SELECT NULL))+@from-1 AS val FROM cte4;
GO

--还有一个简单的静态数字表
--连同使用此表的函数

--And a simple static numbers table
--Together with a function using this table

CREATE TABLE persistantNumbers(val INT NOT NULL UNIQUE);
GO
--let's fill it 
INSERT INTO persistantNumbers SELECT val FROM rangeKonstantin(0,1500000) --1.5 mio rows
GO

create function rangeTable(@from int,@to int) returns table as
return
SELECT val FROM persistantNumbers WHERE val BETWEEN @from AND @to;
GO

--这里我们可以保存结果

--Here we can save the results

CREATE TABLE Result (ID INT IDENTITY,Measurement VARCHAR(100),TimeKonst INT, TimeShnugo INT, TimeTable INT, tmpCount INT)
GO

--您可以使用这些行来测试代码冷或保留注释以测试引擎缓存和使用统计信息的能力.

--You can use these lines to test the code cold or keep it out-commented to test the engine's ability of caching and using statistics.

--DBCC FREESESSIONCACHE
--DBCC FREEPROCCACHE
--DBCC DROPCLEANBUFFERS

--我们需要一个 DATETIME2 来获取动作之前的时刻

--We need a DATETIME2 to get the moment before the action

DECLARE @d DATETIME2; 

--以及具有可变部分的范围,以避免通过缓存结果产生任何偏差

--And a range with a variable part to avoid any bias through cached results

DECLARE @range INT=300 + (SELECT COUNT(*) FROM Result)

--现在让我们开始:简单计数到范围 x 范围

--Now let's start: Simple counting to range x range

SET @d=SYSUTCDATETIME();
SELECT * into tmp FROM rangeKonstantin(0,@range*@range);
INSERT INTO Result(Measurement,TimeKonst,tmpCount) SELECT 'a count to @range*@range',DATEDIFF(millisecond,@d,SYSUTCDATETIME()),(SELECT Count(*) FROM tmp);
DROP TABLE tmp;

SET @d=SYSUTCDATETIME();
SELECT * into tmp FROM rangeShnugo(0,@range*@range);
INSERT INTO Result(Measurement,TimeShnugo,tmpCount) SELECT 'a count to @range*@range',DATEDIFF(millisecond,@d,SYSUTCDATETIME()),(SELECT Count(*) FROM tmp);
DROP TABLE tmp;

SET @d=SYSUTCDATETIME();
SELECT * into tmp FROM rangeTable(0,@range*@range); 
INSERT INTO Result(Measurement,TimeTable,tmpCount) SELECT 'a count to @range*@range',DATEDIFF(millisecond,@d,SYSUTCDATETIME()),(SELECT Count(*) FROM tmp);
DROP TABLE tmp;

--并且 - 更重要 - 使用 APPLY 调用具有逐行更改参数的函数

--And - more important - using APPLY to call a function with a row-wise changing parameter

SET @d=SYSUTCDATETIME();
select h.val hour, m.val minute into tmp from rangeKonstantin(0, @range) h cross apply rangeKonstantin(0, h.val) m;
INSERT INTO Result(Measurement,TimeKonst,tmpCount) SELECT 'c @range apply',DATEDIFF(millisecond,@d,SYSUTCDATETIME()),(SELECT Count(*) FROM tmp);
DROP TABLE tmp;

SET @d=SYSUTCDATETIME();
select h.val hour, m.val minute into tmp from rangeShnugo(0, @range) h cross apply rangeShnugo(0, h.val) m;
INSERT INTO Result(Measurement,TimeShnugo,tmpCount) SELECT 'c @range apply',DATEDIFF(millisecond,@d,SYSUTCDATETIME()),(SELECT Count(*) FROM tmp);
DROP TABLE tmp;

SET @d=SYSUTCDATETIME();
select h.val hour, m.val minute into tmp from rangeTable(0, @range) h cross apply rangeTable(0, h.val) m;
INSERT INTO Result(Measurement,TimeTable,tmpCount) SELECT 'c @range apply',DATEDIFF(millisecond,@d,SYSUTCDATETIME()),(SELECT Count(*) FROM tmp);
DROP TABLE tmp;

--我们通过一个简单的GO 10

--We repeat the above 10 times by a simple GO 10

GO 10 --do the whole thing 10 times

--现在让我们获取结果

--Now let's fetch the results

SELECT Measurement
      ,AVG(TimeKonst) AS AvgKonst
      ,AVG(TimeShnugo) AS AvgShnugo
      ,AVG(TimeTable) AS AvgTable 
FROM Result 
GROUP BY Measurement;

SELECT * FROM Result ORDER BY Measurement,ID;

--清理

USE master;
GO
DROP DATABASE testCounter;

在强大的机器上运行的 v2014 上使用缓存和统计的 range=300 的结果:

The results for range=300 using caching and statistics on a v2014 running on a strong machine:

Measurement                 AvgKonst    AvgShnugo   AvgTable
a count to @range*@range    626         58          34
c @range apply              357         17          56

我们可以看到,带有 WHILE 的 TVF 比其他方法慢得多.

We can see, that the TVF with the WHILE is much slower than the other approaches.

在真实世界场景中,使用的范围(300 将计为 ~90k)相当小.在这里,我用 1000 的 @range 重复(计数超过 1 mio),仍然不是很大...

In a real-world-scenario the range used (300 will count to ~90k) is rather small. Here I repeated with a @range of 1000 (count goes over 1 mio), still not very big...

Measurement                 AvgKonst    AvgShnugo   AvgTable
a count to @range*@range    6800        418         321
c @range apply              3422        189         177

我们学到了什么:

  • 对于小范围计数,即时计数似乎最好
  • 当集合大小增加时,任何计数方法的扩展性都很差.
  • 基于表格的方法最适合大型集合.
  • 带有 WHILE 循环的多语句 TVF 无法正常工作.
  • For small-range counting the tally-on-the-fly seems best
  • Any counting approach scales badly when the set size increases.
  • The table-based approach is best with large sets.
  • The multi-statment TVF with a WHILE loop is not holding up.

在本地运行 SQL-Server 2017 的中型笔记本电脑上,对于 range=1000,我得到以下信息:

On a medium laptop with SQL-Server 2017 running locally I get the following for range=1000:

Measurement                 AvgKonst    AvgShnugo   AvgTablea 
count to @range*@range      10704       282         214
c @range apply              5671        1133        210

我们看到,使用更大的组合,桌子方法显然会获胜.

And we see, that with larger sets the table approach wins clearly.

值得一提的是:引擎会尝试预测行数以找到最佳计划.多语句 TVF 总是用一行来估计.一个简单的计数器也将被估计为一行.但是使用索引表,引擎将能够预测行并找到更好的计划.

And worth to mention: The engine tries to predict row counts in order to find the best plan. A multi-statement TVF is always estimated with just one row. A simple counter will be estimated with one row too. But with the indexed table the engine will be able to predict the rows and find a better plan.

相关文章