根据sql server中的相关值对数据进行分组
数据以 15 分钟为间隔:
<前>时间价值2010-01-01 00:15 32010-01-01 00:30 22010-01-01 00:45 42010-01-01 01:00 52010-01-01 01:15 12010-01-01 01:30 32010-01-01 01:45 42010-01-01 02:00 122010-01-01 02:15 132010-01-01 02:30 122010-01-01 02:45 142010-01-01 03:00 152010-01-01 03:15 32010-01-01 03:30 22010-01-01 03:45 32010-01-01 04:00 5...............2010-01-02 00:00通常会有 96 分.
根据数值,我们可能会注意到,00:15 到 01:45 之间的值彼此接近,02:00 到 03:00 之间的值彼此接近,而从 03:15 到04:00 他们靠得很近.
基于彼此接近"规则,我希望将数据分组"为 3 部分:
- 00:15 到 01:45
- 02:00 至 03:00
- 03:15 到 04:00
请注意数据可能是随机的,根据上面定义的规则可以分为 3 个以上的部分,但最多不应超过 10 个部分.并且分组必须遵循时间顺序,例如,您不能将 00:15/02:30/04:45 归为 1 组,因为这 3 个点不连续.
请给出一些如何在 t-sql 中实现它的想法.
更新:该值可能是:
<前>时间价值2010-01-01 00:15 32010-01-01 00:30 22010-01-01 00:45 42010-01-01 01:00 52010-01-01 01:15 12010-01-01 01:30 32010-01-01 01:45 42010-01-01 02:00 122010-01-01 02:15 132010-01-01 02:30 4 --突然减少2010-01-01 02:45 142010-01-01 03:00 152010-01-01 03:15 32010-01-01 03:30 22010-01-01 03:45 32010-01-01 04:00 5...............2010-01-02 00:00对于这种情况,我们不应该把 02:30 分开分组,因为我们希望分组大小必须至少为 3 点,我们会将那个点 (02:30) 放到前一组(从 02:00 到 03:00).
解决方案由于你的问题变化太大,这里是新问题的新答案,我只包括代码部分.
声明@t table(time datetime, value int)声明@variation浮动设置@variation = 2不计较插入@t 值('2010-01-01 00:15',3)插入@t 值('2010-01-01 00:30',2)插入@t 值('2010-01-01 00:45',4)插入@t 值('2010-01-01 01:00',5)插入@t 值('2010-01-01 01:15',1)插入@t 值('2010-01-01 01:30',3)插入@t 值('2010-01-01 01:45',4)插入@t 值('2010-01-01 02:00',52)插入@t 值('2010-01-01 02:15',5)插入@t 值('2010-01-01 02:30',52)插入@t 值('2010-01-01 02:45',54)插入@t 值('2010-01-01 03:00',55)插入@t 值('2010-01-01 03:15',3)插入@t 值('2010-01-01 03:30',2)插入@t 值('2010-01-01 03:45',3)插入@t 值('2010-01-01 04:00',5)声明@result 表(最小时间日期时间,最大时间日期时间)A:删除@result; t 为(select *, rn = row_number() over(order by time), log(value) lv from @t where datediff(day, time, '2010-01-01') = 0),作为(从 t 中选择时间、lv、rn、0 grp,其中 rn = 1联合所有选择 t1.time, a.lv, t1.rn,存在的情况(从 t t2 中选择 1 其中 t1.rn 介于 rn + 1 和 rn + 3 之间,并且lv 之间 t1.lv - @variation 和 t1.lv +@variation) 然后 grp else grp + 1 end从 t t1 加入 a ont1.rn = a.rn +1)插入@result通过 grp 从组中选择 min(time), max(time)如果@@rowcount >10开始设置@variation=@variation + .5转到一个结尾从@result中选择*
结果:
mintime maxtime2010-01-01 00:15:00.000 2010-01-01 01:45:00.0002010-01-01 02:00:00.000 2010-01-01 03:00:00.0002010-01-01 03:15:00.000 2010-01-01 04:00:00.000
the data is in 15 minute interval:
Time Value 2010-01-01 00:15 3 2010-01-01 00:30 2 2010-01-01 00:45 4 2010-01-01 01:00 5 2010-01-01 01:15 1 2010-01-01 01:30 3 2010-01-01 01:45 4 2010-01-01 02:00 12 2010-01-01 02:15 13 2010-01-01 02:30 12 2010-01-01 02:45 14 2010-01-01 03:00 15 2010-01-01 03:15 3 2010-01-01 03:30 2 2010-01-01 03:45 3 2010-01-01 04:00 5 .......... .......... .......... 2010-01-02 00:00
Typically there will be 96 points.
According to the values, we may notice that the values from 00:15 to 01:45 are close to each other, and from 02:00 to 03:00 they are close to each other, and from 03:15 to 04:00 they are close to each other.
Based on the "close to each other" rule, I want the data to be "grouped" into 3 parts:
- 00:15 to 01:45
- 02:00 to 03:00
- 03:15 to 04:00
Please consider that the data could be random, and could be grouped into more than 3 parts according to the rule defined above, but maximum should not be more than 10 parts. And the grouping must honor the time sequence, for example, you cannot just put 00:15/02:30/04:45 into 1 group because these 3 points are NOT consecutive.
Please give some thoughts how to implement it in t-sql.
updated: The value could be:
Time Value 2010-01-01 00:15 3 2010-01-01 00:30 2 2010-01-01 00:45 4 2010-01-01 01:00 5 2010-01-01 01:15 1 2010-01-01 01:30 3 2010-01-01 01:45 4 2010-01-01 02:00 12 2010-01-01 02:15 13 2010-01-01 02:30 4 --suddenly decreased 2010-01-01 02:45 14 2010-01-01 03:00 15 2010-01-01 03:15 3 2010-01-01 03:30 2 2010-01-01 03:45 3 2010-01-01 04:00 5 .......... .......... .......... 2010-01-02 00:00
for these kinds of situation, we should not group 02:30 separately, because we want the group size has to be at least 3 points, and we will put that point (02:30) to the previous group (from 02:00 to 03:00).
解决方案Since your question changed so much, here is a new answer to the new question, I only included the code part.
declare @t table(time datetime, value int)
declare @variation float
set @variation = 2
set nocount on
insert @t values('2010-01-01 00:15',3)
insert @t values('2010-01-01 00:30',2)
insert @t values('2010-01-01 00:45',4)
insert @t values('2010-01-01 01:00',5)
insert @t values('2010-01-01 01:15',1)
insert @t values('2010-01-01 01:30',3)
insert @t values('2010-01-01 01:45',4)
insert @t values('2010-01-01 02:00',52)
insert @t values('2010-01-01 02:15',5)
insert @t values('2010-01-01 02:30',52)
insert @t values('2010-01-01 02:45',54)
insert @t values('2010-01-01 03:00',55)
insert @t values('2010-01-01 03:15',3)
insert @t values('2010-01-01 03:30',2)
insert @t values('2010-01-01 03:45',3)
insert @t values('2010-01-01 04:00',5)
declare @result table(mintime datetime, maxtime datetime)
a:
delete @result
;with t as
(
select *, rn = row_number() over(order by time), log(value) lv from @t where datediff(day, time, '2010-01-01') = 0
), a as
(
select time, lv, rn, 0 grp from t where rn = 1
union all
select t1.time, a.lv, t1.rn,
case when exists (select 1 from t t2 where t1.rn between rn + 1 and rn + 3 and
lv between t1.lv - @variation and t1.lv +@variation) then grp else grp + 1 end
from t t1 join a on
t1.rn = a.rn +1
)
insert @result
select min(time), max(time) from a group by grp
if @@rowcount > 10
begin
set @variation=@variation + .5
goto a
end
select * from @result
Result:
mintime maxtime
2010-01-01 00:15:00.000 2010-01-01 01:45:00.000
2010-01-01 02:00:00.000 2010-01-01 03:00:00.000
2010-01-01 03:15:00.000 2010-01-01 04:00:00.000
相关文章