SQL 挑战/谜题:如何合并嵌套范围?
- 这项挑战基于一个涉及 IP 范围的真实用例.
- 我提供的解决方案基于 堆栈跟踪 我之前提出的挑战.每个范围的开始都被视为一个 PUSH 操作,每个范围的结束 + 1 被视为一个 POP 操作.
- This challenge is based on a real life use-case involving IP ranges.
- The solution I came with is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
我们有一个范围数据集,其中每个范围都有一个起点、终点和一个值.
We have a data set of ranges where each range has a starting point, ending point and a value.
create table ranges
(
range_start int not null
,range_end int not null
,range_val char(1) not null
)
;
一个范围可以包含另一个范围或跟随另一个范围,但不能等于另一个范围或与另一个范围相交.
A range can contain another range or follow another range, but cannot be equal to another range or intersect with another range.
这些是范围之间的有效关系:
These are valid relations between ranges:
(1) (2) (3) (4)
--------- --------- --------- -------- -----------
--- --- ---
这些关系无效:
(5) (6)
------- --------
------- --------
当以图形方式呈现时,我们的初始范围可能如下所示(字母代表 range_val):
Our initial ranges, when presented graphically, might look something like this (The letter represents range_val):
AAAAAAAA BBCCCCCCC
DDE F GGGGG
H IIII
J
目标是采用初始范围集并根据以下规则创建新集:
The goal is to take the initial set of ranges and create a new set under the following rule:
包含的范围将覆盖包含范围的相应子范围.
请求的结果,当以图形方式呈现时,可能看起来像这样
The requested result, when presented graphically, might look something like this
ADDHAAAF BIIJIGCCC
要求
- 解决方案应该是单个 SQL 查询(子查询很好).
- 使用 T-SQL、PL/SQL 等.不允许.
- 不允许使用 UDF(用户定义函数).
AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC
DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII
JJ KKKLLL MM NN OOOOO
P QQ
insert into ranges (range_start,range_end,range_val) values (1 ,28 ,'A');
insert into ranges (range_start,range_end,range_val) values (31 ,34 ,'B');
insert into ranges (range_start,range_end,range_val) values (39 ,63 ,'C');
insert into ranges (range_start,range_end,range_val) values (1 ,3 ,'D');
insert into ranges (range_start,range_end,range_val) values (4 ,4 ,'E');
insert into ranges (range_start,range_end,range_val) values (7 ,14 ,'F');
insert into ranges (range_start,range_end,range_val) values (19 ,27 ,'G');
insert into ranges (range_start,range_end,range_val) values (43 ,50 ,'H');
insert into ranges (range_start,range_end,range_val) values (55 ,61 ,'I');
insert into ranges (range_start,range_end,range_val) values (1 ,2 ,'J');
insert into ranges (range_start,range_end,range_val) values (9 ,11 ,'K');
insert into ranges (range_start,range_end,range_val) values (12 ,14 ,'L');
insert into ranges (range_start,range_end,range_val) values (22 ,23 ,'M');
insert into ranges (range_start,range_end,range_val) values (25 ,26 ,'N');
insert into ranges (range_start,range_end,range_val) values (57 ,61 ,'O');
insert into ranges (range_start,range_end,range_val) values (13 ,13 ,'P');
insert into ranges (range_start,range_end,range_val) values (60 ,61 ,'Q');
请求的结果
(空值在此处显示为空格)
Requested Results
(Nulls are presented here as empty spaces)
JJDEAAFFKKKLPLAAAAGGGMMGNNGA BBBB CCCCHHHHHHHHCCCCIIOOOQQCC
range_start range_end range_val
----------- --------- ---------
1 2 J
3 3 D
4 4 E
5 6 A
7 8 F
9 11 K
12 12 L
13 13 P
14 14 L
15 18 A
19 21 G
22 23 M
24 24 G
25 26 N
27 27 G
28 28 A
29 30
31 34 B
35 38
39 42 C
43 50 H
51 54 C
55 56 I
57 59 O
60 61 Q
62 63 C
可选添加最后一行:
64
推荐答案
- 该解决方案基于 堆栈跟踪 我之前提出过的挑战.每个范围的开始都被视为一个 PUSH 操作,每个范围的结束 + 1 被视为一个 POP 操作.
- 在性能方面,您可能会注意到 2 个内部分析函数如何使用相同的窗口,因此可以在一个步骤中执行.
- The solution is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
- Performence wise, you may notice how the 2 internal analytic functions use the same windowing, therefore being executed in a single step.
select new_range_start
,new_range_end
,last_value (range_val ignore nulls) over
(
partition by stack_depth
order by new_range_start ,range_start ,range_end desc
rows unbounded preceding
) as new_range_val
from (select new_range_start
,range_val
,range_start
,range_end
,sum (case when range_val is null then -1 else 1 end) over
(
order by new_range_start, range_start ,range_end desc
rows unbounded preceding
) as stack_depth
,min (new_range_start) over
(
order by new_range_start ,range_start ,range_end desc
rows between 1 following and 1 following
) - 1 as new_range_end
from ( select range_start ,range_start ,range_end ,range_val from ranges
union all select range_end + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges
)
r (new_range_start,range_start,range_end,range_val)
)
r
qualify new_range_end >= new_range_start
order by new_range_start
;
甲骨文
select new_range_start
,new_range_end
,new_range_val
from (select new_range_start
,new_range_end
,last_value (range_val ignore nulls) over
(
partition by stack_depth
order by new_range_start ,range_start ,range_end desc
rows unbounded preceding
) as new_range_val
from (select new_range_start
,range_start
,range_end
,range_val
,sum (case when range_val is null then -1 else 1 end) over
(
order by new_range_start, range_start ,range_end desc
rows unbounded preceding
) as stack_depth
,lead (new_range_start) over
(
order by new_range_start, range_start ,range_end desc
) - 1 as new_range_end
from ( select range_start as new_range_start ,range_start ,range_end ,range_val from ranges
union all select range_end + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges
)
r
)
r
)
r
where new_range_end >= new_range_start
order by new_range_start
;
SQL Server/PostgreSQL/Hive
select *
from (select new_range_start
,new_range_end
,min (range_val) over
(
partition by stack_depth,new_range_val_group_id
) as new_range_val
from (select new_range_start
,new_range_end
,range_val
,stack_depth
,count (range_val) over
(
partition by stack_depth
order by new_range_start ,range_start ,range_end desc
rows unbounded preceding
) as new_range_val_group_id
from (select new_range_start
,range_start
,range_end
,range_val
,sum (case when range_val is null then -1 else 1 end) over
(
order by new_range_start, range_start ,range_end desc
rows unbounded preceding
) as stack_depth
,lead (new_range_start) over
(
order by new_range_start, range_start ,range_end desc
) - 1 as new_range_end
from ( select range_start as new_range_start ,range_start ,range_end ,range_val from ranges
union all select range_end + 1 as new_range_start ,range_start ,range_end ,cast (null as char(1)) as range_val from ranges
)
r
)
r
)
r
)
r
where new_range_end >= new_range_start
order by new_range_start
;
相关文章