SQL 挑战/谜题:如何合并嵌套范围?

2021-12-28 00:00:00 sql hive teradata oracle sql-server
  • 这项挑战基于一个涉及 IP 范围的真实用例.
  • 我提供的解决方案基于 堆栈跟踪 我之前提出的挑战.每个范围的开始都被视为一个 PUSH 操作,每个范围的结束 + 1 被视为一个 POP 操作.
  • This challenge is based on a real life use-case involving IP ranges.
  • The solution I came with is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.

我们有一个范围数据集,其中每个范围都有一个起点、终点和一个值.

We have a data set of ranges where each range has a starting point, ending point and a value.

create table ranges
(
    range_start     int         not null
   ,range_end       int         not null
   ,range_val       char(1)     not null
)
;

一个范围可以包含另一个范围或跟随另一个范围,但不能等于另一个范围或与另一个范围相交.

A range can contain another range or follow another range, but cannot be equal to another range or intersect with another range.

这些是范围之间的有效关系:

These are valid relations between ranges:

(1)           (2)           (3)           (4)
---------     ---------     ---------     -------- -----------
---                 ---        ---

这些关系无效:

(5)                (6)
-------        --------       
-------              --------

当以图形方式呈现时,我们的初始范围可能如下所示(字母代表 range_val):

Our initial ranges, when presented graphically, might look something like this (The letter represents range_val):

AAAAAAAA  BBCCCCCCC
 DDE   F   GGGGG
   H       IIII
             J

目标是采用初始范围集并根据以下规则创建新集:

The goal is to take the initial set of ranges and create a new set under the following rule:

包含的范围将覆盖包含范围的相应子范围.

请求的结果,当以图形方式呈现时,可能看起来像这样

The requested result, when presented graphically, might look something like this

ADDHAAAF  BIIJIGCCC

要求

  • 解决方案应该是单个 SQL 查询(子查询很好).
  • 使用 T-SQL、PL/SQL 等.不允许.
  • 不允许使用 UDF(用户定义函数).
  • AAAAAAAAAAAAAAAAAAAAAAAAAAAA  BBBB    CCCCCCCCCCCCCCCCCCCCCCCCC
    DDDE  FFFFFFFF    GGGGGGGGG               HHHHHHHH    IIIIIII
    JJ      KKKLLL       MM NN                              OOOOO
                P                                              QQ
    
    insert into ranges (range_start,range_end,range_val) values (1  ,28 ,'A');
    insert into ranges (range_start,range_end,range_val) values (31 ,34 ,'B');
    insert into ranges (range_start,range_end,range_val) values (39 ,63 ,'C');
    insert into ranges (range_start,range_end,range_val) values (1  ,3  ,'D');
    insert into ranges (range_start,range_end,range_val) values (4  ,4  ,'E');
    insert into ranges (range_start,range_end,range_val) values (7  ,14 ,'F');
    insert into ranges (range_start,range_end,range_val) values (19 ,27 ,'G');
    insert into ranges (range_start,range_end,range_val) values (43 ,50 ,'H');
    insert into ranges (range_start,range_end,range_val) values (55 ,61 ,'I');
    insert into ranges (range_start,range_end,range_val) values (1  ,2  ,'J');
    insert into ranges (range_start,range_end,range_val) values (9  ,11 ,'K');
    insert into ranges (range_start,range_end,range_val) values (12 ,14 ,'L');
    insert into ranges (range_start,range_end,range_val) values (22 ,23 ,'M');
    insert into ranges (range_start,range_end,range_val) values (25 ,26 ,'N');
    insert into ranges (range_start,range_end,range_val) values (57 ,61 ,'O');
    insert into ranges (range_start,range_end,range_val) values (13 ,13 ,'P');
    insert into ranges (range_start,range_end,range_val) values (60 ,61 ,'Q');
    

    请求的结果

    (空值在此处显示为空格)

    Requested Results

    (Nulls are presented here as empty spaces)

    JJDEAAFFKKKLPLAAAAGGGMMGNNGA  BBBB    CCCCHHHHHHHHCCCCIIOOOQQCC
    
    range_start range_end range_val
    ----------- --------- ---------
    1           2          J
    3           3          D
    4           4          E
    5           6          A
    7           8          F
    9           11         K
    12          12         L
    13          13         P
    14          14         L
    15          18         A
    19          21         G
    22          23         M
    24          24         G
    25          26         N
    27          27         G
    28          28         A
    29          30         
    31          34         B
    35          38         
    39          42         C
    43          50         H
    51          54         C
    55          56         I
    57          59         O
    60          61         Q
    62          63         C
    

    可选添加最后一行:

    64
    

    推荐答案

    • 该解决方案基于 堆栈跟踪 我之前提出过的挑战.每个范围的开始都被视为一个 PUSH 操作,每个范围的结束 + 1 被视为一个 POP 操作.
    • 在性能方面,您可能会注意到 2 个内部分析函数如何使用相同的窗口,因此可以在一个步骤中执行.
      • The solution is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
      • Performence wise, you may notice how the 2 internal analytic functions use the same windowing, therefore being executed in a single step.
      • select      new_range_start
                   ,new_range_end
        
                   ,last_value (range_val ignore nulls) over 
                    (
                        partition by    stack_depth
                        order by        new_range_start ,range_start ,range_end desc 
                        rows            unbounded preceding
                    )                                                                   as new_range_val
        
        from       (select      new_range_start
                               ,range_val
                               ,range_start
                               ,range_end
        
                               ,sum (case when range_val is null then -1 else 1 end) over 
                                (
                                    order by    new_range_start, range_start ,range_end desc  
                                    rows        unbounded preceding
                                )                                                                   as stack_depth
        
                               ,min (new_range_start) over
                                (
                                    order by    new_range_start ,range_start ,range_end desc
                                    rows        between 1 following and 1 following
        
                                ) - 1                                                               as new_range_end
        
                    from        (           select range_start     ,range_start ,range_end ,range_val              from ranges
                                union all   select range_end   + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges
                                )
                                r (new_range_start,range_start,range_end,range_val)
                    )
                    r
        
        qualify     new_range_end >= new_range_start
        
        order by    new_range_start
        ;
        

        甲骨文

        select      new_range_start
                   ,new_range_end
                   ,new_range_val                       
        
        from       (select      new_range_start
                               ,new_range_end
        
                               ,last_value (range_val ignore nulls) over 
                                (
                                    partition by    stack_depth
                                    order by        new_range_start ,range_start ,range_end desc 
                                    rows            unbounded preceding
                                )                                                                   as new_range_val
        
        
                    from       (select      new_range_start
                                           ,range_start
                                           ,range_end
                                           ,range_val
        
                                           ,sum (case when range_val is null then -1 else 1 end) over 
                                            (
                                                order by    new_range_start, range_start ,range_end desc  
                                                rows        unbounded preceding
                                            )                                                                as stack_depth
        
                                           ,lead (new_range_start) over
                                            (
                                                order by    new_range_start, range_start ,range_end desc 
                                            ) - 1                                                            as new_range_end
        
                                from        (           select range_start     as new_range_start ,range_start ,range_end ,range_val              from ranges
                                            union all   select range_end   + 1                    ,range_start ,range_end ,cast (null as char(1)) from ranges
                                            )
                                            r 
                                )
                                r
                    )
                    r
        
        where       new_range_end >= new_range_start
        
        order by    new_range_start
        ;
        

        SQL Server/PostgreSQL/Hive

        select      *
        
        from       (select      new_range_start
                               ,new_range_end
                               ,min (range_val) over
                                (
                                    partition by    stack_depth,new_range_val_group_id
                                )                                                       as new_range_val                       
        
                    from       (select      new_range_start
                                           ,new_range_end
                                           ,range_val
                                           ,stack_depth
        
                                           ,count (range_val) over 
                                            (
                                                partition by    stack_depth
                                                order by        new_range_start ,range_start ,range_end desc 
                                                rows            unbounded preceding
                                            )                                                                   as new_range_val_group_id
        
        
                                from       (select      new_range_start
                                                       ,range_start
                                                       ,range_end
                                                       ,range_val
        
                                                       ,sum (case when range_val is null then -1 else 1 end) over 
                                                        (
                                                            order by    new_range_start, range_start ,range_end desc  
                                                            rows        unbounded preceding
                                                        )                                                                as stack_depth
        
                                                       ,lead (new_range_start) over
                                                        (
                                                            order by    new_range_start, range_start ,range_end desc 
                                                        ) - 1                                                            as new_range_end
        
                                            from        (           select range_start     as new_range_start ,range_start ,range_end ,range_val                           from ranges
                                                        union all   select range_end   + 1 as new_range_start ,range_start ,range_end ,cast (null as char(1)) as range_val from ranges
                                                        )
                                                        r 
                                            )
                                            r
                                )
                                r
                    )
                    r
        
        where       new_range_end >= new_range_start
        
        order by    new_range_start
        ;
        

相关文章