与多个表的分层查询匹配具有挑战性

2021-09-10 00:00:00 sql tsql sql-server

我有这个 business_table

I have this business_table

ref_ID      name    parent_id 
-----------------------------
ABC-0001    Amb     NULL 
PQR-899     boss    NULL
tgv-632     pick    NULL
yyy-888     xyz     NULL
kkk-456     ued     NULL

我想更新 business_table 的 parent_id

I want to update parent_id of business_table

parent_customer 是另一个表,列出了下面给出的 ref_ID 和 parent_id 的层次结构.

parent_customer is another table which list the hierarchy of ref_ID and parent_id given below.

更新business_table staps 的parent_id 是

To update the parent_id of business_table staps are

1) 检查 business_table 的 ref_id 和 parent_customer 的 ref_id.例如.business_table 的 ref_ID ABC-0001 与 parent_customer ref_id 第一行 1 ref_id-ABC-0001 opr-656 匹配找到匹配

1) check ref_id of business_table with ref_id of parent_customer . eg. ref_ID ABC-0001 of business_table match with parent_customer ref_id 1st row 1 ref_id-ABC-0001 opr-656 match found

2) 然后检查该匹配记录的 parent_customer 的 parent_id,在这种情况下是 parent_id opr-656 检查 match_table_CM 表

2) then check parent_id of parent_customer of that matched record which is in this case parent_id opr-656 check with match_table_CM table

match_table_CM 表在更新记录之前列出了我们想要匹配的 id(我们正在检查这个,因为这是 CRM id 需要检查 emplpoyee 是否存在)

match_table_CM table list the ids which we want to match before updating record (we are checking this because of this is CRM id need to check emplpoyee exist of not)

3) 未找到匹配项,然后检查 parent_customer 的 parent_id opr-656 与同一个表 parent_customer ref_id ,找到 ref_id opr-656 的第二条记录然后选择它的 parent_id ttK-668 检查与 match_table_CM 匹配找到 1 ttK-668 然后用 business_table parent_id 更新其他明智的检查,直到parent_customer ref_ID = parent_id (parent of all) 并更新该 id 即使没有找到匹配所以在这种情况下如果没有找到匹配那么 ttK-668 应该是终于更新了

3)match not found then check with parent_id opr-656 of parent_customer with same table parent_customer ref_id , 2nd record found with ref_id opr-656 then pick its parent_id ttK-668 check with match_table_CM match found 1 ttK-668 then update with business_table parent_id other wise check till the parent_customer ref_ID = parent_id (parent of all) and update that id even if match not found so in this case if match not found then ttK-668 is should be updated at last

注意:- parent_customer 表列出了一个数据层次结构,其中当 ref_id 和 parent_id 相同时,表示它是整个层次结构的父级.

note : - parent_customer table lists a hierarchy of data in which when both ref_id and parent_id are the same means it's the parent of the entire hierarchy.

例如:

4 PQR-899 PQR-899 这是层次结构的最终父级

4 PQR-899 PQR-899 this is ultimate parent of hierarchy

父客户

ID  ref_id     parent_id  
---------------------------
1   ABC-0001   opr-656
2   opr-656    ttK-668
3   ttK-668    ttK-668
4   PQR-899    PQR-899
5   kkk-565    AJY-567  
6   AJY-567    UXO-989
7   UXO-989    tgv-632
8   tgv-632    mnb-784 
9   mnb-784    qwe-525 
10  qwe-525    qwe-525
11  kkk-456    jjj-888

match_table_CM:

match_table_CM:

id    main_id
--------------
1     ttK-668
2     PQR-899
3     tgv-632
4     mnb-784

预期输出

ref_ID      name    parent_id 
-----------------------------
ABC-0001    Amb     ttK-668                    
PQR-899     boss    PQR-899
tgv-632     pick    qwe-525
yyy-888     xyz     NULL
kkk-456     ued     jjj-888

推荐答案

这应该返回预期的结果:

This should return the expected result:

WITH hierarchy AS
 ( -- all rows from source table
   SELECT b.ref_id, pc.parent_id, 
      0 AS match,
      1 AS lvl
   FROM business_table AS b
   LEFT JOIN parent_customer AS pc
     ON b.ref_id = pc.ref_id

   UNION ALL

   SELECT h.ref_id, pc.parent_id, 
      -- check if we found a match or reached top of hierarchy
      CASE WHEN mt.main_id IS NOT NULL OR pc.parent_id = pc.ref_id THEN 1 ELSE 0 END,
      lvl+1
   FROM hierarchy AS h
   JOIN parent_customer AS pc 
     ON pc.ref_id = h.parent_id -- going up in the hierarchy
   LEFT JOIN match_table_CM AS mt
     ON mt.main_id = pc.ref_id
   WHERE h.match = 0 -- no match yet
     AND lvl < 10 -- just in case there's an endless loop due to bad data
 )
SELECT * FROM hierarchy AS h
WHERE lvl = 
 ( -- return the last row, matching or not
   SELECT Max(lvl)
   FROM hierarchy AS h2
   WHERE h.ref_id = h2.ref_id
 );

使用 EXISTS 重写,因为 SQL Server 在递归部分不支持外部联接:

Rewrite using EXISTS because SQL Server doesn't support Outer Joins in the recursive part:

WITH hierarchy AS
 ( -- all rows from source table
   SELECT b.ref_id, pc.parent_id, 
      0 AS match,
      1 AS lvl
   FROM business_table AS b
   LEFT JOIN parent_customer AS pc
     ON b.ref_id = pc.ref_id

   UNION ALL

   SELECT h.ref_id, pc.parent_id, 
      -- check if we found a match or reached top of hierarchy
      CASE WHEN exists
            ( select * 
              from match_table_CM AS mt
              where mt.main_id = pc.ref_id
            ) OR pc.parent_id = pc.ref_id
           THEN 1
           ELSE 0
      END,
      lvl+1
   FROM hierarchy AS h
   JOIN parent_customer AS pc 
     ON pc.ref_id = h.parent_id -- going up in the hierarchy
   WHERE h.match = 0 -- no match yet
     AND lvl < 10 -- just in case there's an endless loop due to bad data
 )
SELECT * FROM hierarchy AS h
WHERE lvl = 
 ( -- return the last row, matching or not
   SELECT Max(lvl)
   FROM hierarchy AS h2
   WHERE h.ref_id = h2.ref_id
 );

优化器的计划看起来很糟糕,因此再次重写以使用窗口聚合而不是相关子查询:

The optimizer's plan looked bad, so another rewrite to use a Windowed Aggregate instead of a Correlated Subquery:

WITH  hierarchy AS
 ( -- all rows from source table
   SELECT b.ref_id, pc.parent_id, 
      0 AS match,
      1 AS lvl
   FROM business_table AS b
   LEFT JOIN parent_customer AS pc
     ON b.ref_id = pc.ref_id

   UNION ALL

   SELECT h.ref_id, pc.parent_id, 
      -- check if we found a match or reached top of hierarchy
      CASE WHEN exists
            ( select * 
              from match_table_CM AS mt
              where mt.main_id = pc.ref_id
            ) OR pc.parent_id = pc.ref_id
           THEN 1
           ELSE 0
      END,
      lvl+1
   FROM hierarchy AS h
   JOIN parent_customer AS pc 
     ON pc.ref_id = h.parent_id -- going up in the hierarchy
   WHERE h.match = 0 -- no match yet
     AND lvl < 10 -- just in case there's an endless loop due to bad data
 )
select *
from 
 ( 
   SELECT h.*,
      max(lvl) over (partition by ref_id) as maxlvl
   FROM hierarchy AS h
 ) as dt
WHERE lvl = maxlvl
;

相关文章