使用递归子查询分解进行循环检测

2022-01-14 00:00:00 sql db2 oracle hierarchical-query

自 v2 以来,Oracle SQL 可以使用其专有的 CONNECT BY 语法进行分层查询.在他们最新的 11g 第 2 版中,他们添加了递归子查询分解,也称为递归 with 子句.这是 ANSI 标准,如果我理解正确的话,其他 RDBMS 供应商也已经实现了这一标准.

Oracle SQL can do hierarchical queries since v2, using their proprietary CONNECT BY syntax. In their latest 11g release 2, they added recursive subquery factoring, also known as the recursive with clause. This is the ANSI standard, and if I understand correctly, this one has been implemented by other RDBMS vendors as well.

在比较连接方式和递归方式时,我注意到使用循环检测时结果集有所不同.按结果连接对我来说更直观,所以我想知道 Oracle 的实现是否包含错误,或者这是否是标准 ANSI 和预期行为.因此,我的问题是您是否可以使用 MySQL、DB2、SQL Server 等其他数据库来检查递归查询.前提是那些数据库当然支持递归 with 子句.

When comparing the connect-by with the recursive with, I noticed a difference in the result set when using cycle detection. The connect by results are more intuitive to me, so I'm wondering if Oracle's implementation contains a bug, or if this is standard ANSI and expected behaviour. Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and others. Provided those databases support the recursive with clause of course.

这是它在 Oracle 11.2.0.1.0 上的工作方式

Here is how it works on Oracle 11.2.0.1.0

SQL> select *
  2    from t
  3  /

        ID  PARENT_ID
---------- ----------
         1          2
         2          1

2 rows selected.

使用 CONNECT BY 语法的查询:

The query using CONNECT BY syntax:

SQL>  select id
  2        , parent_id
  3        , connect_by_iscycle
  4     from t
  5  connect by nocycle parent_id = prior id
  6    start with id = 1
  7  /

        ID  PARENT_ID CONNECT_BY_ISCYCLE
---------- ---------- ------------------
         1          2                  0
         2          1                  1

2 rows selected.

这对我来说很直观.然而,使用新的 ANSI 语法,它又返回一行:

Which looks intuitive to me. However, using the new ANSI syntax it returns one more row:

SQL> with tr (id,parent_id) as
  2  ( select id
  3         , parent_id
  4      from t
  5     where id = 1
  6     union all
  7    select t.id
  8         , t.parent_id
  9      from t
 10           join tr on t.parent_id = tr.id
 11  ) cycle id set is_cycle to '1' default '0'
 12  select id
 13       , parent_id
 14       , is_cycle
 15    from tr
 16  /

        ID  PARENT_ID I
---------- ---------- -
         1          2 0
         2          1 0
         1          2 1

3 rows selected.

这是您可以用来检查的脚本:

This is the script you can use to check:

create table t
( id        number
, parent_id number
);
insert into t values (1, 2);
insert into t values (2, 1);
commit;
with tr (id,parent_id) as
( select id
       , parent_id
    from t
   where id = 1
   union all
  select t.id
       , t.parent_id
    from t
         join tr on t.parent_id = tr.id
) cycle id set is_cycle to '1' default '0'
select id
     , parent_id
     , is_cycle
  from tr;

推荐答案

来自 CONNECT_BY_ISCYCLE:

CONNECT_BY_ISCYCLE 伪列返回 1 如果当前行有一个子元素,它也是它的祖先

The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor

以及 CYCLE:

如果一行的祖先行与循环列的值相同,则认为该行形成了一个循环.

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

在您的示例中,行 2 确实有一个子代,它也是它的祖先,但它的 id 尚未返回.

In your example, row 2 does have a child which is also its ancestor, but its id has not been returned yet.

换句话说,CONNECT_BY_ISCYCLE 检查 children(尚未返回),而 CYCLE 检查当前行(已经返回).

In other words, CONNECT_BY_ISCYCLE checks the children (which are yet to be returned), while CYCLE checks the current row (which is already returned).

CONNECT BY 是基于行的,而递归 CTE 是基于集合的.

CONNECT BY is row based, while recursive CTE's are set-based.

请注意,Oracle 关于 CYCLE 的文档提到了祖先行".但是,一般来说,递归 CTE 中没有祖先行"的概念.这是一个基于集合的操作,可以完全从树中产生结果.一般来说,锚部分和递归部分甚至可以使用不同的表.

Note that Oracle's documentation on CYCLE mentions an "ancestor row". However, generally speaking, there is no concept of an "ancestor row" in a recursive CTE. It's a set based operation which can yield results completely out of the tree. Generally speaking, the anchor part and the recursive part can even use the different tables.

由于递归CTE通常用于构建层次结构树,Oracle 决定添加循环检查.但是由于递归 CTE 的操作基于集合的方式,通常无法判断下一步是否会生成循环,因为没有明确定义祖先行"循环条件无法也可以定义.

Since recursive CTE's are usually used to build hierarchy trees, Oracle decided to add a cycle check. But due the set-based way the recursive CTE's operate, it's generally impossible to tell will the next step generate a cycle or not, because without a clear definition of the "ancestor row" cycle condition cannot be defined either.

要执行下一个"步骤,整个当前"集需要可用,但要生成当前集的每一行(包括循环列),我们只需要下一个"的结果操作.

To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we just need to have the results of the "next" operation.

如果当前集合总是由一行组成(如 CONNECT BY 中),这不是问题,但如果在一个集合上定义的递归操作作为一个整体,则有问题.

It's not a problem if the current set always consists of a single row (like in CONNECT BY), but it is a problem if the recursive operation defined on a set as a whole.

还没有研究 Oracle 11,但是 SQL Server 通过隐藏 CONNECT BY 来实现递归 CTE 在它们后面,这需要放置许多限制(所有这些都有效地禁止所有基于集合的操作).

Didn't look into Oracle 11 yet, but SQL Server implements recursive CTE's by just hiding a CONNECT BY behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).

PostgreSQL 的实现是真正基于集合的:您可以在递归部分中对锚部分进行任何操作.但是,它没有任何方法可以检测循环,因为循环一开始就没有定义.

PostgreSQL's implementation, on the other hand, is truly set-based: you can do any operation with the anchor part in the recursive part. It does not have any means to detect cycles, though, because cycles are not defined in the first place.

如前所述,MySQL 根本没有实现 CTE(它没有实现 HASH JOINMERGE JOIN也是如此,只有嵌套循环,所以不要太惊讶).

As was mentioned before, MySQL does not implement CTE's at all (it does not implement HASH JOIN's or MERGE JOINs as well, only the nested loops, so don't be surprised much).

具有讽刺意味的是,我今天收到了一封关于这个主题的信,我将在我的博客中介绍.

Ironically, I received a letter today on this very subject, which I will cover in my blog.

更新:

递归CTE 只不过是变相的CONNECT BY.有关令人震惊的详细信息,请参阅我博客中的这篇文章:

Recursive CTE's in SQL Server are no more than CONNECT BY in disguise. See this article in my blog for shocking details:

  • SQL Server:递归 CTE 真的是基于集合的吗?

相关文章