防止递归 CTE 多次访问节点

考虑以下简单的 DAG:

Consider the following simple DAG:

  1->2->3->4

还有一个表,#bar,描述了这一点(我使用的是 SQL Server 2005):

And a table, #bar, describing this (I'm using SQL Server 2005):

parent_id   child_id
1           2
2           3
3           4
//... other edges, not connected to the subgraph above

现在假设我有一些其他任意标准来选择第一条边和最后一条边,即 1->2 和 3->4.我想用这些来找到我图表的其余部分.

Now imagine that I have some other arbitrary criteria that select the first and last edges, i.e. 1->2 and 3->4. I want to use these to find the rest of my graph.

我可以按如下方式编写递归 CTE(我使用的术语来自 MSDN):

I can write a recursive CTE as follows (I'm using terminology from MSDN):

with foo(parent_id,child_id) as (
// anchor member that happens to select first and last edges:
select parent_id,child_id from #bar where parent_id in (1,3)
union all
// recursive member:
select #bar.* from #bar
join foo on #bar.parent_id = foo.child_id
)
select parent_id,child_id from foo

然而,这会导致边缘 3->4 被选择两次:

However, this results in edge 3->4 being selected twice:

parent_id  child_id
1          2
3          4
2          3
3          4    // 2nd appearance!

如何防止查询递归到已经描述过的子图中?如果在查询的递归成员"部分中,我可以引用到目前为止由递归 CTE 检索到的所有数据(并在递归成员中提供一个谓词指示,排除已经访问过的节点).但是,我认为我只能访问由递归成员的最后一次迭代返回的数据.

How can I prevent the query from recursing into subgraphs that have already been described? I could achieve this if, in my "recursive member" part of the query, I could reference all data that has been retrieved by the recursive CTE so far (and supply a predicate indicating in the recursive member excluding nodes already visited). However, I think I can access data that was returned by the last iteration of the recursive member only.

当有很多这样的重复时,这不会很好地扩展.有没有办法防止这种不必要的额外递归?

This will not scale well when there is a lot of such repetition. Is there a way of preventing this unnecessary additional recursion?

请注意,我可以在语句的最后一行使用select distinct"来达到预期的结果,但这似乎是在完成所有(重复)递归之后应用的,所以我不要认为这是一个理想的解决方案.

Note that I could use "select distinct" in the last line of my statement to achieve the desired results, but this seems to be applied after all the (repeated) recursion is done, so I don't think this is an ideal solution.

编辑 - hainstech 建议通过添加谓词来停止递归,以排除显式在起始集中的向下递归路径,即仅递归 where foo.child_id not in (1,3).这仅适用于上述情况,因为它很简单——所有重复的部分都在锚节点集内开始.它不能解决它们可能不是的一般情况.例如,考虑将边 1->4 和 4->5 添加到上述集合中.即使使用建议的谓词,边缘 4->5 也将被捕获两次.:(

Edit - hainstech suggests stopping recursion by adding a predicate to exclude recursing down paths that were explicitly in the starting set, i.e. recurse only where foo.child_id not in (1,3). That works for the case above only because it simple - all the repeated sections begin within the anchor set of nodes. It doesn't solve the general case where they may not be. e.g., consider adding edges 1->4 and 4->5 to the above set. Edge 4->5 will be captured twice, even with the suggested predicate. :(

推荐答案

CTE 是递归的.

当您的 CTE 有多个初始条件时,这意味着它们也有不同的递归堆栈,并且无法在另一个堆栈中使用一个堆栈中的信息.

When your CTE's have multiple initial conditions, that means they also have different recursion stacks, and there is no way to use information from one stack in another stack.

在您的示例中,递归堆栈将如下所示:

In your example, the recursion stacks will go as follows:

(1) - first IN condition
(1, 2)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3) - no more children
(1, 2) - no more children
(1) - no more children, going to second IN condition

(3) - second condition
(3, 4)
(3) - no more children, returning

如您所见,这些递归堆栈不相交.

As you can see, these recursion stack do not intersect.

您可能会在临时表中记录访问过的值,JOIN 每个值与临时表,如果找到就不要跟随这个值,但是 SQL Server 会不支持这些东西.

You could probably record the visited values in a temporary table, JOIN each value with the temptable and do not follow this value it if it's found, but SQL Server does not support these things.

所以你只需使用 SELECT DISTINCT.

相关文章