如何使用 scriptdom API 提取跨数据库引用

微软公开了 scriptdom API 来解析和生成 TSQL.我是新手,还在玩它.我想知道如何从这样的查询中获取跨数据库引用.

Microsoft has exposed the scriptdom API to parse and generate TSQL. I'm new to it and still playing with it. I want to know how to get the cross databases references from queries like this one.

UPDATE  t3
SET     description = 'abc'
FROM    database1.dbo.table1 t1
        INNER JOIN database2.dbo.table2 t2
            ON (t1.id = t2.t1_id)
        LEFT OUTER JOIN database3.dbo.table3 t3
            ON (t3.id = t2.t3_id)
        INNER JOIN database2.dbo.table4 t4
            ON (t4.id = t2.t4_id)

我想要的是参考列表:

database1.dbo.table1.id = database2.dbo.table2.t1_id
database3.dbo.table3.id = database2.dbo.table2.t3_id
database2.dbo.table4.id = database2.dbo.table2.t4_id

但是,对于最后一个条目database2.dbo.table4.id = database2.dbo.table2.t4_id,两端的两列都来自同一个数据库database2,这不是我想要的.所以我最终需要的结果是:

However, for the last entry database2.dbo.table4.id = database2.dbo.table2.t4_id, both of the columns from the 2 ends are from the same database database2, this is not what I want. So my final required result is:

database1.dbo.table1.id = database2.dbo.table2.t1_id
database3.dbo.table3.id = database2.dbo.table2.t3_id

可以用scriptdom实现吗?

推荐答案

一个健壮的实现并不容易.对于本题提出的有限问题,解决方法比较简单——强调相对".我假设如下:

A robust implementation is not easy. For the limited problem as posed in this question, the solution is relatively simple -- stress "relatively". I assume the following:

  • 查询只有一个级别 - 没有 UNION、子查询、WITH 表达式或其他为别名引入新范围的东西(这会很快变得复杂).
  • 查询中的所有标识符都是完全限定的,因此毫无疑问它指的是什么对象.

解决策略是这样的:我们首先访问 TSqlFragment 来制作所有表别名的列表,然后再次访问它来获取所有等值连接,一路扩展别名.使用该列表,我们确定不引用同一数据库的等值连接列表.在代码中:

The solution strategy looks like this: we first visit the TSqlFragment to make a list of all table aliases, then visit it again to get all equijoins, expanding aliases along the way. Using that list, we determine the list of equijoins that do not refer to the same database. In code:

var sql = @"
  UPDATE  t3
  SET     description = 'abc'
  FROM    database1.dbo.table1 t1
      INNER JOIN database2.dbo.table2 t2
        ON (t1.id = t2.t1_id)
      LEFT OUTER JOIN database3.dbo.table3 t3
        ON (t3.id = t2.t3_id)
      INNER JOIN database2.dbo.table4 t4
        ON (t4.id = t2.t4_id)

";                

var parser = new TSql120Parser(initialQuotedIdentifiers: false);
IList<ParseError> errors;
TSqlScript script;
using (var reader = new StringReader(sql)) {
  script = (TSqlScript) parser.Parse(reader, out errors);
}
// First resolve aliases.
var aliasResolutionVisitor = new AliasResolutionVisitor();
script.Accept(aliasResolutionVisitor);

// Then find all equijoins, expanding aliases along the way.
var findEqualityJoinVisitor = new FindEqualityJoinVisitor(
  aliasResolutionVisitor.Aliases
);
script.Accept(findEqualityJoinVisitor);

// Now list all aliases where the left database is not the same
// as the right database.
foreach (
  var equiJoin in 
  findEqualityJoinVisitor.EqualityJoins.Where(
    j => !j.JoinsSameDatabase()
  )
) {
  Console.WriteLine(equiJoin.ToString());
}

输出:

database3.dbo.table3.id = database2.dbo.table2.t3_id
database1.dbo.table1.id = database2.dbo.table2.t1_id

AliasResolutionVisitor 是一个简单的东西:

public class AliasResolutionVisitor : TSqlFragmentVisitor {
  readonly Dictionary<string, string> aliases = new Dictionary<string, string>();
  public Dictionary<string, string> Aliases { get { return aliases; } }

  public override void Visit(NamedTableReference namedTableReference ) {
    Identifier alias = namedTableReference.Alias;
    string baseObjectName = namedTableReference.SchemaObject.AsObjectName();
    if (alias != null) {
      aliases.Add(alias.Value, baseObjectName);
    }
  }
}

我们只需遍历查询中的所有命名表引用,如果它们有别名,则将其添加到字典中.请注意,如果引入子查询,这将非常失败,因为此访问者没有范围的概念(实际上,为访问者添加范围要困难得多,因为 TSqlFragment 无法注释解析树或甚至从一个节点走它).

We simply go through all the named table references in the query and, if they have an alias, add this to a dictionary. Note that this will fail miserably if subqueries are introduced, because this visitor has no notion of scope (and indeed, adding scope to a visitor is much harder because the TSqlFragment offers no way to annotate the parse tree or even walk it from a node).

EqualityJoinVisitor 更有趣:

public class FindEqualityJoinVisitor : TSqlFragmentVisitor {
  readonly Dictionary<string, string> aliases;
  public FindEqualityJoinVisitor(Dictionary<string, string> aliases) {
    this.aliases = aliases;
  }

  readonly List<EqualityJoin> equalityJoins = new List<EqualityJoin>();
  public List<EqualityJoin> EqualityJoins { get { return equalityJoins; } }

  public override void Visit(QualifiedJoin qualifiedJoin) {
    var findEqualityComparisonVisitor = new FindEqualityComparisonVisitor();
    qualifiedJoin.SearchCondition.Accept(findEqualityComparisonVisitor);
    foreach (
      var equalityComparison in findEqualityComparisonVisitor.Comparisons
    ) {
      var firstColumnReferenceExpression = 
        equalityComparison.FirstExpression as ColumnReferenceExpression
      ;
      var secondColumnReferenceExpression = 
        equalityComparison.SecondExpression as ColumnReferenceExpression
      ;
      if (
        firstColumnReferenceExpression != null && 
        secondColumnReferenceExpression != null
      ) {
        string firstColumnResolved = resolveMultipartIdentifier(
          firstColumnReferenceExpression.MultiPartIdentifier
        );
        string secondColumnResolved = resolveMultipartIdentifier(
          secondColumnReferenceExpression.MultiPartIdentifier
        );
        equalityJoins.Add(
          new EqualityJoin(firstColumnResolved, secondColumnResolved)
        );
      }
    }
  }

  private string resolveMultipartIdentifier(MultiPartIdentifier identifier) {
    if (
      identifier.Identifiers.Count == 2 && 
      aliases.ContainsKey(identifier.Identifiers[0].Value)
    ) {
      return 
        aliases[identifier.Identifiers[0].Value] + "." + 
        identifier.Identifiers[1].Value;
    } else {
      return identifier.AsObjectName();
    }
  }
}

这会寻找 QualifiedJoin 实例,如果我们找到它们,我们会依次检查搜索条件以查找所有出现的相等比较.请注意,这确实适用于嵌套搜索条件:在 Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz 中,我们将找到这两个表达式.

This hunts for QualifiedJoin instances and, if we find them, we in turn examine the search condition to find all occurrences of equality comparisons. Note that this does work with nested search conditions: in Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz, we will find both expressions.

我们如何找到它们?使用另一个小访客:

How do we find them? Using another small visitor:

public class FindEqualityComparisonVisitor : TSqlFragmentVisitor {
  List<BooleanComparisonExpression> comparisons = 
    new List<BooleanComparisonExpression>()
  ;
  public List<BooleanComparisonExpression> Comparisons { 
    get { return comparisons; } 
  }

  public override void Visit(BooleanComparisonExpression e) {
    if (e.IsEqualityComparison()) comparisons.Add(e);
  }
}

这里没有什么复杂的.将此代码折叠到其他访问者中并不难,但我认为这更清楚.

Nothing complicated here. It wouldn't be hard to fold this code into the other visitor, but I think this is clearer.

就是这样,除了一些我将不加评论地呈现的帮助代码:

That's it, except for some helper code which I'll present without comment:

public class EqualityJoin {
  readonly SchemaObjectName left;
  public SchemaObjectName Left { get { return left; } }

  readonly SchemaObjectName right;
  public SchemaObjectName Right { get { return right; } }

  public EqualityJoin(
    string qualifiedObjectNameLeft, string qualifiedObjectNameRight
  ) {
    var parser = new TSql120Parser(initialQuotedIdentifiers: false);
    IList<ParseError> errors;
    using (var reader = new StringReader(qualifiedObjectNameLeft)) {
      left = parser.ParseSchemaObjectName(reader, out errors);
    }
    using (var reader = new StringReader(qualifiedObjectNameRight)) {
      right = parser.ParseSchemaObjectName(reader, out errors);
    }
  }

  public bool JoinsSameDatabase() {
    return left.Identifiers[0].Value == right.Identifiers[0].Value;
  }

  public override string ToString() {
    return String.Format("{0} = {1}", left.AsObjectName(), right.AsObjectName());
  }
}

public static class MultiPartIdentifierExtensions {
  public static string AsObjectName(this MultiPartIdentifier multiPartIdentifier) {
    return string.Join(".", multiPartIdentifier.Identifiers.Select(i => i.Value));
  }
}

public static class ExpressionExtensions {
  public static bool IsEqualityComparison(this BooleanExpression expression) {
    return 
      expression is BooleanComparisonExpression && 
      ((BooleanComparisonExpression) expression).ComparisonType == BooleanComparisonType.Equals
    ;
  }
}

正如我之前提到的,这段代码非常脆弱.它假定查询具有特定的形式,如果不这样做,它可能会失败(非常糟糕,会给出误导性的结果).一个主要的开放挑战是扩展它,以便它可以正确处理范围和非限定引用,以及 T-SQL 脚本可能具有的其他怪异特性,但我认为它仍然是一个有用的起点.

As I mentioned before, this code is quite brittle. It assumes queries have a particular form, and it could fail (quite badly, by giving misleading results) if they don't. A major open challenge would be to extend it so it can handle scopes and unqualified references correctly, as well as the other weirdness a T-SQL script can feature, but I think it's a useful starting point nevertheless.

相关文章