Query one database based on the results of a query from another database

2021-12-30 00:00:00 sql etl sql-server ssis ssis-2012

I am using SSIS in VS 2013. I need to get a list of IDs from one database and then use that list to query another database, i.e. SELECT ... FROM MySecondDB WHERE ID IN ({list of IDs from MyFirstDB}).

Recommended answer

There are 3 methods to achieve this:

The first method uses a Lookup Transformation, as @TheEsisia answered, but there are some additional requirements:

  • In the Lookup, write the query that returns the list of IDs (e.g. SELECT ID FROM MyFirstDB WHERE ...)
  • Select at least one column from the lookup table

To filter rows WHERE ID IN ({list of IDs from MyFirstDB}), you have to handle the rows that find no match in the Lookup (the error output). There are 2 ways:

  1. Set Error handling to Ignore Row, so the columns added by the Lookup will be NULL for unmatched rows. Then add a Conditional Split that filters out the rows where that value is NULL.

Assuming you have chosen col1 as the lookup column, use an expression like this:

ISNULL([col1]) == False

  2. Or set Error handling to Redirect Row, so the rows with no match are sent to the error output. That output can be left unconnected, so those rows are filtered out.
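To make the filtering logic concrete, here is a minimal sketch of what the two options amount to. It is written in Python purely for illustration (it is not SSIS code), and the sample rows, the `lookup_ids` set, and the function names are all made up for this example.

```python
# Hypothetical data: IDs returned by the lookup query against MyFirstDB
lookup_ids = {1, 3, 5}

# Hypothetical rows read from MySecondDB
rows = [
    {"ID": 1, "name": "a"},
    {"ID": 2, "name": "b"},
    {"ID": 3, "name": "c"},
]

def lookup_ignore_failure(rows, lookup_ids):
    """Option 1: every row passes through; unmatched rows get col1 = None (NULL)."""
    return [
        dict(row, col1=row["ID"] if row["ID"] in lookup_ids else None)
        for row in rows
    ]

def conditional_split(rows):
    """Keep rows where col1 is not NULL, like ISNULL([col1]) == False."""
    return [row for row in rows if row["col1"] is not None]

def lookup_redirect(rows, lookup_ids):
    """Option 2: matched rows go on, unmatched rows go to the error output."""
    matched = [row for row in rows if row["ID"] in lookup_ids]
    redirected = [row for row in rows if row["ID"] not in lookup_ids]
    return matched, redirected

kept_option1 = conditional_split(lookup_ignore_failure(rows, lookup_ids))
kept_option2, dropped = lookup_redirect(rows, lookup_ids)
```

Both options keep the same rows. The difference is only where the unmatched rows are discarded: in a Conditional Split, or in an unconnected error output.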

The disadvantage of this method is that all the data is loaded first and only then filtered during execution.

Also, when working over a network, the filtering is done on the local machine, in memory, after all the data has been loaded (with the 2nd method the filtering is done on the server).

The second method avoids loading all the data. As a workaround, you can use a Script Task (the code below is written in VB.NET):

Assume that the connection manager is named TestAdo, that "Select [ID] FROM dbo.MyTable" is the query that returns the list of IDs, and that User::MyVariableList is the variable in which you want to store the result.

Note: this code reads the connection from the connection manager.

    Public Sub Main()

        ' Holds the IDs read from the first database
        Dim lst As New Collections.Generic.List(Of String)

        ' Get the ADO.NET connection from the "TestAdo" connection manager
        Dim myADONETConnection As SqlClient.SqlConnection = _
            DirectCast(Dts.Connections("TestAdo").AcquireConnection(Dts.Transaction), _
            SqlClient.SqlConnection)

        If myADONETConnection.State = ConnectionState.Closed Then
            myADONETConnection.Open()
        End If

        Dim myADONETCommand As New SqlClient.SqlCommand("Select [ID] FROM dbo.MyTable", myADONETConnection)

        ' Read every ID into the list; Using closes the reader when done
        Using dr As SqlClient.SqlDataReader = myADONETCommand.ExecuteReader()
            While dr.Read()
                lst.Add(dr(0).ToString())
            End While
        End Using

        ' Store the complete query text, including the IN list, in the variable
        Dts.Variables.Item("User::MyVariableList").Value = "SELECT ... FROM ... WHERE ID IN(" & String.Join(",", lst) & ")"

        Dts.TaskResult = ScriptResults.Success
    End Sub

User::MyVariableList should then be used as the source (SQL command from variable).
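One edge case worth considering: if the first query returns no IDs, concatenating the list as shown produces `IN()`, which is not valid T-SQL. The sketch below shows the same string-building step with a guard for that case. It is written in Python only to illustrate the logic; `build_in_query` is a hypothetical helper, and it assumes the IDs are numeric.

```python
def build_in_query(ids, table="MySecondDB"):
    """Build the query text for a list of numeric IDs.

    int() rejects non-numeric values, so arbitrary text never ends up
    inside the SQL string.
    """
    values = [str(int(i)) for i in ids]
    if not values:
        # "IN ()" is a syntax error, so return a query that matches no rows
        return f"SELECT * FROM {table} WHERE 1 = 0"
    return f"SELECT * FROM {table} WHERE ID IN ({','.join(values)})"


query = build_in_query([1, 2, 3])
empty_query = build_in_query([])
```

The same guard could be added to the Script Task by checking `lst.Count` before assigning the variable.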

The third method is similar to the second, but it builds the IN clause with an Execute SQL Task and then uses the whole query as the OLE DB Source:

  1. Add an Execute SQL Task before the Data Flow Task
  2. Set the ResultSet property to Single row
  3. Select User::MyVariableList as the Result Set variable
  4. Use the following SQL command

DECLARE @str AS VARCHAR(4000)

SET @str = ''

SELECT @str = @str + CAST([ID] AS VARCHAR(255)) + ','
FROM dbo.MyTable 

SET @str = 'SELECT * FROM  MySecondDB WHERE ID IN (' + SUBSTRING(@str,1,LEN(@str) - 1) + ')'

SELECT @str

If the column has a string data type, you should add quotation marks before and after each value, as below:

SELECT @str = @str + '''' + CAST([ID] AS VARCHAR(255)) + ''','
    FROM dbo.MyTable
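Note that wrapping values in quotes this way does not handle values that themselves contain a single quote. The following sketch shows the usual fix, doubling each embedded quote. It is in Python only to illustrate the idea; `quote_sql_string` and `build_quoted_list` are hypothetical names, and in T-SQL the equivalent step would be a REPLACE on the column before concatenating.

```python
def quote_sql_string(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + str(value).replace("'", "''") + "'"


def build_quoted_list(ids):
    """Join the quoted values with commas, ready to place inside IN (...)."""
    return ",".join(quote_sql_string(i) for i in ids)


quoted = build_quoted_list(["A1", "O'Brien"])
```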

Make sure that you have set the Data Flow Task's DelayValidation property to True.
