SSIS: Delete rows after an update or insert

2021-12-30 00:00:00 etl sql-server ssis

Here is the situation: I have a table, StudentsA, that needs to be synchronized with another table, StudentsB, on a different server. It is a one-way sync from A to B. Since StudentsA can hold a large number of rows, we have a table called StudentsSync (on the input server) containing the IDs of the StudentsA rows that have been modified since the last copy from StudentsA to StudentsB.

I built the following SSIS Data Flow task:

The only problem is that I need to DELETE the row from StudentsSync after a successful copy or update. Something like this:

Any idea how this can be achieved?

Recommended answer

This can be achieved in three ways.

1. If your target table in OutputDB has timestamp columns such as CreateTimeStamp and ModifiedTimeStamp, the rows that were updated or inserted can be found with a simple query. Put the query below in an Execute SQL Task in the Control Flow to delete those rows from the Sync table.

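DELETE FROM SyncTable
WHERE keyColumn IN (
    SELECT primary_key
    FROM target
    -- Compare against the start of today: a bare GETDATE() is later than any timestamp already written
    WHERE ModifiedTimeStamp >= CAST(GETDATE() AS date)
       OR (ModifiedTimeStamp IS NULL AND CreateTimeStamp >= CAST(GETDATE() AS date))
);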

I assume the Sync table holds StudentsA's primary key, and that the same key identifies the row in the target table. The condition checks two cases: a newly added row has the current date in CreateTimeStamp and a null ModifiedTimeStamp, while an updated row has the current date in ModifiedTimeStamp.

This query only works if your target table has timestamp columns, which I think it should have if you are loading data into a data warehouse.
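A tighter variant of the same idea compares the timestamps against the moment the load started instead of the current date. This is only a sketch under assumptions: `@LoadStart` is a placeholder for a value the package would supply (for example a variable set just before the Data Flow Task runs), the table and column names are the ones used in the query above, and both tables are assumed to be reachable from one connection (the question puts them on different servers, so in practice this might mean a linked server).

```sql
-- Placeholder value; in the package this would come from a variable
-- captured before the Data Flow Task starts (assumption, not from the original answer).
DECLARE @LoadStart datetime = '2021-12-30T02:00:00';

-- Delete only the sync entries whose target row was touched by this load.
DELETE s
FROM SyncTable AS s
WHERE EXISTS (
    SELECT 1
    FROM target AS t
    WHERE t.primary_key = s.keyColumn
      AND (t.ModifiedTimeStamp >= @LoadStart
           OR (t.ModifiedTimeStamp IS NULL AND t.CreateTimeStamp >= @LoadStart))
);
```

Using the load start time avoids picking up rows that were written by an earlier run on the same day, at the cost of having to pass one value from the package into the statement.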

2. You can use the MERGE statement in an Execute SQL Task in the Control Flow to perform both the update and the insert, with no need for a Data Flow Task. The query below can be used even if you don't have any timestamp columns.

DECLARE @Output TABLE (ActionType VARCHAR(20), SourcePrimaryKey INT);

MERGE StudentsB AS TARGET
USING StudentsA AS SOURCE
ON (TARGET.CommonColumn = SOURCE.CommonColumn)

-- Row exists in both tables: update it
WHEN MATCHED THEN
    UPDATE SET TARGET.col1 = SOURCE.col1, TARGET.ModifiedTimeStamp = GETDATE()

-- Row exists only in the source: insert it
WHEN NOT MATCHED BY TARGET THEN
    INSERT (col1, col2, Col3)
    VALUES (SOURCE.col1, SOURCE.col2, SOURCE.Col3)

-- Capture the action and key of every affected row; MERGE must end with a semicolon
OUTPUT $action,
       INSERTED.PrimaryKey AS SourcePrimaryKey INTO @Output;

-- Remove the sync entries for the rows that were inserted or updated
DELETE FROM SyncTable
WHERE PrimaryKey IN (SELECT SourcePrimaryKey
                     FROM @Output
                     WHERE ActionType IN ('INSERT', 'UPDATE'));

The code is not tested, as I'm running out of time, but it should at least give you a fair idea of how to proceed. For further detail on MERGE syntax, read this and this.
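Since StudentsSync already lists the changed IDs, the MERGE source can be limited to those rows rather than scanning all of StudentsA. The following is a rough, untested sketch of that variation, not the answer's own code: `StudentId` and `Name` are hypothetical column names, the key is assumed to be the same in all three tables, and StudentsB is assumed to be reachable from the same connection (for instance through a linked server, in which case the transaction would become a distributed one).

```sql
-- Hypothetical columns: StudentId (key in all three tables), Name, ModifiedTimeStamp.
SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction
BEGIN TRANSACTION;

DECLARE @Output TABLE (ActionType nvarchar(10), StudentId int);

MERGE StudentsB AS tgt
USING (
    -- Only the rows flagged in StudentsSync; EXISTS avoids duplicate source rows
    SELECT a.StudentId, a.Name
    FROM StudentsA AS a
    WHERE EXISTS (SELECT 1 FROM StudentsSync AS s WHERE s.StudentId = a.StudentId)
) AS src
ON tgt.StudentId = src.StudentId
WHEN MATCHED THEN
    UPDATE SET tgt.Name = src.Name, tgt.ModifiedTimeStamp = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (StudentId, Name) VALUES (src.StudentId, src.Name)
-- In a MERGE, OUTPUT may reference source columns, so the source key is captured directly
OUTPUT $action, src.StudentId INTO @Output (ActionType, StudentId);

-- Every captured row was either inserted or updated, so its sync entry can go
DELETE s
FROM StudentsSync AS s
INNER JOIN @Output AS o ON o.StudentId = s.StudentId;

COMMIT TRANSACTION;
```

Wrapping both statements in one transaction means a sync entry is removed only if the corresponding insert or update is committed as well.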

3. Use the Multicast component to duplicate the dataset for the insert and the update. Connect one Multicast to the Lookup Match output and another Multicast to the Lookup No Match output.
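The answer does not say what to do with the duplicated rows. One possible way to finish the pattern, offered purely as an assumption, is to send the second output of each Multicast to a staging table on the input server and then delete the matching sync entries in an Execute SQL Task that runs only when the Data Flow Task succeeds. The table `dbo.StudentsSyncProcessed` and the column `StudentId` are hypothetical names introduced for this sketch.

```sql
-- One-time setup: hypothetical staging table filled by an extra destination
-- attached to each Multicast output.
CREATE TABLE dbo.StudentsSyncProcessed (
    StudentId int NOT NULL
);

-- Execute SQL Task, connected after the Data Flow Task with a success constraint:
-- remove the sync entries for every ID the data flow processed.
DELETE s
FROM dbo.StudentsSync AS s
WHERE EXISTS (
    SELECT 1
    FROM dbo.StudentsSyncProcessed AS p
    WHERE p.StudentId = s.StudentId
);

-- Empty the staging table so the next run starts clean.
DELETE FROM dbo.StudentsSyncProcessed;
```

Because the delete runs in the Control Flow after the whole data flow has finished, a failed load leaves StudentsSync untouched and the same IDs are picked up again on the next run.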
