如何使用 SSIS 从平面文件中删除重复的行?

首先让我说,能够从一个平面文件中获取 1700 万条记录,推送到远程机器上的数据库并且需要 7 分钟,这真是太棒了.SSIS 真的很棒.但是现在我已经有了这些数据,我该如何删除重复项?

Let me first say that being able to take 17 million records from a flat file, pushing to a DB on a remote box and having it take 7 minutes is amazing. SSIS truly is fantastic. But now that I have that data up there, how do I remove duplicates?

更好的是,我想获取平面文件,从平面文件中删除重复项并将它们放回另一个平面文件中.

Better yet, I want to take the flat file, remove the duplicates from the flat file and put them back into another flat file.

我在考虑:

数据流任务

  • 文件源(带有关联的文件连接)
  • for 循环容器
  • 一个脚本容器,包含一些判断另一行是否存在的逻辑

谢谢,这个网站上的每个人都非常博学.

Thak you, and everyone on this site is incredibly knowledgeable.

更新: 我找到了这个链接,可能有助于回答这个问题

推荐答案

使用排序组件.

只需选择您希望对加载的行进行排序的字段,然后在左下角您会看到一个用于删除重复项的复选框.此框仅根据排序条件删除任何重复的行因此,在下面的示例中,如果我们仅对第一个字段进行排序,则行将被视为重复:

Simply choose which fields you wish to sort your loaded rows by and in the bottom left corner you'll see a check box to remove duplicates. This box removes any rows which are duplicates based on the sort criteria only so in the example below the rows would be considered duplicate if we only sorted on the first field:

1 | sample A |
1 | sample B |

相关文章