从 CSV 文件填充 SQL 数据库

2021-12-30 00:00:00 csv etl sql-server ssis ssis-2012

我需要使用带有 SSIS 的 CSV 文件创建一个数据库.CSV 文件包括四列:

I need to create a database using a CSV file with SSIS. The CSV file includes four columns:

我需要使用那个表的信息来填充我下面用SQL创建的三个表.

I need to use the information of that table to populate the three tables I created in SQL below.

我意识到我需要使用员工表的一列 EmployeeNumber 和组表 GroupID 来填充 EmployeeGroup 表.为此，我认为我需要一个Join Merge表，但我在SSIS中创建了数据流任务，结果相同，没有显示数据.

I have realized that what I need is to use one column of the Employee Table, EmployeeNumber, and Group Table, GroupID, to populate the EmployeeGroup table. For that, I thought that a Join Merge table is what I needed, but I created the Data Flow Task in SSIS, and the results are the same, no data displayed.

中间的表是用来关联其他表的.

The middle table is the one used to relate the other tables.

我在 SSIS 中创建了包，并且填充了 Employee 和 Group 表，但没有填充 EmployeeGroup 表.EmployeeGroup 将只显示没有数据的 EmployeeNumber 和 Group ID 列.

I created the package in SSIS and the Employee and Group Tables are populated, but the EmployeeGroup table is not. EmployeeGroup will only show the EmployeeNumber and Group ID columns with no data.

我是 SSIS 的新手，我真的不知道还能做什么.我将非常感谢您的帮助.

I am new using SSIS, and I really do not know what else to do. I will really appreciate your help.

推荐答案

概述

使用 SSIS 的解决方案
使用 3 个数据流任务
使用 2 个数据流任务

使用 Microsoft.Ace.OLEDB
使用 Microsoft 文本驱动程序

第一个解决方案 - SSIS
使用 3 个数据流任务
这可以只使用 2 个数据流任务来完成，但是根据问题中提到的 OP 我是使用 SSIS 的新手，我真的不知道还能做什么，我将提供最简单的解决方案，即 3 DataFlow Task 以避免使用更多组件，例如 MultiCast.

1st Solution - SSIS

Using 3 Data Flow Tasks

This can be done using only 2 Data Flow Task, but according to what the OP mentioned in the question I am new using SSIS, and I really do not know what else to do, i will provide easiest solution which is 3 DataFlow Task to avoid using more components like MultiCast.

因为你想建立一个关系数据库并从csv中提取关系，你必须读取csv 3次-将其视为3个单独的文件-.

Because you want to build a relational database and extract relations from the csv, you have to read the csv 3 times -consider it as 3 seperated files -.

首先要导入Employees和Groups数据，然后导入它们之间的关系表.

First you have to import Employees and Groups Data, Then you have to import the relation table between them.

每个导入步骤都可以在单独的数据流任务中完成

Each Import step can be done in a seperate Data Flow Task

添加平面文件连接管理器(Csv 文件)
添加OLEDB 连接管理器(SQL 目标)
添加 3 个 DataFlow 任务，如下图所示

Add a Flat File connection Manager (Csv File)

Add An OLEDB connection Manager (SQL Destination)

Add 3 DataFlow Task like the image below

添加一个平面文件源、一个脚本组件、OLEDB目标，如下图所示

Add a Flat File Source , a Script Component , OLEDB destination like shown in the image below

在 Script Component 中选择 Group Name 列作为 Input

In the Script Component choose Group Name column as Input

选择输出缓冲区并将SynchronousInputID Property更改为None并添加类型为<的输出列OutGroupname代码>DT_STR

Select the Output Buffer and change SynchronousInputID Property to None And add an output column OutGroupname with type DT_STR

在 Script 部分写入以下代码:

Imports System.Collections.Generic Private m_List As New List(Of String) Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) If Not Row.GroupName_IsNull AndAlso Not String.IsNullOrEmpty(Row.GroupName.Trim) Then If Not m_List.Contains(Row.GroupName.Trim) Then m_List.Add(Row.GroupName.Trim) CreateOutputRows(Row.GroupName.Trim) End If End If End Sub Public Sub CreateOutputRows(ByVal strValue As String) Output0Buffer.AddRow() Output0Buffer.OutGroupName = strValue End Sub

在 OLEDB Destination 映射 OutGroupName 到 GroupName Column

On the OLEDB Destination map OutGroupName to GroupName Column

重复对 Groupname Column 完成的相同步骤:唯一的区别是您必须选择 EmployeeID、Employee Name、LoginName 列作为 Script Component 中的输入，并在比较中使用 ID 列代替 Groupname 列

Repeat the same steps done with Groupname Column : with a single difference that is you have to choose the EmployeeID, Employee Name, LoginName columns as Input in the Script Component and Use the ID Column instead of Groupname column in the comparaison

你必须添加一个平面文件源、查找转换、OLEDB目标

在LookUp Transformation组件中选择Groups表作为查找表

In The LookUp Transformation Component select Groups Table as a Lookup table

映射 GroupName 列并获取 Group ID 作为输出

Map GroupName Columns and Get Group ID as output

在错误输出配置

在Oledb Destination映射列如下

注意:GroupID必须是Identity(在sql server中设置)

Note: GroupID must be an Identity (set it in sql server)

您必须执行与 3 个数据流任务解决方案相同的步骤，但不是将 2 个数据流任务添加到 Group 和 Employee，只需添加一个数据流任务，在 Flat File Source 之后添加一个 MultiCast 组件来复制 Flow.然后，对于第一个流，使用 Employee 数据流任务中使用的相同 Script Component 和 OLEDB Destination，对于第二个流，使用 Script Component 和 OLEDB Destinationcode>Script Component 和 OLEDB Destination 与 Group 相关.

You have to do the same steps as the 3 Data Flow Tasks solution, but instead of adding 2 Data Flow Tasks to Group and Employee, just add one Data Flow Task, and after the Flat File Source add a MultiCast component to duplicate the Flow. Then for the first flow use the same Script Component and OLEDB Destination used in the Employee Data Flow Task, and for the second flow use the Script Component and OLEDB Destination related to Group.

通过 T-SQL 命令将平面文件导入 SQL 的方法有很多

There are many method to import Flat file to SQL via T-SQL commands

假设安装的 Microsoft ACE OLEDB 版本是 Microsoft.ACE.OLEDB.12.0 并且 csv 文件位置是 C:abc.csv

Assuming that the installed version of Microsoft ACE OLEDB is Microsoft.ACE.OLEDB.12.0 and that the csv file location is C:abc.csv

首先将数据导入员工和组表

INSERT INTO [GROUP] ([Group Name]) SELECT [Group Name] FROM OPENROWSET ( 'Microsoft.ACE.OLEDB.12.0','Text;Database=C:;IMEX=1;','SELECT * FROM abc.csv' ) t INSERT INTO [Employee] ([Employee Number],[Employee Name],[LoginName]) SELECT [Employee Number],[Employee Name],[LoginName] FROM OPENROWSET ( 'Microsoft.ACE.OLEDB.12.0','Text;Database=C:;IMEX=1;','SELECT * FROM abc.csv' ) t

导入 Employee_Group 数据

INSERT INTO [EmployeeGroup] ([Employee Number],[GroupID]) SELECT t1.[Employee Number],t2.[GroupID] FROM OPENROWSET ( 'Microsoft.ACE.OLEDB.12.0','Text;Database=C:;IMEX=1;','SELECT * FROM abc.csv' ) t1 INNER JOIN GROUP t2 ON t1.[Group Name] = T2.[Group Name]

带有 Microsoft 文本驱动程序的 OPENROWSET

首先将数据导入员工和组表

INSERT INTO [GROUP] ([Group Name]) SELECT [Group Name] FROM OPENROWSET ( 'MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:;', 'SELECT * FROM abc.csv' ) t INSERT INTO [Employee] ([Employee Number],[Employee Name],[LoginName]) SELECT [Employee Number],[Employee Name],[LoginName] FROM OPENROWSET ( 'MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:;', 'SELECT * FROM abc.csv' ) t

导入 Employee_Group 数据

INSERT INTO [EmployeeGroup] ([Employee Number],[GroupID]) SELECT t1.[Employee Number],t2.[GroupID] FROM OPENROWSET ( 'MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:;', 'SELECT * FROM abc.csv' ) t1 INNER JOIN GROUP t2 ON t1.[Group Name] = T2.[Group Name]

注意:您可以将数据导入临时表，然后查询该表，以避免多次连接到 csv 文件

导入 csv 文件到 SQL Server 的方法有很多，您可以查看以下链接以获取更多信息.

There are many method to import csv files to SQL server, you can check the following links for additional informations.

使用 PowerShell 将 CSV 文件导入 SQL Server 的四种简单方法
如何从 .csv 导入数据在 SQL Server 中使用 PowerShell?

OPENROWSET (Transact-SQL)
T-SQL – 使用 OpenRowSet 读取 CSV 文件
使用 Openrowset 导入错误

相关文章