如何在 SSIS 中动态映射输入和输出列?

2021-12-30 00:00:00 sql etl sql-server ssis ssis-2012

我必须通过 SSIS 从 .dbf 文件上传 SQL Server 中的数据.我的输出列是固定的,但输入列不是固定的,因为文件来自客户端,客户端可能已经按照自己的风格更新了数据.也可能有一些未使用的列,或者输入列名可能与输出列不同.

I Have to Upload Data in SQL Server from .dbf Files through SSIS. My Output Column is fixed but the input column is not fixed because the files come from client and client may have updated data by his own style. there may be some unused column too or input column name can be different from output column.

我想到的一个想法是将文件输入列与 SQL 数据库表中的输出列进行映射,并仅使用行中存在的那些列作为文件 ID.

One idea I had in my mind was to map files input column with output column in SQL Database table and use only those column which is present in the row for file id.

但是我不知道该怎么做.你能建议我做同样的事情,或者你有什么想法吗?

But I am not getting how to do that. can you suggest me for doing the same or else you have any idea?

表格示例.

+--------+--------------+--------------+--------+|文件 ID |输入列 |输出列 |活跃 |+--------+---------------+--------------+--------+|1 |客户CD |客户代码 |1 |+--------+---------------+--------------+--------+|1 |名称 |客户名称 |1 |+--------+---------------+--------------+--------+|1 |地址 |客户添加 |1 |+--------+---------------+--------------+--------+|2 |客户代码 |客户代码 |1 |+--------+---------------+--------------+--------+|2 |客户名称 |客户姓名 |1 |+--------+---------------+--------------+--------+|2 |位置 |客户添加 |1 |+--------+---------------+--------------+--------+

推荐答案

如果你创建了一个类似的表,你可以通过 2 种方法使用它在 SSIS 包内动态映射列,或者你必须以编程方式构建整个包.在这个答案中,我将尝试为您提供一些有关如何做到这一点的见解.

注意:此方法仅适用于所有 .dbf 文件的列数相同但名称不同的情况

在这种方法中,您将根据您创建的 FileID 和映射表生成将用作源的 SQL 命令.您必须知道 FileID 和 .dbf 文件路径存储在变量中.例如:

In this approach you will generate the SQL command that will be used as source based on the FileID and the Mapping table you created. You must know is the FileID and the .dbf File Path stored inside a Variable. as example:

假设表名是inputoutputMapping

使用以下命令添加执行 SQL 任务:

Add an Execute SQL Task with the following command:

DECLARE @strQuery as VARCHAR(4000)

SET @strQuery = 'SELECT '

SELECT @strQuery = @strQuery + '[' + InputColumn + '] as [' + OutputColumn + '],'
FROM inputoutputMapping
WHERE FileID = ?

SET @strQuery = SUBSTRING(@strQuery,1,LEN(@strQuery) - 1) + ' FROM ' + CAST(? as Varchar(500))

SELECT @strQuery

然后在参数映射选项卡中选择包含要映射到参数 0 的 FileID 的变量和包含 .dbf 文件名(替代表名)到参数 <代码>1

And in the Parameter Mapping Tab select the variable that contains the FileID to be Mapped to the parameter 0 and the variable that contains the .dbf file name (alternative to table name) to the parameter 1

将 ResultSet 类型设置为 Single Row 并将 ResultSet 0 存储在字符串类型的变量中,例如 @[User::SourceQuery]

Set the ResultSet type to Single Row and store the ResultSet 0 inside a variable of type string as example @[User::SourceQuery]

ResultSet 值如下:

The ResultSet value will be as following:

SELECT [CustCd] as [CustCode],[CNAME] as [CustName],[Address] as [CustAdd] FROM database1

OLEDB Source中选择Table Access Mode to SQL Command from Variable并使用@[User::SourceQuery]变量作为源.

In the OLEDB Source select the Table Access Mode to SQL Command from Variable and use @[User::SourceQuery] variable as source.

在这种方法中,您必须在数据流任务中使用脚本组件作为源:

In this approach you have to use a Script Component as Source inside the Data Flow Task:

首先,如果您不想对它们进行硬编码,则需要通过变量将 .dbf 文件路径和 SQL Server 连接传递给脚本组件.

在脚本编辑器中,您必须为在目标表中找到的每一列添加一个输出列并将它们映射到目标.

Inside the script editor, you must add an output column for each column found in the destination table and map them to the destination.

在脚本内部,您必须将 .dbf 文件读入数据表:

Inside the Script, you must read the .dbf file into a datatable:

  • C# 从 .DBF 文件读取到数据表
  • 将 DBF 加载到数据表中

在将数据加载到数据表后,还要用您在 SQL Server 中创建的 MappingTable 中找到的数据填充另一个数据表.

After loading the data into a datatable, also fill another datatable with the data found in the MappingTable you created in SQL Server.

在循环数据表列之后,将 .ColumnName 更改为相关的输出列,例如:

After that loop over the datatable columns and change the .ColumnName to the relevant output column, as example:

foreach (DataColumn col in myTable.Columns)
    {

    col.ColumnName = MappingTable.AsEnumerable().Where(x => x.FileID = 1 && x.InputColumn = col.ColumnName).Select(y => y.OutputColumn).First(); 

    }

After 遍历数据表中的每一行并创建一个脚本输出行.

After loop over each row in the datatable and create a script output row.

另外注意,在分配输出行时,必须检查列是否存在,可以先将所有列名加入字符串列表中,然后再用它来检查,例如:

In addition, note that in while assigning output rows, you must check if the column exists, you can first add all columns names to list of string, then use it to check, as example:

var columnNames = myTable.Columns.Cast<DataColumn>()
                             .Select(x => x.ColumnName)
                             .ToList();  


foreach (DataColumn row in myTable.Rows){

if(columnNames.contains("CustCode"){

    OutputBuffer0.CustCode = row("CustCode");

}else{

    OutputBuffer0.CustCode_IsNull = True

}

//continue checking all other columns

}

如果您需要有关使用脚本组件作为源的更多详细信息,请查看以下链接之一:

If you need more details about using a Script Component as a source, then check one of the following links:

  • SSIS 脚本组件作为源
  • 使用脚本组件创建源
  • 脚本组件作为源 - SSIS
  • SSIS –使用脚本组件作为源

我认为除了您可以选择动态构建包之外,没有其他方法可以用来实现此目标,那么您应该使用:

I don't think there are other methods that you can use to achieve this goal except you has the choice to build the package dynamically, then you should go with:

  • BIML
  • 集成服务管理对象模型
  • EzApi 库

最近我在 Git-Hub 上开始了一个新项目,这是一个使用 C# 开发的类库.您可以使用它使用架构映射方法将表格数据从 excel、word、powerpoint、text、csv、html、json 和 xml 导入到具有不同架构定义的 SQL 服务器表中.在以下位置查看:

Recently i started a new project on Git-Hub, which is a class library developed using C#. You can use it to import tabular data from excel, word , powerpoint, text, csv, html, json and xml into SQL server table with a different schema definition using schema mapping approach. check it out at:

  • SchemaMapper:C# Schema 映射类库

您可以按照此 Wiki 页面获取分步指南:

You can follow this Wiki page for a step-by-step guide:

  • 将数据从多个文件导入到一个 SQL 表中的分步指南

相关文章