在 SSIS 2012 的 OLE DB 源中使用动态 SQL

2021-12-30 00:00:00 sql sql-server ssis dynamic-sql

我有一个存储过程作为 SQL 命令文本,它传递了一个包含表名的参数.proc 然后从该表返回数据.我不能直接调用该表作为 OLE DB 源,因为某些业务逻辑需要发生在 proc 中的结果集上.在 SQL 2008 中,这运行良好.在升级的 2012 包中,我得到无法确定元数据,因为 ... 包含动态 SQL.考虑使用 WITH RESULT SETS 子句来显式描述结果集."

问题是我无法在 proc 中定义字段名称,因为作为参数传递的表名称可以是不同的值,并且每次生成的字段都可能不同.有人遇到过这个问题或有任何想法吗?我已经使用dm_exec_describe_first_result_set"、临时表和包含 WITH RESULT SETS 的 CTE 使用动态 SQL 尝试了各种事情,但它在 SSIS 2012 中不起作用,同样的错误.上下文是许多动态 SQL 方法的一个问题.

这是我尝试过的最新方法,但没有运气:

DECLARE @sql VARCHAR(MAX)SET @sql = 'SELECT * FROM ' + @dataTableName声明@listStr VARCHAR(MAX)SELECT @listStr = COALESCE(@listStr +',','') + [name] + '' + system_type_name FROM sys.dm_exec_describe_first_result_set(@sql, NULL, 1)exec('exec(''SELECT * FROM myDataTable'') WITH RESULT SETS ((' + @listStr + '))')

解决方案

所以我出于善意问一下,为什么在上帝的绿色地球上您使用 SSIS 数据流任务来处理这样的动态源数据?

您遇到麻烦的原因是因为您歪曲了 SSIS 数据流任务的每一个目的:

  • 使用可以在设计时静态类型化和缓存的已知元数据提取已知源
  • 通过简单的(理想情况下是异步的)转换来运行一个已知的过程
  • 获取转换后的数据并将其加载到已知目的地以及已知元数据

拥有带回不同数据的参数化数据源很好.但是让他们每次都带回完全不同的元数据,而不同的集合之间没有一致性,坦率地说,是荒谬的,我不完全确定我想知道你是如何处理 2008 工作包中的所有列元数据的.

这就是它希望您向 SSIS 查询添加 WITH RESULTS SET 的原因 - 以便它可以生成一些元数据.它不会在运行时执行此操作 - 它不能!它必须有一组已知的列(因为它无论如何都将它们全部别名为编译变量)才能使用.每次运行该数据流任务时,它都需要相同的列 - 完全相同的列,包括名称、类型和约束.

这导致了一个(可怕的、可怕的)解决方案 - 只需将所有数据粘贴到一个带有 Column1、Column2 ... ColumnN 的临时表中,然后使用您用作表名参数的相同变量来有条件地分支您的编写代码并对列执行任何您想要的操作.

另一种更明智的解决方案是为每个源表创建一个数据流任务,并在优先约束中使用您的参数来选择应该运行哪个数据流任务.

对于这种不适合开箱即用 ETL 的解决方案,您还应该高度考虑仅在 C# 或脚本任务中滚动自己的解决方案,而不是 SSIS 提供的数据流任务.

简而言之,请不要这样做.想想孩子(包)!

I have a stored proc as the SQL command text, which is getting passed a parameter that contains a table name. The proc then returns data from that table. I cannot call the table directly as the OLE DB source because some business logic needs to happen to the result set in the proc. In SQL 2008 this worked fine. In an upgraded 2012 package I get "The metadata could not be determined because ... contains dynamic SQL. Consider using the WITH RESULT SETS clause to explicitly describe the result set."

The problem is I cannot define the field names in the proc because the table name that gets passed as a parameter can be a different value and the resulting fields can be different every time. Anybody encounter this problem or have any ideas? I've tried all sorts of things with dynamic SQL using "dm_exec_describe_first_result_set", temp tables and CTEs that contains WITH RESULT SETS, but it doesn't work in SSIS 2012, same error. Context is a problem with a lot of the dynamic SQL approaches.

This is latest thing I tried, with no luck:

DECLARE @sql VARCHAR(MAX)
SET @sql = 'SELECT * FROM ' + @dataTableName

DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr +',','') + [name] + ' ' + system_type_name FROM sys.dm_exec_describe_first_result_set(@sql, NULL, 1)

exec('exec(''SELECT * FROM myDataTable'') WITH RESULT SETS ((' + @listStr + '))')

解决方案

So I ask out of kindness, by why on God's green earth are you using an SSIS Data Flow task to handle dynamic source data like this?

The reason you're running into trouble is because you're perverting every purpose of an SSIS Data flow task:

  • to extract a known source with known metadata that can be statically typed and cached in design-time
  • to run through a known process with straightforward (and ideally asynchronous) transformations
  • to take that transformed data and load it into a known destination also with known metadata

It's fine to have parameterized data sources that bring back different data. But to have them bring back entirely different metadata each time with no congruity between the different sets is, frankly, ridiculous, and I'm not entirely sure I want to know how you handled all your column metadata in the working 2008 package.

This is why it wants you add a WITH RESULTS SET to the SSIS query - so it can generate some metadata. It doesn't do this at runtime - it can't! It has to have a known set of columns (because it aliases them all into compiled variables anyway) to work with. It expects the same columns every time it runs that Data Flow Task - the exact same columns, down to the names, the types, and the constraints.

Which leads to one (terrible, terrible) solution - just stick all the data into a temporary table with Column1, Column2 ... ColumnN and then use the same variable you're using as the table name parameter to conditionally branch your code and do whatever you want with the columns.

Another more sane solution would be to create a data flow task for each of your source tables, and use your parameter in a precedence constraint to just pick which data flow task should run.

For a solution this poorly tailored for an out-of-the-box ETL, you should also highly consider just rolling your own in C# or a script task instead of the Data Flow Task provided by SSIS.

In short, please don't do this. Think of the children (packages)!

相关文章