从 Sqlite 到 SQL Server 读取大量数据在预执行时失败
我有一个巨大的 (26GB) sqlite 数据库,我想用 SSIS 将它导入到 SQL Server.
我已经正确设置了所有内容.部分数据流工作正常并正在导入数据.
数据流很简单.它们仅由源和目标组成.
但是当涉及到一个有 8000 万行的表时,数据流会失败并显示以下无用消息:
<块引用>代码:0xC0047062
来源:数据流任务源 9 - nibrs_bias_motivation [55]
说明:System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] 未知错误 (7)
在 System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
在 System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior 行为,String 方法,Boolean needReader,Object[] methodArguments,SQL_API odbcApiMethod)
在 System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior 行为,String 方法,Boolean needReader)
在 System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior 行为)
在 System.Data.Odbc.OdbcCommand.ExecuteDbDataReader(CommandBehavior 行为)
在 System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior 行为)
在 Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter.PreExecute()
在 Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 包装器)
而在此任务失败之前,内存使用率上升到 99%,然后任务失败.这让我觉得这是一个内存问题.但我不知道我该如何解决这个问题.
我尝试在所有数据流任务上将 DelayValidation
设置为 true.没有改变.我玩过缓冲区大小.没什么.
我能做什么?
解决方案分步指南
由于从大数据集读取时抛出错误,尝试分块读取数据,可以按照以下步骤实现:
- 声明 2 个
Int32
类型的变量(@[User::RowCount]
和@[User::IncrementValue]
) - 添加一个
Execute SQL Task
来执行一个select Count(*)
命令并将结果集存储到变量@[User::RowCount]
- 添加具有以下首选项的 For 循环:
- 在for循环容器内添加一个
数据流任务
- 在数据流任务中添加
ODBC Source
和OLEDB Destination
- 在 ODBC Source 中选择
SQL Command
选项并编写一个SELECT * FROM TABLE
查询 *(仅检索元数据` - 映射源和目标之间的列
- 返回
Control flow
并点击Data flow task
并点击F4查看属性窗口 在属性窗口中,转到表达式并将以下表达式分配给
[ODBC Source].[SQLCommand]
属性:(有关更多信息,请参阅参考文献
- ODBC 源 - SQL Server
- 如何在 ODBC SQLCommand 表达式中传递 SSIS 变量?
- 如何使用 SSIS ODBC 源以及 OLE DB 和 ODBC 之间的区别?
- SQLite 限制
I have a huge (26GB) sqlite database that I want to import to SQL Server with SSIS.
I have everything setup correctly. Some of the data flows are working correctly and importing the data.
Data flows are simple. They just consist of source and destination.
But when it comes to a table that has 80 million rows, data flow fails with this unhelpful message:
Code: 0xC0047062
Source: Data Flow Task Source 9 - nibrs_bias_motivation [55]
Description: System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] unknown error (7)at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader, Object[] methodArguments, SQL_API odbcApiMethod)
at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader)
at System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Odbc.OdbcCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
at Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter.PreExecute()
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)And before this task fails, memory usage goes up to 99%, then the task fails. This made me think its a memory issue. But I don't know how can I solve this.
I tried setting
DelayValidation
to true on all data flow tasks. Nothing changed. I played with the buffer sizes. Nothing.What can I do?
解决方案Step by Step guide
Since the error is thrown when reading from a large dataset, try reading data by chunks, to achieve that you can follow these steps:
- Declare 2 Variables of type
Int32
(@[User::RowCount]
and@[User::IncrementValue]
) - Add an
Execute SQL Task
that execute aselect Count(*)
command and store the Result Set into the variable@[User::RowCount]
- Add a For Loop with the following preferences:
- Inside the for loop container add a
Data flow task
- Inside the dataflow task add an
ODBC Source
andOLEDB Destination
- In the ODBC Source select
SQL Command
option and write aSELECT * FROM TABLE
query *(to retrieve metadata only` - Map the columns between source and destination
- Go back to the
Control flow
and click on theData flow task
and hit F4 to view the properties window In the properties window go to expression and Assign the following expression to
[ODBC Source].[SQLCommand]
property: (for more info refer to How to pass SSIS variables in ODBC SQLCommand expression?)"SELECT * FROM MYTABLE ORDER BY ID_COLUMN LIMIT 500000 OFFSET " + (DT_WSTR,50)@[User::IncrementValue]"
Where
MYTABLE
is the source table name, andIDCOLUMN
is your primary key or identity column.Control Flow Screenshot
References
- ODBC Source - SQL Server
- How to pass SSIS variables in ODBC SQLCommand expression?
- HOW TO USE SSIS ODBC SOURCE AND DIFFERENCE BETWEEN OLE DB AND ODBC?
- SQLite Limit
相关文章