为什么 SSIS 在导入 UTF-8 平面文件时不能识别换行符 {LF} 行分隔符?
我正在尝试使用 SSIS 将数据从 utf-8 编码的平面文件导入 SQL Server 2008.这是行数据的末尾在 Notepad++ 中的样子:
我还有几张图片展示了文件连接管理器的样子:
您可以看到数据在文件连接管理器预览中正确显示.当我尝试导入此数据时,没有导入任何行.我收到一条错误消息,指出未找到行分隔符.您可以在文件连接管理器图像中看到标题行分隔符和行分隔符都设置为 {LF}
.这足以生成正确的预览,所以我不知道为什么它无法导入.我尝试了许多结果为零的事情:
- 尝试在 SSMS 中使用向导导入...结果相同
- 尝试使用数据转换,没有影响
- 尝试将行分隔符设置为 (0a),结果相同
[平面文件源 [582]] 警告:到达数据文件的末尾,而读取标题行.确保标题行分隔符和数量要跳过的标题行是正确的.
感谢您看到这里,我非常感谢您能提供的任何帮助.
解决方案原因:
由于列分隔符 Ç
("c" with cedilla)和not
由于行分隔符 {LF}
(换行).
[Read flat file [1]] 警告:到达数据文件的末尾,而读取标题行.确保标题行分隔符和数量要跳过的标题行是正确的.
这是一个示例 SSIS 包,展示了如何使用 Script Component
解决问题,最后还有另一个模拟您的问题的示例.
分辨率:
以下示例包是用 SSIS 2008 R2
编写的.它读取带有行分隔符 {LF}
的平面文件作为单列值;然后使用Script Component
拆分数据,将信息插入到SQL Server 2008 R2
数据库中的表中.
使用 Notepad++ 创建一个只有几行的简单平面文件.下面的示例文件在每一行上都有 Product Id 和 List Price 信息,以 Ç
作为列分隔符,每个行以 {LF}
分隔符结尾.
在 Notepad++ 上,单击 Encoding
,然后单击 Encoding in UTF-8
以保存平面文件UTF-8
编码.
该示例将使用名为 Sora
的 SQL Server 2008 R2
数据库.使用下面给定的脚本创建一个名为 dbo.ProductListPrice
的新表.SSIS 会将平面文件数据插入到该表中.
使用索拉;去创建表 dbo.ProductListPrice(ProductId nvarchar(30) 非空, ListPrice numeric(12,2) NOT NULL);去
使用 Business Intelligence Development Studio (BIDS) 2008 R2 创建 SSIS 包.将包命名为 SO_6268205.dtsx
.创建一个名为 Sora.ds
的数据源,以连接到 SQL Server 2008 R2 中的数据库 Sora
.>
右键单击包内的任意位置,然后单击Variables
以查看变量窗格.在包范围SO_6268205<中创建一个名为
ColumnDelimiter
、数据类型String
的新变量ColumnDelimiter
/code> 并将变量设置为 Ç
右键单击 Connection Managers
并单击 New Flat File Connection...
以创建一个连接读取平面文件.
在平面文件连接管理器编辑器的General
页面上,执行以下操作:
- 将连接管理器名称设置为
ProductListPrice
- 将Description设置为
平面文件连接管理器以读取产品标价信息.
- 选择平面文件路径.我在路径
C:SivaStackOverflowFiles6268205ProductListPrice.txt
中有文件 - 从标题行分隔符中选择
{LF}
- 检查
第一个数据行的列名
- 点击
Columns
页面
在平面文件连接管理器编辑器的Columns
页面上,验证Column delimiter
为空白且已禁用.点击Advanced
页面.
在平面文件连接管理器编辑器的Advanced
页面上,执行以下操作.
- 将名称设置为
LineData
- 验证列分隔符是否设置为
{LF}
- 设置DataType为
Unicode字符串[DT_WSTR]
- 将 OutputColumnWidth 设置为
255
- 点击
预览
页面.
在平面文件连接管理器编辑器的Preview
页面上,验证显示的数据是否正确,然后单击好的
.
您将在 Connection Managers
选项卡上看到数据源 Sora 和平面文件连接管理器 ProductListPrice在包装底部.
将Data Flow Task
拖放到包的Control Flow选项卡上,并将其命名为File to database - without Cedilla delimiter代码>
双击Data Flow Task 将视图切换到包上的Data Flow
选项卡.将 Flat File Source
拖放到 Data Flow 选项卡上.双击Flat File Source 打开Flat File Source Editor
.
在Flat File Source Editor的Connection Manager
页面上,选择Flat File Connection Manager ProductListPrice
并点击 Columns 页面.
在平面文件源编辑器的Columns
页面,勾选LineData
并点击OK
.
将 Script Component
拖放到 Flat File Source 下方的 Data Flow 选项卡上,选择 Transformation
并点击OK
.将绿色箭头从 Flat File Source 连接到 Script Component.双击Script Component打开Script Transformation Editor
.
在 Script Transformation Editor 上单击 Input Columns 并选择 LineData
列.点击输入和输出页面.
在脚本转换编辑器的Inputs and Outputs
页面上,执行以下操作.
- 将输入名称更改为 FlatFileInput
- 将输出名称更改为
SplitDataOutput
- 选择输出列并点击
添加列
.再次重复此操作以添加另一列. - 为第一列命名
ProductId
- 设置ProductId列的DataType为
Unicode字符串[DT_WSTR]
- 将长度设置为
30
在脚本转换编辑器的Inputs and Outputs
页面上,执行以下操作.
- 为第二列命名
ListPrice
- 将ListPrice列的DataType设置为
数字[DT_NUMERIC]
- 将精度设置为
12
- 将Scale设置为
2
- 点击Script页面修改脚本
在脚本转换编辑器的Script
页面上,执行以下操作.
- 点击 ReadOnlyVariables 旁边的省略号按钮并选择变量
User::ColumnDelimiter
- 点击
编辑脚本...
在脚本编辑器中粘贴以下 C#.该脚本执行以下任务.
- 使用变量User::ColumnDelimiter中定义的列分隔符值
Ç
,方法FlatFileInput_ProcessInputRow
拆分传入值并将其分配给脚本组件转换中定义的两个输出列.
在 C# 中编写脚本组件代码
使用系统;使用 System.Data;使用 Microsoft.SqlServer.Dts.Pipeline.Wrapper;使用 Microsoft.SqlServer.Dts.Runtime.Wrapper;[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]公共类 ScriptMain : UserComponent{公共覆盖无效 PreExecute(){base.PreExecute();}公共覆盖无效 PostExecute(){base.PostExecute();}公共覆盖无效 FlatFileInput_ProcessInputRow(FlatFileInputBuffer Row){const int COL_PRODUCT = 0;const int COL_PRICE = 1;字符分隔符 = Convert.ToChar(this.Variables.ColumnDelimiter);string[] lineData = Row.LineData.ToString().Split(delimiter);Row.ProductId = String.IsNullOrEmpty(lineData[COL_PRODUCT])?字符串.空: lineData[COL_PRODUCT];Row.ListPrice = String.IsNullOrEmpty(lineData[COL_PRICE])?0: Convert.ToDecimal(lineData[COL_PRICE]);}}
将 OLE DB Destination
拖放到 Data Flow 选项卡上.将绿色箭头从 Script Component 连接到 OLE DB Destination.双击OLE DB Destination打开OLE DB Destination Editor
.
在 OLE DB Destination Editor 的 Connection Manager
页面上,执行以下操作.
- 从OLE DB 连接管理器 中选择
- 从数据访问模式 选择
- 从表或视图的名称 中选择
- 点击映射页面
Sora
表格或视图 - 快速加载
[dbo].[ProductListPrice]
在OLE DB 目标编辑器上点击Mappings
页面,如果输入和输出列名称相同,将自动映射列.点击OK
.
Data Flow 选项卡在配置完所有组件后应如下所示.
在SQL Server Management Studio (SSMS) 中执行查询select * from dbo.ProductListPrice
以查找表中的行数.在执行包之前它应该是空的.
执行包.您会注意到包成功处理了 9 行.平面文件包含 10 行,但第一行是带有列名的标题.
在SQL Server Management Studio (SSMS)中执行查询select * from dbo.ProductListPrice
,找到9行成功插入桌子.数据应与平面文件数据匹配.
上面的例子说明了如何使用脚本组件手动拆分数据,因为平面文件连接管理器在配置列分隔符时遇到错误Ç
问题模拟:
这个例子展示了一个单独的平面文件连接管理器配置了列分隔符Ç
,它执行但遇到警告并且不处理任何线.
右键单击 Connection Managers
并单击 New Flat File Connection...
以创建一个连接读取平面文件.在平面文件连接管理器编辑器的General
页面上,执行以下操作:
- 将连接管理器名称设置为
ProductListPrice_Cedilla
- 将描述设置为
带有 Cedilla 列分隔符的平面文件连接管理器.
- 我的文件在路径
C:SivaStackOverflowFiles6268205ProductListPrice.txt
选择平面文件路径. - 从标题行分隔符中选择
{LF}
- 检查
第一个数据行的列名
- 点击
Columns
页面
在平面文件连接管理器编辑器的Columns
页面上,执行以下操作:
- 将行分隔符设置为
{LF}
- 列分隔符字段可能被禁用.点击
重置列
- 将列分隔符设置为
Ç
- 点击
高级
页面
在平面文件连接管理器编辑器的Advanced
页面上,执行以下操作:
- 将名称设置为
ProductId
- 将 ColumnDelimiter 设置为
Ç
- 设置DataType为
Unicode字符串[DT_WSTR]
- 将长度设置为
30
- 点击列
ListPrice
在平面文件连接管理器编辑器的Advanced
页面上,执行以下操作:
- 将名称设置为
ListPrice
- 将 ColumnDelimiter 设置为
{LF}
- 将 DataType 设置为
numeric [DT_NUMERIC]
- 将 DataPrecision 设置为
12
- 将 DataScale 设置为
2
- 点击
OK
将数据流任务
拖放到控制流选项卡上,并将其命名为文件到数据库 - 使用 Cedilla 分隔符
.禁用第一个数据流任务.
使用 Flat File Source
和 OLE DB Destination
双击平面文件源打开平面文件源编辑器
.在 Flat File Source Editor 的 Connection Manager
页面上,选择 Flat File Connection Manager ProductListPrice_Cedilla
并点击 Columns 页面来配置列.点击OK
.
执行包.所有组件都将显示绿色,表示该过程成功,但不会处理任何行.可以看到Flat File Source
和OLE DB Destination
点击Progress
标签,你会看到以下警告信息.
[Read flat file [1]] 警告:到达数据文件的末尾,而读取标题行.确保标题行分隔符和数量要跳过的标题行是正确的.
I am trying to import data from a utf-8 encoded flat file into SQL Server 2008 using SSIS. This is what the end of the row data looks like in Notepad++:
I have a couple more images showing what the file connection manager looks like:
You can see that the data shows correctly in the file connection manager preview. When I try to import this data, no rows import. I get an error message indicating that the row delimiter was not found. You can see in the file connection manager images that the header row delimiter and the row delimiter are both set to {LF}
. This was sufficient to generate the correct preview, so I am lost to why it did not work to import. I have tried a number of things that have brought zero results:
- Tried using the Wizard import in SSMS...same results
- Tried using data conversion, no impact
- Tried setting the row delimiter to (0a), same results
[Flat File Source [582]] Warning: The end of the data file was reached while reading header rows. Make sure the header row delimiter and the number of header rows to skip are correct.
Thanks for looking at this and I really appreciate any help you can offer.
解决方案Cause:
SSIS fails to read the file and displays the below warning due to the column delimiter Ç
("c" with cedilla) and not
due to the line delimiter {LF}
(Line Feed).
[Read flat file [1]] Warning: The end of the data file was reached while
reading header rows. Make sure the header row delimiter and the number of
header rows to skip are correct.
Here is a sample SSIS package that shows how to resolve the issue using Script Component
and at the end there is another example that simulates your issue.
Resolution:
Below sample package is written in SSIS 2008 R2
. It reads a flat file with row delimiter {LF}
as a single column value; then splits the data using Script Component
to insert the information into a table in SQL Server 2008 R2
database.
Use Notepad++ to create a simple flat file with few rows. The below sample file has Product Id and List Price information on each row separated by Ç
as column delimiter and each row ends with {LF}
delimiter.
On the Notepad++, click Encoding
and then click Encoding in UTF-8
to save the flat file in UTF-8
encoding.
The sample will use an SQL Server 2008 R2
database named Sora
. Create a new table named dbo.ProductListPrice
using the below given script. SSIS will insert the flat file data into this table.
USE Sora;
GO
CREATE TABLE dbo.ProductListPrice
(
ProductId nvarchar(30) NOT NULL
, ListPrice numeric(12,2) NOT NULL
);
GO
Create an SSIS package using Business Intelligence Development Studio (BIDS) 2008 R2. Name the package as SO_6268205.dtsx
. Create a data source named Sora.ds
to connect to the database Sora
in SQL Server 2008 R2.
Right-click anywhere inside the package and then click Variables
to view the variables pane. Create a new variable named ColumnDelimiter
of data type String
in the package scope SO_6268205
and set the variable with the value Ç
Right-click on the Connection Managers
and click New Flat File Connection...
to create a connection to read the flat file.
On the General
page of the Flat File Connection Manager Editor, perform the following actions:
- Set Connection manager name to
ProductListPrice
- Set Description to
Flat file connection manager to read product list price information.
- Select the flat file path. I have the file in the path
C:SivaStackOverflowFiles6268205ProductListPrice.txt
- Select
{LF}
from Header Row Delimiter - Check
Column names in the first data row
- Click
Columns
page
On the Columns
page of the Flat File Connection Manager Editor, verify that the Column delimiter
is blank and disabled. Click Advanced
page.
On the Advanced
page of the Flat File Connection Manager Editor, perform the following actions.
- Set the Name to
LineData
- Verify that the Column delimiter is set to
{LF}
- Set the DataType to
Unicode string [DT_WSTR]
- Set the OutputColumnWidth to
255
- Click the
Preview
page.
On the Preview
page of the Flat File Connection Manager Editor, verify that the displayed data looks correct and click OK
.
You will see the data source Sora and the flat file connection manager ProductListPrice on the Connection Managers
tab at the bottom of the package.
Drag and drop Data Flow Task
onto the Control Flow tab of the package and name it as File to database - Without Cedilla delimiter
Double-click the Data Flow Task to switch the view to the Data Flow
tab on the package. Drag and drop a Flat File Source
on the Data Flow tab. Double-click the Flat File Source to open Flat File Source Editor
.
On the Connection Manager
page of the Flat File Source Editor, select the Flat File Connection Manager ProductListPrice
and click Columns page.
On the Columns
page of the Flat File Source Editor, check the column LineData
and click OK
.
Drag and drop a Script Component
onto the Data Flow tab below the Flat File Source, select Transformation
and click OK
. Connect the green arrow from Flat File Source to Script Component. Double-click Script Component to open Script Transformation Editor
.
Click Input Columns on Script Transformation Editor and select LineData
column. Click Inputs and Outputs page.
On the Inputs and Outputs
page of the Script Transformation Editor, perform the following actions.
- Change the inputs name to FlatFileInput
- Change the outputs name to
SplitDataOutput
- Select Output Columns and click
Add Column
. Repeat this again to add another column. - Name the first column
ProductId
- Set the DataType of column ProductId to
Unicode string [DT_WSTR]
- Set the Length to
30
On the Inputs and Outputs
page of the Script Transformation Editor, perform the following actions.
- Name the second column
ListPrice
- Set the DataType of column ListPrice to
numeric [DT_NUMERIC]
- Set the Precision to
12
- Set the Scale to
2
- Click Script page to modify the script
On the Script
page of the Script Transformation Editor, perform the following actions.
- Click the ellipsis button against ReadOnlyVariables and select the variable
User::ColumnDelimiter
- Click
Edit Script...
Paste the below C# in the Script Editor. The script performs the following tasks.
- Using the column delimiter value
Ç
defined in the variable User::ColumnDelimiter, the methodFlatFileInput_ProcessInputRow
splits the incoming value and assigns it to the two output columns defined in the Script Component transformation.
Script component code in C#
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void PreExecute()
{
base.PreExecute();
}
public override void PostExecute()
{
base.PostExecute();
}
public override void FlatFileInput_ProcessInputRow(FlatFileInputBuffer Row)
{
const int COL_PRODUCT = 0;
const int COL_PRICE = 1;
char delimiter = Convert.ToChar(this.Variables.ColumnDelimiter);
string[] lineData = Row.LineData.ToString().Split(delimiter);
Row.ProductId = String.IsNullOrEmpty(lineData[COL_PRODUCT])
? String.Empty
: lineData[COL_PRODUCT];
Row.ListPrice = String.IsNullOrEmpty(lineData[COL_PRICE])
? 0
: Convert.ToDecimal(lineData[COL_PRICE]);
}
}
Drag and drop OLE DB Destination
onto the Data Flow tab. Connect the green arrow from Script Component to OLE DB Destination. Double-click OLE DB Destination to open OLE DB Destination Editor
.
On the Connection Manager
page of the OLE DB Destination Editor, perform the following actions.
- Select
Sora
from OLE DB Connection Manager - Select
Table or view - fast load
from Data access mode - Select
[dbo].[ProductListPrice]
from Name of the table or the view - Click Mappings page
Click Mappings
page on the OLE DB Destination Editor would automatically map the columns if the input and output column names are same. Click OK
.
Data Flow tab should look something like this after configuring all the components.
Execute the query select * from dbo.ProductListPrice
in the SQL Server Management Studio (SSMS) to find the number of rows in the table. It should be empty before executing the package.
Execute the package. You will notice that the package successfully processed 9 rows. The flat file contains 10 lines but the first row is header with column names.
Execute the query select * from dbo.ProductListPrice
in the SQL Server Management Studio (SSMS) to find the 9 rows successfully inserted into the table. The data should match with flat file data.
The above example illustrated how to manually split the data using Script Component because the Flat File Connection Manager encounters error when configured the column delimiter Ç
Issue Simulation:
This example shows a separate Flat File Connection Manager configured with column delimiter Ç
, which executes but encounters a warning and does not process any lines.
Right-click on the Connection Managers
and click New Flat File Connection...
to create a connection to read the flat file. On the General
page of the Flat File Connection Manager Editor, perform the following actions:
- Set Connection manager name to
ProductListPrice_Cedilla
- Set Description to
Flat file connection manager with Cedilla column delimiter.
- I have the file in the path
C:SivaStackOverflowFiles6268205ProductListPrice.txt
Select the flat file path. - Select
{LF}
from Header Row Delimiter - Check
Column names in the first data row
- Click
Columns
page
On the Columns
page of the Flat File Connection Manager Editor, perform the following actions:
- Set Row delimiter to
{LF}
- The column delimiter field may be disabled. Click
Reset Columns
- Set Column delimiter to
Ç
- Click
Advanced
page
On the Advanced
page of the Flat File Connection Manager Editor, perform the following actions:
- Set the Name to
ProductId
- Set the ColumnDelimiter to
Ç
- Set the DataType to
Unicode string [DT_WSTR]
- Set the Length to
30
- Click column
ListPrice
On the Advanced
page of the Flat File Connection Manager Editor, perform the following actions:
- Set the Name to
ListPrice
- Set the ColumnDelimiter to
{LF}
- Set the DataType to
numeric [DT_NUMERIC]
- Set the DataPrecision to
12
- Set the DataScale to
2
- Click
OK
Drag and drop a Data Flow task
onto the Control Flow tab and name it as File to database - With Cedilla delimiter
. Disable the first data flow task.
Configure the second data flow task with Flat File Source
and OLE DB Destination
Double-click the Flat File Source to open Flat File Source Editor
. On the Connection Manager
page of the Flat File Source Editor, select the Flat File Connection Manager ProductListPrice_Cedilla
and click Columns page to configure the columns. Click OK
.
Execute the package. All the components will display green color to indicate that the process was success but no rows will be processed. You can see that there are no rows numbers indication between the Flat File Source
and OLE DB Destination
Click the Progress
tab and you will notice the following warning message.
[Read flat file [1]] Warning: The end of the data file was reached while
reading header rows. Make sure the header row delimiter and the number of
header rows to skip are correct.
相关文章