Separate XML values with the same tag into different columns in SQL Server

2021-10-02 00:00:00 xml sql xquery sql-server xquery-sql

I have an XML file that I am trying to parse. The XML was created in Excel using Save As XML.

Because the XML file was created from Microsoft Excel, it has this header:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
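The bare xmlns="urn:schemas-microsoft-com:office:spreadsheet" on <Workbook> places every unprefixed element (Row, Cell, Data, ...) in that namespace. Below is a minimal sketch of how a path query behaves with and without declaring it. The sample is a cut-down copy of the header above, not the real file. A query that compares only local-name(.) sidesteps the namespace, but a path such as /Workbook/Worksheet/... does not.

```sql
-- Cut-down sample modelled on the Excel header above (not the full file).
DECLARE @doc XML = N'<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet><Table><Row>
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
  </Row></Table></Worksheet>
</Workbook>';

-- No namespace declared: the unprefixed path looks for elements in no namespace,
-- so nothing matches and value() returns NULL.
SELECT @doc.value('(/Workbook/Worksheet/Table/Row/Cell/Data)[1]', 'varchar(100)') AS NoNamespace;

-- DEFAULT namespace: unprefixed names now resolve to the spreadsheet URI.
WITH XMLNAMESPACES (DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet')
SELECT @doc.value('(/Workbook/Worksheet/Table/Row/Cell/Data)[1]', 'varchar(100)') AS WithDefault;

-- Equivalent alternative: bind a prefix to the same URI and use it on every step.
WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' AS ss)
SELECT @doc.value('(/ss:Workbook/ss:Worksheet/ss:Table/ss:Row/ss:Cell/ss:Data)[1]', 'varchar(100)') AS WithPrefix;
```

The first query returns NULL. The other two return Jane Doe.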

The data I am trying to extract is set up in blocks like this:

<Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>

Right now, my query looks like this:

;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss)

select * from (
select X.value('local-name(.)[1]','varchar(max)') as Name,
X.value('.[1]','varchar(max)') as Value
from @allUsers.nodes('//*') as T(X)
    ) a
where Name  = 'Data'

It gives me these results:

Name    Value
----    -----------
Data    Jane Doe
Data    JaneDoe
Data    XYZ
Data    (555) 555-5555

What I would like to do is split this into 4 columns, so I have something like:

Name      UserName    Address    Phone
-----     ----------  ---------  ----------
Jane Doe  JaneDoe     XYZ        (555)-555-5555

I tried selecting a column as

X.value('.[2]','varchar(max)') as UserName

but I just get all NULL values for that.

Is there any way to do this?
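For reference, here is a minimal sketch of why X.value('.[2]', ...) comes back NULL. Inside value(), "." is the single node that nodes() produced for that output row. So .[2] asks for the second item of a one-item sequence and finds nothing. A positional index only selects among siblings when the context is their parent, for example one <Row> per output row. The sample drops the spreadsheet namespace to stay short. That simplification is an assumption; with the real file the namespace must also be declared.

```sql
-- Simplified sample WITHOUT the spreadsheet namespace (an assumption to keep the demo short).
DECLARE @x XML = N'<Row>
  <Cell/>
  <Cell><Data>Jane Doe</Data></Cell>
  <Cell><Data>JaneDoe</Data></Cell>
</Row>';

-- One context node per <Data>: ".[1]" is that node; ".[2]" asks for a second
-- item in a one-item sequence, so value() returns NULL on every row.
SELECT T.X.value('.[1]', 'varchar(100)') AS FirstItem,
       T.X.value('.[2]', 'varchar(100)') AS SecondItem
FROM @x.nodes('/Row/Cell/Data') AS T(X);

-- Anchoring on <Row> instead lets the position index the sibling <Data> elements.
-- The empty <Cell/> has no <Data>, so (Cell/Data)[1] is already "Jane Doe".
SELECT R.X.value('(Cell/Data)[1]', 'varchar(100)') AS Name,
       R.X.value('(Cell/Data)[2]', 'varchar(100)') AS UserName
FROM @x.nodes('/Row') AS R(X);
```

The first query returns two rows whose SecondItem is NULL. The second query returns a single row: Jane Doe, JaneDoe.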

The general structure of the XML file looks like:

<Workbook>
  <DocumentProperties>
  </DocumentProperties>
  <ExcelWorkbook>
  </ExcelWorkbook>
  <Styles>
    <Style>
    </Style>
  </Styles>
  <Worksheet>
    <Table>
      <Column.../>
      <Column.../>
      <Column.../>
      <Row>
        <Cell.../>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell.../>
      </Row>
      ...
    </Table>
  </Worksheet>
</Workbook>

The information I am trying to get is in the <Data>...</Data> elements.

EDIT

From the way I phrased the question, it may seem as if the header names are already defined, but they are actually stored as an ordinary row of <Cell><Data>...</Data></Cell> elements. I am also not sure what purpose this part serves.

Here is the beginning of that section:

<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="2685" x:FullColumns="1"
   x:FullRows="1">
   <Column ss:AutoFitWidth="0" ss:Width="26.25"/>
   <Column ss:AutoFitWidth="0" ss:Width="117" ss:Span="3"/>
   <Column ss:Index="6" ss:AutoFitWidth="0" ss:Width="29.25"/>
   <Row ss:AutoFitHeight="0" ss:Height="60"> <!-- Contains the header names -->
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>

   <Row ss:AutoFitHeight="0" ss:Height="30"> <!-- First record I would like to extract -->
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>

Recommended Answer

There were two very similar questions by the same user. The OP deleted one, merged it into this one, and asked me to copy my answer from there to this thread.

Be aware of the default namespace (the bare xmlns="..." on <Workbook>), which must be declared as DEFAULT:

I simplified your XML, but the idea should carry over...

DECLARE @allUsers XML=
'<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Worksheet>
 <Table>
   <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
   </Table>
 </Worksheet>   
</Workbook>';

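-- ss and DEFAULT are bound to the same URI; it is the DEFAULT declaration that lets
-- the unprefixed names below (Workbook, Row, Cell, Data) match the document.
;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss
                     ,DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet')
SELECT T.X.value('Cell[1]/Data[1]','varchar(max)') AS DontKnow1   -- empty <Cell/>, returns NULL
      ,T.X.value('Cell[2]/Data[1]','varchar(max)') AS Name
      ,T.X.value('Cell[3]/Data[1]','varchar(max)') AS UserName
      ,T.X.value('Cell[4]/Data[1]','varchar(max)') AS Address     -- "Address" per the header row
      ,T.X.value('Cell[5]/Data[1]','varchar(max)') AS Telephone
      ,T.X.value('Cell[6]/Data[1]','varchar(max)') AS DontKnow3   -- empty <Cell/>, returns NULL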
FROM @allUsers.nodes('/Workbook/Worksheet/Table/Row') as T(X)
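Run against the full file, a query over every <Row> also returns the header row (Name, UserName, Address, Telephone Number) as if it were data. As the question's edit notes, Excel stores the headers as an ordinary first <Row>. Below is a minimal sketch that skips that row with a position() predicate and names the columns after the headers. It assumes the header is always the first <Row> of each <Table> and that every data row has the same six-cell layout. If Excel omits an empty cell and marks the next one with ss:Index, positional Cell[n] paths will shift, so check that against the real file.

```sql
-- Sample: the header row from the question's edit plus two data rows.
DECLARE @allUsers XML = N'<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet>
  <Table>
   <Row ss:AutoFitHeight="0" ss:Height="60">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
   <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
   <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
  </Table>
 </Worksheet>
</Workbook>';

-- Row[position() > 1] drops the first <Row> of each <Table>, i.e. the header row.
-- Cells 1 and 6 are the empty styling cells, so only cells 2-5 are read.
WITH XMLNAMESPACES (DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet')
SELECT R.X.value('(Cell[2]/Data)[1]', 'varchar(100)') AS Name,
       R.X.value('(Cell[3]/Data)[1]', 'varchar(100)') AS UserName,
       R.X.value('(Cell[4]/Data)[1]', 'varchar(100)') AS Address,
       R.X.value('(Cell[5]/Data)[1]', 'varchar(100)') AS Telephone
FROM @allUsers.nodes('/Workbook/Worksheet/Table/Row[position() > 1]') AS R(X);
```

Against this sample it returns two rows, John Smith and Jane Doe, and the header row does not appear.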
