如何从 XML 中提取值?

2021-09-10 00:00:00 xml tsql xquery sql-server

我有下面定义的 XML 变量及其值.请帮忙

I have the XML variable defined below and its value. Please help

DECLARE @xml2 as XML ;                          
SET @xml2 = '<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
  </Marks>
</Student>'

预期结果应该是:

Subject  Score
-------- ------
Science  89
Maths    90

推荐答案

还有一种方法,应该会快一点...

And one more approach, which should be a little faster...

DECLARE @xml2 as XML ;                          
SET @xml2 = '<Student>
  <Marks>
    <Subject>Science</Subject>
    <Score>89</Score>
    <Subject>Maths</Subject>
    <Score>90</Score>
  </Marks>
</Student>';

WITH tally(Nmbr) AS(SELECT TOP(@xml2.value('count(/Student/Marks/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml2.value('(/Student/Marks/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml2.value('(/Student/Marks/Score[sql:column("tally.Nmbr")]/text())[1]','int') AS Score 
FROM tally;

简单的想法:

  • 我们通过使用计算的 TOP 子句和 ROW_NUMBER() 对具有较大行数的任何表(我使用 master..spt_values在这里,最好是物理数字表...)
  • 现在我们可以使用 sql:column() 通过位置获取每个值,以将计数的当前值放入 XQuery.
  • 这意味着:我们选择第一个分数第一的科目.比第二个科目的第二个分数等等......
  • We create a tally on the fly by using a computed TOP clause together with ROW_NUMBER() against any table with a larger row count (I use master..spt_values here, best was a physical numbers table...)
  • Now we can grab each value by its position using sql:column() to get the tally's current value into the XQuery.
  • This means: We pick the first Subject with the first Score. Than the second Subject with the second score and so on...

提示:这种格式是非常错误的.如果这在你的控制之下,你真的应该改变它.您完全依赖于元素的顺序和位置.缺失的元素或任何混淆或介于两者之间的其他元素都可能将其拆毁.

Hint: This format is very erronous. If this is under your control you really should change it. You are relying completely on the element's order and position. A missing element or any mix-up or other elements in between could tear this down to the ground.

我会使用类似的东西

<Student>
  <Marks Subject="Science" Score="80"/>
  <Marks Subject="Maths" Score="90"/>
</Student>

<Student>
  <Marks>
    <Subject name="Science">80</Subject>
    <Subject name="Maths">90</Subject>
  </Marks>
</Student>

更新基准

以下将比较具有奇数/偶数结构的 10/100/1000 对的 XML:

The following will compare a XML with 10 / 100 / 1000 pairs in odd/even structure:

--确保使用数据库,其中该表返回至少 1000 行(或使用任何其他表)

--Make sure to use a database, where this table returns at least 1000 rows (or use any other table)

SELECT COUNT(*) FROM master..spt_values

--用虚拟数据填充表格

--Filling a table with dummy data

DECLARE @tbl TABLE(ID INT IDENTITY,[Subject] VARCHAR(30),Score VARCHAR(30));
INSERT INTO @tbl 
SELECT TOP 1000 LEFT(CAST(NEWID() AS varchar(50)),30),CAST(CAST(NEWID() AS binary(4)) AS INT)
FROM master..spt_values;
SELECT * FROM @tbl;

--使用三个不同对数的 XML

--using three XMLs with different count of pairs

DECLARE @xml10 XML;
DECLARE @xml100 XML;
DECLARE @xml1000 XML;

SET @xml10=(
    SELECT TOP 10
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);


SET @xml100=(
    SELECT TOP 100
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);


SET @xml1000=(
    SELECT TOP 1000
           (SELECT [Subject] FOR XML PATH(''),TYPE) AS [*]
          ,(SELECT [Score] FOR XML PATH(''),TYPE) AS [*]
    FROM @tbl t
    ORDER BY t.ID
    FOR XML PATH(''),ROOT('root')
);

--测试 10

DECLARE @d DATETIME2=SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml10.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml10.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml10.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t10a
FROM tally;
SELECT 'xml10 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t10b
FROM @xml10.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml10 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

--测试 100

SET @d =SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml100.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml100.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml100.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t100a
FROM tally;
SELECT 'xml100 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t100b
FROM @xml100.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml100 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

--测试 1000

SET @d =SYSUTCDATETIME();
WITH tally(Nmbr) AS(SELECT TOP(@xml1000.value('count(/root/Subject)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
SELECT tally.Nmbr
      ,@xml1000.value('(/root/Subject[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS [Subject] 
      ,@xml1000.value('(/root/Score[sql:column("tally.Nmbr")]/text())[1]','nvarchar(max)') AS Score 
INTO #t1000a
FROM tally;
SELECT 'xml1000 a',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT c.value('(./text())[1]', 'nvarchar(max)') AS [Subject]
    , c.value('(/root/*[sql:column("w.r")]/text())[1]', 'nvarchar(max)') AS [Score]
INTO #t1000b
FROM @xml1000.nodes('/root/*[position() mod 2 = 1]') AS t(c)
    CROSS APPLY (SELECT t.c.value('let $n := . return count(/root/*[. << $n[1]]) + 2','INT') AS r
         ) AS w;
SELECT 'xml1000 b',DATEDIFF(MILLISECOND,@d,SYSUTCDATETIME());

方法 a 是我使用计数的方法,方法 b 是 Yitzhak 使用 XQuery 的方法.

Method a is my approach using a tally, method b is Yitzhak's approach using XQuery.

这两种方法的区别很小

  10 Elements a=7ms     / b=6ms
 100 Elements a=83ms    / b=79ms
1000 Elements a=8942ms  / b=8721ms

一些一般差异:

  • 计数方法也适用于每个系列的三元组或更多元素.
  • Tally 方法仍然适用于中间的其他元素
  • XQuery 方法可以更好地处理意外丢失的元素,但如果只丢失了一个预期元素,两种方法都不会正确返回.

相关文章