如何从sql中的给定xml中获取'/'分隔的节点/标签名称

2021-10-02 00:00:00 xml xpath sql sql-server

我想从给定的 xml 中获取以 '/' 分隔的节点名称,以便只获取节点/标签名称,而不是从给定的 xml 中获取节点/标签值.

i want to fetch '/' separated node name from a given xml such that only node/tag name are getting fetched instead of node/tag value from a given xml.

假设我有以下 xml :

Suppose if i have below xml :

<ns:manageWorkItemRequest>
    <ns:wiFocus>
        <act:orderDate>2020-03-16T10:30:56.000Z</act:orderDate>
        <act:orderItem>
            <agr:instance>
                <spec1:customerServiceIdentifier>ETHA15302121</spec1:customerServiceIdentifier>
                <spec1:instanceCharacteristic>
                    <spec1:action>
                        <spec1:code>Modify</spec1:code>
                    </spec1:action>
                    <spec1:instanceIdentifier>
                        <spec1:value>OS014-AHEFV5T9</spec1:value>
                </spec1:instanceIdentifier>
             </agr:instance>
        </act:orderItem>
        <act:orderVersion>1</act:orderVersion>
    </ns:wiFocus>
    <ns:wiAction>Create</ns:wiAction>
    <ns:wiVersion>1</ns:wiVersion>
</ns:manageWorkItemRequest>

我想要的结果是:

ns:manageWorkItemRequest/ns:wiFocus/act:orderItem/agr:instance/spec1:customerServiceIdentifier/ETHA15302121

ns:manageWorkItemRequest/ns:wiFocus/act:orderItem/agr:instance/spec1:customerServiceIdentifier/ETHA15302121

实际上的要求是,如果我在上面的 xml 中得到这个ETHA15302121"值,那么我应该显示路径,即该值在 xml 中的确切位置是 '/' 分隔格式.

actually the requirement is if i get this "ETHA15302121" value in above xml then i should show the path i.e. where exactly in xml that value is in '/' separated format.

推荐答案

您的 XML 格式不正确(中间缺少结束标记并且缺少命名空间声明.

Your XML was not well-formed (missing closing tag in the middle and missing namespace declarations.

添加缺失的部分后,它看起来是这样,你可以沿着这条路线尝试一些东西(警告:这不会很快......):

After adding the missing parts it looks as so and you might try something along this route (warning: this won't be fast...):

您的 XML

DECLARE @xml XML=
N'<root xmlns:ns="dummy1" xmlns:act="dummy2" xmlns:agr="dummy3" xmlns:spec1="dummy4">
  <ns:manageWorkItemRequest>
    <ns:wiFocus>
      <act:orderDate>2020-03-16T10:30:56.000Z</act:orderDate>
      <act:orderItem>
        <agr:instance>
          <spec1:customerServiceIdentifier>ETHA15302121</spec1:customerServiceIdentifier>
          <spec1:instanceCharacteristic>
            <spec1:action>
              <spec1:code>Modify</spec1:code>
            </spec1:action>
            <spec1:instanceIdentifier>
              <spec1:value>OS014-AHEFV5T9</spec1:value>
            </spec1:instanceIdentifier>
          </spec1:instanceCharacteristic>
        </agr:instance>
      </act:orderItem>
      <act:orderVersion>1</act:orderVersion>
    </ns:wiFocus>
    <ns:wiAction>Create</ns:wiAction>
    <ns:wiVersion>1</ns:wiVersion>
  </ns:manageWorkItemRequest>
</root>';

--查询

WITH AllNamespaces As
(
    SELECT  CONCAT('ns',ROW_NUMBER() OVER(ORDER BY (B.namespaceUri))) Prefix
           ,B.namespaceUri
    FROM @xml.nodes('//*') A(nd)
    CROSS APPLY(VALUES(A.nd.value('namespace-uri(.)','nvarchar(max)')))B(namespaceUri)
    WHERE LEN(B.namespaceUri)>0
    GROUP BY B.namespaceUri
)
,recCte AS
(
    SELECT 1 AS NestLevel
          ,ROW_NUMBER() OVER(ORDER BY A.nd) AS ElementPosition
          ,CAST(REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0') AS VARCHAR(900)) COLLATE DATABASE_DEFAULT AS SortString
          ,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
          ,CAST(CONCAT('/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS NVARCHAR(MAX)) COLLATE DATABASE_DEFAULT AS XPath
          ,A.nd.value('text()[1]','nvarchar(max)') AS NodeValue
          ,A.nd.query('./*') NextFragment
    FROM @xml.nodes('/*') A(nd)
    LEFT JOIN AllNamespaces ns ON ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)') 

    UNION ALL

    SELECT r.NestLevel+1
          ,ROW_NUMBER() OVER(ORDER BY A.nd)  
          ,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY A.nd),5),' ','0')) AS VARCHAR(900)) COLLATE DATABASE_DEFAULT
          ,CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
          ,CONCAT(r.XPath,'/',ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)'),'[',ROW_NUMBER() OVER(PARTITION BY CONCAT(ns.Prefix+':',A.nd.value('local-name(.)','nvarchar(max)')) ORDER BY A.nd),']') AS FullName
          ,A.nd.value('text()[1]','nvarchar(max)') AS NodeValue
          ,A.nd.query('./*') NextFragment
    FROM recCte r
    CROSS APPLY NextFragment.nodes('*') A(nd)
    OUTER APPLY(SELECT Prefix FROM AllNamespaces ns WHERE ns.namespaceUri=A.nd.value('namespace-uri(.)','nvarchar(max)')) ns
)
SELECT XPath
      ,NodeValue
      ,NestLevel
      ,ElementPosition
      ,SortString
FROM recCte
--WHERE NodeValue IS NOT NULL
ORDER BY SortString;

--结果
/*

--The result
/*

+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| XPath                                                                                                                                                      | NodeValue                | NestLevel | ElementPosition | SortString                               |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderDate[1]                                                                                      | 2020-03-16T10:30:56.000Z | 4         | 1               | 00001000010000100001                     |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:customerServiceIdentifier[1]                                     | ETHA15302121             | 6         | 1               | 000010000100001000020000100001           |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:action[1]/ns4:code[1]              | Modify                   | 8         | 1               | 0000100001000010000200001000020000100001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:instanceIdentifier[1]/ns4:value[1] | OS014-AHEFV5T9           | 8         | 1               | 0000100001000010000200001000020000200001 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderVersion[1]                                                                                   | 1                        | 4         | 3               | 00001000010000100003                     |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiAction[1]                                                                                                      | Create                   | 3         | 2               | 000010000100002                          |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+
| /root[1]/ns1:manageWorkItemRequest[1]/ns1:wiVersion[1]                                                                                                     | 1                        | 3         | 3               | 000010000100003                          |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+-----------+-----------------+------------------------------------------+

*/

--只是为了表明,创建的 XPath 按预期工作:

--just to show, that the created XPath is working as expected:

WITH XMLNAMESPACES('dummy1' AS ns1,'dummy2' AS ns2,'dummy3' AS ns3,'dummy4' AS ns4,'dummy5' AS ns5)
SELECT @xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderDate[1]','nvarchar(max)')
      ,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:customerServiceIdentifier[1]','nvarchar(max)')
      ,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:action[1]/ns4:code[1]','nvarchar(max)')
      ,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderItem[1]/ns3:instance[1]/ns4:instanceCharacteristic[1]/ns4:instanceIdentifier[1]/ns4:value[1]','nvarchar(max)')
      ,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiFocus[1]/ns2:orderVersion[1]','nvarchar(max)')
      ,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiAction[1]','nvarchar(max)')
      ,@xml.value('/root[1]/ns1:manageWorkItemRequest[1]/ns1:wiVersion[1]','nvarchar(max)');

简单的想法:

  • 命名空间前缀可以由您自己定义.底层 URI 很重要.
  • 第一个 cte 将创建一组所有出现的 URI,并将其与前缀一起返回.
  • 递归 CTE 将越来越深入地遍历 XML.只要带有 .nodes()APPLY 可以返回嵌套节点,这就会继续.
  • 全名与完整的 XPath 一样.
  • CAST 和 COLLATE 有助于避免数据类型不匹配(递归 CTE 对此非常挑剔).
  • 需要串联的 SortString 以确保输出中的顺序相同.
  • The namespace prefixes can be defined by your own. The underlying URI is important.
  • The first cte will create a set of all occuring URIs and return this together with a prefix.
  • The recursive CTE will traverse deeper and deeper into the XML. This will continue as long as APPLY with .nodes() can return nested nodes.
  • The full name is concatenated as well as the full XPath.
  • The CASTs and COLLATEs help to avoid data type mismatch (recursive CTEs are very picky with this).
  • The concatenated SortString is needed to ensure the same order in your output.

顺便提一下:有绝对过时的FROM OPENXML,这是-afaik-从字面上恢复一切的唯一方法:

Just to mention it: There is the absolutely outdated FROM OPENXML, which is - afaik - the only way to get literally everything back:

DECLARE @xml XML=
N'<root xmlns="default" xmlns:ns="dummy">
  <a ns:test="blah">blub</a>
  <ns:b test2="hugo">blubber</ns:b>
</root>';

DECLARE @DocHandle INT;
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @xml;
SELECT * FROm OPENXML(@DocHandle,'/*');
EXEC sp_xml_removedocument @DocHandle;

结果

+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| id | parentid | nodetype | localname | prefix | namespaceuri | datatype | prev | text    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 0  | NULL     | 1        | root      | NULL   | default      | NULL     | NULL | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 2  | 0        | 2        | xmlns     | xmlns  | NULL         | NULL     | NULL | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 10 | 2        | 3        | #text     | NULL   | NULL         | NULL     | NULL | default |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 3  | 0        | 2        | ns        | xmlns  | NULL         | NULL     | NULL | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 11 | 3        | 3        | #text     | NULL   | NULL         | NULL     | NULL | dummy   |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 4  | 0        | 1        | a         | NULL   | default      | NULL     | NULL | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 5  | 4        | 2        | test      | ns     | dummy        | NULL     | NULL | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 12 | 5        | 3        | #text     | NULL   | NULL         | NULL     | NULL | blah    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 6  | 4        | 3        | #text     | NULL   | NULL         | NULL     | NULL | blub    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 7  | 0        | 1        | b         | ns     | dummy        | NULL     | 4    | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 8  | 7        | 2        | test2     | NULL   | NULL         | NULL     | NULL | NULL    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 13 | 8        | 3        | #text     | NULL   | NULL         | NULL     | NULL | hugo    |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+
| 9  | 7        | 3        | #text     | NULL   | NULL         | NULL     | NULL | blubber |
+----+----------+----------+-----------+--------+--------------+----------+------+---------+

如您所见,此结果包含名称空间、前缀和内容.但它很笨拙,离今天"还很远.:-)

As you can see, this result contains namespaces, prefixes and content. But it is very clumsy and far away from "today" :-)

相关文章