如何从 XMLType 节点中提取元素路径?

2021-10-01 00:00:00 xml xpath sql oracle xmltype

我想要一个关于 XML 文档的选择语句,并且一列应该返回每个节点的路径.

I would like to have a select statement on an XML document and one column should return me the path of each node.

例如,给定数据

SELECT * 
FROM TABLE(XMLSequence(
  XMLTYPE('<?xml version="1.0"?>
    <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
    </users>').extract('/*//*[text()]'))) t;

结果

column_value
--------
<user><name>user1</name></user>
<user><name>user2</name></user>
<user><name>user3</name></user>
<user><name>user4</name></user>

我想要这样的结果:

path                     value
------------------------ --------------
/users/user/name         user1
/users/user/name         user2
/users/group/user/name   user3
/users/user/name         user4

我不知道如何做到这一点.我认为有两件事必须正确协同工作:

I can not see how to get to this. I figure there are two thing that have to work together properly:

  • 我可以使用单个操作或方法从 XMLType 中提取 path 吗,或者我必须使用 string-magic 来执行此操作吗??
  • 什么是正确的 XPath 表达式,以便我获得整个元素路径(如果可能的话),例如.<users><group><user><name>user3</name></user></group></user><;用户>user3?
  • Can I extract the path from an XMLType with a single operation or method, or do I have to do this with string-magic?
  • What is the correct XPath expression so that I do get the whole element path (if thats possible), eg. <users><group><user><name>user3</name></user></group></user> insead of <user><name>user3</name></user>?

也许我还没有完全理解 XMLType.可能是我需要不同的方法,但我看不到.

Maybe I am not understanding XMLType fully, yet. It could be I need a different approach, but I can not see it.

旁注:

  • 在最终版本中,XML 文档将来自表的 CLOB,而不是静态文档.
  • path 列当然也可以使用点或其他任何东西,并且最初的斜杠不是问题,任何表示都可以.
  • 此外,我不介意每个内部节点是否也获得一个结果行(可能将 null 作为 value),而不仅仅是带有 text() 的那些 在其中(这是我真正感兴趣的).
  • 最后,我需要将 path 的 tail 元素 分开(在示例中总是 "name",但这会有所不同稍后),即 ('/users/groups/user', 'name', 'user3'),我可以单独处理.
  • In the final version the XML document will be coming from CLOBs of a table, not a static document.
  • The path column can of course also use dots or whatever and the initial slash is not the issue, any representation would do.
  • Also I would not mind if every inner node also gets a result row (possibly with null as value), not only the ones with text() in it (which is what I am really interested in).
  • In the end I will need the tail element of path separate (always "name" in the example here, but this will vary later), i.e. ('/users/groups/user', 'name', 'user3'), I can deal with that separately.

推荐答案

您可以在 XMLTable 函数来自 Oracle XML DB XQuery 函数集:

You can achieve that with help of XMLTable function from Oracle XML DB XQuery function set:

select * from 
  XMLTable(
    '
     declare function local:path-to-node( $nodes as node()* )  as xs:string* {
       $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
     };
     for $i in $rdoc//name 
       return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>
    '
    passing 
    XMLParse(content '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>'
    )
    as "rdoc"
    columns 
      name_path  varchar2(4000) path '//ret/name_path',
      name_value varchar2(4000) path '//ret/name'

  )

对我来说,XQuery 看起来至少比 XSLT 对 XML 数据操作更直观.

For me XQuery looks at least more intuitive for XML data manipulation than XSLT.

您可以在此处找到有用的 XQuery 函数集.

You can find useful set of XQuery functions here.

更新 1

我想您在最后阶段需要具有完整数据的完全简单的数据集.这个目标可以通过复杂的方式达到,下面一步一步构建,但是这个变体非常耗费资源.我建议审查最终目标(选择一些特定的记录,计算元素数量等),然后简化此解决方案或完全更改它.

I suppose that you need totally plain dataset with full data at last stage. This target can be reached by complicated way, constructed step-by-step below, but this variant is very resource-angry. I propose to review final target (selecting some specific records, count number of elements etc.) and after that simplify this solution or totally change it.

更新 2

除了最后一步之外,所有步骤都从此更新中删除,因为@A.B.Cade 在评论中提出了更优雅的解决方案.此解决方案在下面的更新 3 部分中提供.

All steps deleted from this Update except last because @A.B.Cade proposed more elegant solution in comments. This solution provided in Update 3 section below.

Step 1 - 构建带有对应查询结果的 id 数据集

Step 1 - Constructing dataset of id's with corresponding query results

第 2 步 - 聚合到单个 XML 行

Step 2 - Aggregating to single XML row

第 3 步 - 最后通过使用 XMLTable 查询压缩的 XML 获得完整的普通数据集

Step 3 - Finally get full plain dataset by querying constracted XML with XMLTable

with xmlsource as (
  -- only for purpose to write long string only once
  select '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>' xml_string
   from dual   
),
xml_table as ( 
  -- model of xmltable
  select 10 id, xml_string xml_data from xmlsource union all 
  select 20 id, xml_string xml_data from xmlsource union all 
  select 30 id, xml_string xml_data from xmlsource 
) 
select  *
from
  XMLTable(
    '
        for $entry_user in $full_doc/full_list/list_entry/name_info
          return <tuple>
                   <id>{data($entry_user/../@id_value)}</id>
                   <path>{$entry_user/name_path/text()}</path>
                   <name>{$entry_user/name_value/text()}</name>
                  </tuple> 
    '
    passing ( 
      select  
        XMLElement("full_list", 
          XMLAgg(     
            XMLElement("list_entry",
              XMLAttributes(id as "id_value"),
              XMLQuery(
                '
                 declare function local:path-to-node( $nodes as node()* )  as xs:string* {
                   $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
                 };(: function to construct path :) 
                 for $i in $rdoc//name return <name_info><name_path>{local:path-to-node($i)}</name_path><name_value>{$i/text()}</name_value></name_info>
                '
                passing by value XMLParse(content xml_data) as "rdoc"
                returning content
              )
            )
          )
        )        
        from xml_table
    )   
    as "full_doc"      
    columns
      id_val   varchar2(4000) path '//tuple/id',
      path_val varchar2(4000) path '//tuple/path',
      name_val varchar2(4000) path '//tuple/name'
  )    

更新 3

正如@A.B.Cade 在他的评论中提到的,有非常简单的方法可以将 ID 与 XQuery 结果连接起来.

As mentioned by @A.B.Cade in his comment, there are really simple way to join ID's with XQuery results.

因为我不喜欢答案中的外部链接,下面的代码代表他的 SQL 小提琴,有点适应这个答案的数据源:

Because I don't like external links in answers, code below represents his SQL fiddle, a little bit adapted to the data source from this answer:

with xmlsource as (
  -- only for purpose to write long string only once
  select '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>' xml_string
   from dual   
),
xml_table as ( 
  -- model of xmltable
  select 10 id, xml_string xml_data from xmlsource union all 
  select 20 id, xml_string xml_data from xmlsource union all
  select 30 id, xml_string xml_data from xmlsource
)
select xd.id, x.*  from
xml_table xd,
  XMLTable(
    'declare function local:path-to-node( $nodes as node()* )  as xs:string* {$nodes/string-join(ancestor-or-self::*/name(.), ''/'')     };     for $i in $rdoc//name        return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>    '
    passing
    XMLParse(content xd.xml_data
    )
    as "rdoc"
    columns
      name_path  varchar2(4000) path '//ret/name_path',
      name_value varchar2(4000) path '//ret/name'

  ) x

相关文章