How to parse an XML file with HTML markup in CDATA sections?

2022-01-10 00:00:00 xml xml-parsing html cdata java

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>

I have a .xml file, generated by a tool, in the format above, with HTML data inside CDATA sections. Which parser, or what approach, can I use in Java to retrieve the HTML data from the XML file?

Recommended answer

Just read the CDATA as the element's text content. Variant 1 (DOM):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public void getCDATAFromHardcodedPathWithDom() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    try (InputStream in = new BufferedInputStream(new FileInputStream(yourSampleFile))) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList elements = doc.getElementsByTagName(cdataNode);
        for (int i = 0; i < elements.getLength(); i++) {
            Node e = elements.item(i);
            // getTextContent() returns the CDATA payload plus the whitespace around it
            System.out.println(e.getTextContent());
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
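The excerpt in the question has two top-level extendedinfo elements and no single root element, which is not well-formed XML, so a standard parser will reject it; the sample file used further down wraps the elements in a <root> element. If the tool's real output looks like the excerpt, one option is to add a synthetic root before parsing. This is a minimal sketch, assuming the whole file fits in memory as a String and begins with an optional XML declaration (a byte-order mark is not handled); the names MultiRootCdataReader, wrapInRoot and extractHtml are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parses tool output that may have several top-level elements.
final class MultiRootCdataReader {

    /** Wraps the content in a synthetic <root> element so it has a single root. */
    static String wrapInRoot(String xml) {
        // An XML declaration is only legal at the very start of a document,
        // so strip it before prepending the synthetic root start tag.
        String body = xml.replaceFirst("^\\s*<\\?xml[^?]*\\?>", "");
        return "<root>" + body + "</root>";
    }

    /** Returns the trimmed text (the CDATA payload) of every extendedinfo element. */
    static List<String> extractHtml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapInRoot(xml))));
            NodeList elements = doc.getElementsByTagName("extendedinfo");
            List<String> fragments = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                // getTextContent() includes the whitespace around the CDATA section; trim it.
                fragments.add(elements.item(i).getTextContent().trim());
            }
            return fragments;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Wrapping a file that already has a root element is harmless, since it just gains one more enclosing element, so the helper can be applied unconditionally.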

Variant 2 (StAX):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void getCDATAFromHardcodedPathWithStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    XMLStreamReader r = null;
    try (InputStream in = new BufferedInputStream(new FileInputStream(yourSampleFile))) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (cdataNode.equals(r.getName().getLocalPart())) {
                        // getElementText() concatenates the element's text and CDATA content
                        // and leaves the reader on the matching END_ELEMENT
                        System.out.println(r.getElementText());
                    }
                    break;
                default:
                    break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        // XMLStreamReader is not AutoCloseable, so it is closed explicitly here
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}
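The type="html" attribute in the question hints that the file might also contain extendedinfo elements of other types. Assuming that is the case, here is a sketch of a StAX variant that reads from a String, keeps only elements whose type attribute is "html", and returns the fragments instead of printing them. The class name StaxHtmlExtractor is hypothetical, and filtering on type is an assumption about what else the file contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the HTML fragments of extendedinfo elements with type="html".
final class StaxHtmlExtractor {

    static List<String> extractHtml(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        List<String> fragments = new ArrayList<>();
        XMLStreamReader r = null;
        try {
            r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "extendedinfo".equals(r.getLocalName())
                        // null namespace URI: match the attribute by local name only
                        && "html".equals(r.getAttributeValue(null, "type"))) {
                    // getElementText() merges CHARACTERS and CDATA events and leaves
                    // the reader on the matching END_ELEMENT; trim the outer whitespace.
                    fragments.add(r.getElementText().trim());
                }
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        } finally {
            if (r != null) {
                try {
                    r.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if closing the reader fails
                }
            }
        }
        return fragments;
    }
}
```

Returning a list rather than printing keeps the parsing separate from whatever is done with the HTML afterwards.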

With /path/toYour/sample/file.xml containing

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<root>
    <extendedinfo type="html">
        <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6">Testcase: Init Testreport</th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
    </extendedinfo>
    <extendedinfo type="html">
        <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5">Set_Temperature is set to 23 Set_Temperature = 23</td></tr>]]>
    </extendedinfo>
</root>

it prints:

<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6">Testcase: Init Testreport</th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>
<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5">Set_Temperature is set to 23 Set_Temperature = 23</td></tr>
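In the output above, the first fragment opens a table without closing it and the next fragment is a bare row, so the pieces look like consecutive chunks of one HTML report. Assuming that holds for the whole file, a sketch that joins the fragments in document order into a single page could look like this; HtmlReportBuilder and toHtmlPage are hypothetical names.

```java
import java.util.List;

// Hypothetical helper: reassembles the extracted fragments into one HTML page.
final class HtmlReportBuilder {

    /**
     * Joins the fragments in document order inside a minimal HTML skeleton.
     * Assumes the fragments are consecutive pieces of one report.
     */
    static String toHtmlPage(List<String> fragments) {
        StringBuilder html = new StringBuilder("<!DOCTYPE html>\n<html><body>\n");
        for (String fragment : fragments) {
            html.append(fragment).append('\n');
        }
        return html.append("</body></html>\n").toString();
    }
}
```

The result could then be saved with, for example, Files.write(Paths.get("report.html"), page.getBytes(StandardCharsets.UTF_8)) and opened in a browser. Browsers generally tolerate the missing </table>, but a strict HTML or XML consumer may not.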

An interesting alternative using JAXB is given here:

Retrieve value from CDATA

An example of how to extract all CDATA sections is given here:

Unable to check for CDATA in XML using XMLEventReader
