如何在 CDATA 部分解析带有 HTML 标记的 XML 文件?

2022-01-10 00:00:00 xml xml-parsing html cdata java
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>

我有一个由上述格式的工具生成的 .XML 文件,其中包含 CDATA 部分中的 html 数据.哪个解析器或以什么方式可以使用 java 从 XMLfile 中检索 html 数据?

I have a .XML file generated by a tool in the above format, with html data within CDATA sections. Which parser or in what way can I retrieve the html data from the XMLfile using java?

推荐答案

只需访问 CDATA 作为文本内容

Just access the CDATA as text content

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

public void getCDATAFromHardcodedPathWithDom() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile))) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList elements = doc.getElementsByTagName(cdataNode);
        for (int i = 0; i < elements.getLength(); i++) {
            Node e = elements.item(i);
            System.out.println(e.getTextContent());
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

变体 2(stax):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void getCDATAFromHardcodedPathWithStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    XMLStreamReader r = null;
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile));)        {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
            case XMLStreamConstants.START_ELEMENT:
                if (cdataNode.equals(r.getName().getLocalPart())) {
                    System.out.println(r.getElementText());
                }
                break;
            default:
                break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

使用/path/toYour/sample/file.xml

With /path/toYour/sample/file.xml

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<root>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>
</root>

它会给你

<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>


<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>

这里给出了一个有趣的使用 JAXB 的替代方法:

An interesting alternative using JAXB is given here:

从 CDATA 中检索值

这里给出了如何提取所有 CDATA 的示例:

An example on how to extract just all CDATA is given here:

无法使用 XMLEventReader 检查 XML 中的 CDATA税表

相关文章