Parsing a list of XML fragments with no root element from a stream input

2022-01-10 00:00:00 xml xml-parsing java sax

Is it feasible in Java, using the SAX API, to parse a list of XML fragments with no root element from a stream input?

I tried parsing such input but got a

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

before the endDocument event was even fired.

I would rather not settle for obvious but clumsy solutions such as "prepend a custom root element" or "use buffered fragment parsing".

I am using the standard SAX API of Java 1.6. The SAX factory has setValidating(false) set, in case anyone wonders.

Recommended answer

First, and most important of all, the content you are parsing is not an XML document. From the XML Specification:

[Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.]
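To make the definition concrete, here is a minimal sketch of how a standard SAX parser reacts to input with and without a single root element. The class name `WellFormedCheck` and the helper `isWellFormed` are hypothetical names introduced only for this illustration; the sketch assumes the default JDK parser and UTF-8 input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /** Returns true if the text parses as a single well-formed XML document. */
    public static boolean isWellFormed(String xml) throws Exception {
        byte[] bytes = xml.getBytes(Charset.forName("UTF-8"));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // Raised, for example, when a second top-level element follows the first.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a/><b/>"));              // prints false
        System.out.println(isWellFormed("<root><a/><b/></root>")); // prints true
    }
}
```

Turning validation off does not change this result: well-formedness is checked regardless of whether the parser validates.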

Now, as to parsing this with SAX: in spite of what you said about clumsiness, I'd suggest the following approach:

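import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;

// Wrap the fragment stream in a synthetic <root> element.
// Assumes an ASCII-compatible encoding (e.g. UTF-8) and no XML declaration in the stream.
Charset utf8 = Charset.forName("UTF-8");
Enumeration<InputStream> streams = Collections.enumeration(
    Arrays.<InputStream>asList(
        new ByteArrayInputStream("<root>".getBytes(utf8)),
        yourXmlLikeStream,
        new ByteArrayInputStream("</root>".getBytes(utf8))));

SequenceInputStream seqStream = new SequenceInputStream(streams);

// Now pass the `seqStream` into the SAX parser.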

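Using SequenceInputStream is a convenient way to concatenate multiple input streams into a single stream. The streams are read in the order in which they are passed to the constructor (or, in this case, returned by the Enumeration).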

Pass it to your SAX parser, and you are done.
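For completeness, here is a self-contained sketch of the whole round trip: wrapping a fragment stream and feeding it to a SAX handler that ignores the synthetic wrapper. The class `FragmentStreamParser`, the method `topLevelNames`, and the depth-tracking handler are illustrative assumptions rather than part of the answer above; the sketch also assumes UTF-8 input with no XML declaration or byte order mark.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentStreamParser {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    // Hypothetical wrapper name; pick one that cannot clash with real fragment elements.
    private static final String WRAPPER = "root";

    /** Returns the names of the top-level fragment elements, in document order. */
    public static List<String> topLevelNames(InputStream fragments) throws Exception {
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(
            Arrays.<InputStream>asList(
                new ByteArrayInputStream(("<" + WRAPPER + ">").getBytes(UTF8)),
                fragments,
                new ByteArrayInputStream(("</" + WRAPPER + ">").getBytes(UTF8)))));

        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Depth 0 is the synthetic wrapper; depth 1 is a real fragment root.
                if (depth == 1) {
                    names.add(qName);
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(wrapped, handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
            "<a>1</a><b><c/></b><a/>".getBytes(UTF8));
        System.out.println(topLevelNames(in)); // prints [a, b, a]
    }
}
```

Tracking the element depth lets the handler treat the wrapper as invisible, so the rest of the application code sees only the original fragments. Nothing is buffered: the parser still pulls bytes from the underlying stream as it goes.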
