JAXB mixed content list contains newline characters

2022-01-19 00:00:00 xml java jaxb

I was hoping that you might be able to help me with a problem that I'm facing regarding JAXB.

I have the following XML file:

<root>
    <prop>
        <field1>
            <value1>v1</value1>
            <value2>v2</value2>
        </field1>
        <field2>
            <value1>v1</value1>
            <value2>v2</value2>
        </field2>
    </prop>
    <prop>
        text
        <field1>
            <value1>v1</value1>
            <value2>v2</value2>
        </field1>
    </prop>
    <prop>
        text
    </prop>
</root>

A prop element can contain other elements (field1, field2), text, or both.

And the following classes:

import java.util.List;
import javax.xml.bind.annotation.*;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "root")
public class Root {

    protected List<Root.Element> prop;

    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Element {
        @XmlMixed
        protected List<String> content;
        @XmlElement
        public Field1 field1;
        @XmlElement
        public Field2 field2;

        @XmlAccessorType(XmlAccessType.FIELD)
        public static class Field1 {
            @XmlElement
            protected String value1;
            @XmlElement
            protected String value2;
        }

        @XmlAccessorType(XmlAccessType.FIELD)
        public static class Field2 {
            @XmlElement
            protected String value1;
            @XmlElement
            protected String value2;

        }

    }

}

I want to unmarshal the XML into the above classes. The issue is that the content list contains, besides the text, other characters such as newlines and tabs. To be more specific, based on the above XML, when I try to unmarshal I get:

  • first prop with content like [" ", " ", " "] - it should be an empty list
  • second prop with content like [" text ", " "] - it should be a list with one string
  • third prop with content like [" text "] - it should be an empty list

I have already tried to create an XmlAdapter, but it is applied to every element in the list, so if I strip the newline and tab characters and return null for an empty string, I still get a list containing some strings and some null values.

Recommended answer

Why It's Happening

Whitespace in an element that has mixed content is treated as significant, so the parser reports the newlines and indentation between child elements as text.
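
To see this independently of JAXB, here is a minimal sketch that walks a small prop snippet with a plain StAX reader and collects every CHARACTERS event. The class name WhitespaceEvents, the helper textEvents, and the inline sample XML are made up for illustration; the coalescing property is set only so that adjacent text arrives as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class WhitespaceEvents {

    // Hypothetical helper: returns the text of every CHARACTERS event the parser reports.
    static List<String> textEvents(String xml) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        // Merge adjacent character data so each text run is reported once.
        xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.CHARACTERS) {
                texts.add(xsr.getText());
            }
        }
        xsr.close();
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // A prop with one child element and no text of its own, indented as in the question.
        String xml = "<prop>\n    <field1>v1</field1>\n</prop>";
        for (String t : textEvents(xml)) {
            System.out.println(t.trim().isEmpty() ? "[whitespace only]" : t);
        }
    }
}
```

The whitespace-only entries are the indentation around field1. With @XmlMixed, JAXB has no schema telling it that this text is ignorable, which would explain why those strings end up in the content list.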

You could use JAXB with StAX to support this use case. With StAX you can create a filtered XMLStreamReader so that any character strings that only contain white space are not reported as events. Below is an example of how you could implement it.

import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Root.class);

        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new StreamSource("src/forum22284324/input.xml"));
        xsr = xif.createFilteredReader(xsr, new StreamFilter() {

            @Override
            public boolean accept(XMLStreamReader reader) {
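                // Drop text events that are blank after trimming; these are the
                // newline/tab runs between child elements.
                if (reader.getEventType() == XMLStreamConstants.CHARACTERS) {
                    return !reader.getText().trim().isEmpty();
                }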
                return true;
            }

        });

        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Root root = (Root) unmarshaller.unmarshal(xsr);
    }

}
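
The filter can also be checked on its own, without JAXB on the classpath. The following is a rough sketch under a few assumptions: the class name FilterCheck, the constant SKIP_BLANK_TEXT, and the inline XML are invented for this example, and IS_COALESCING is switched on so that a long text run is not split into chunks, one of which might happen to be blank.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class FilterCheck {

    // Same rule as the anonymous filter above: reject CHARACTERS events
    // that are empty after trim(), accept everything else.
    static final StreamFilter SKIP_BLANK_TEXT = reader ->
            reader.getEventType() != XMLStreamConstants.CHARACTERS
                    || !reader.getText().trim().isEmpty();

    // Hypothetical helper: text of every CHARACTERS event that survives the filter.
    static List<String> textEvents(String xml) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader xsr = xif.createFilteredReader(
                xif.createXMLStreamReader(new StringReader(xml)), SKIP_BLANK_TEXT);
        List<String> texts = new ArrayList<>();
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.CHARACTERS) {
                texts.add(xsr.getText());
            }
        }
        xsr.close();
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Shaped like the second prop in the question: text followed by a child element.
        String xml = "<prop>\n    text\n    <field1>v1</field1>\n</prop>";
        for (String t : textEvents(xml)) {
            System.out.println("[" + t.trim() + "]");
        }
    }
}
```

Note that the filter only removes events that consist entirely of whitespace. A string such as " text " still reaches JAXB with its surrounding whitespace, so trimming the remaining entries (for example in an XmlAdapter) is a separate step.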
