How to modify a huge XML file with StAX?

2022-01-10 00:00:00 xml xml-parsing java stax

I have a huge XML file (~2 GB) and I need to add new elements and modify existing ones. For example, I have:

<books>
    <book>....</book>
    ...
    <book>....</book>
</books>

and I want to get:

<books>
   <book>
      <index></index>
      ....
   </book>
   ...
   <book>
      <index></index>
      ....
   </book>
</books>

I used the following code:

XMLInputFactory inFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader = inFactory.createXMLEventReader(new FileInputStream(file));
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(new FileWriter(file, true));
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    if (event.getEventType() == XMLEvent.START_ELEMENT) {
        if (event.asStartElement().getName().toString().equalsIgnoreCase("book")) {
            writer.writeStartElement("index");
            writer.writeEndElement();
        }
    }
}
writer.close();

But the result is:

<books>
   <book>....</book>
   ....
   <book>....</book>
</books><index></index>

Any ideas?

Recommended answer

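Try this. The essential change is that every event read from the input is passed on to the writer with writer.add(event), and the new element is emitted right after each <book> start tag: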

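    // Needs java.io.*, javax.xml.stream.* and javax.xml.stream.events.XMLEvent;
    // the enclosing method must handle IOException and XMLStreamException.
    XMLInputFactory inFactory = XMLInputFactory.newInstance();
    XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    // Read "1.xml" and write to a different file (file); never write into the file being read
    try (InputStream in = new FileInputStream("1.xml");
         OutputStream out = new FileOutputStream(file)) {
        XMLEventReader eventReader = inFactory.createXMLEventReader(in);
        // Assumes the source is UTF-8; FileWriter would use the platform default charset
        XMLEventWriter writer = outFactory.createXMLEventWriter(out, "UTF-8");
        while (eventReader.hasNext()) {
            XMLEvent event = eventReader.nextEvent();
            writer.add(event); // copy every original event to the output
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("book")) {
                // Insert an empty <index></index> as the first child of each <book>
                writer.add(eventFactory.createStartElement("", "", "index"));
                writer.add(eventFactory.createEndElement("", "", "index"));
            }
        }
        writer.close();
        eventReader.close();
    }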
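The question also asks about modifying existing elements, which the code above does not show. StAX events are immutable, so one way is to build a replacement event with XMLEventFactory and write it instead of the original. Below is a minimal, self-contained sketch that works on an in-memory string standing in for the 2 GB file; the class BookModifier, the method process and the added id attribute are hypothetical illustrations, not part of the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class BookModifier {

    // Copies the document event by event. Each <book> start tag is replaced by a
    // new one carrying the original attributes plus a hypothetical id="N", and
    // an empty <index> element is inserted as its first child.
    static String process(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter sink = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sink);
        XMLEventFactory events = XMLEventFactory.newInstance();
        int count = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("book")) {
                StartElement start = event.asStartElement();
                // Events are immutable, so collect the existing attributes and add one more
                List<Attribute> attrs = new ArrayList<>();
                Iterator<?> it = start.getAttributes();
                while (it.hasNext()) {
                    attrs.add((Attribute) it.next());
                }
                attrs.add(events.createAttribute("id", Integer.toString(++count)));
                QName name = start.getName();
                writer.add(events.createStartElement(name.getPrefix(), name.getNamespaceURI(),
                        name.getLocalPart(), attrs.iterator(), start.getNamespaces()));
                writer.add(events.createStartElement("", "", "index"));
                writer.add(events.createEndElement("", "", "index"));
            } else {
                writer.add(event); // everything else passes through unchanged
            }
        }
        writer.close();
        reader.close();
        return sink.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(process("<books><book>A</book><book>B</book></books>"));
    }
}
```

The original end tags are copied unchanged, so only the start tag of each <book> needs to be rebuilt. Note that the order in which attributes are written is up to the StAX implementation.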

Notes

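new FileWriter(file, true) opens the file in append mode, so whatever the writer produces lands after the existing content (after </books>); you almost certainly don't want it. Write the output to a separate file instead of the one you are reading.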
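Since the file being read cannot safely be written at the same time, one common way to "modify" a 2 GB file is to stream into a temporary file next to it and move that file over the original once the copy has succeeded. A minimal sketch, assuming Java 7+ (java.nio.file) and a UTF-8 source; the class InPlaceRewrite and the method rewrite are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class InPlaceRewrite {

    // Streams `source` into a temporary file in the same directory, adding an
    // empty <index> to every <book>, then replaces `source` with the result.
    // Memory use does not grow with the file size.
    static void rewrite(Path source) throws IOException, XMLStreamException {
        Path dir = source.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "rewrite-", ".xml");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(tmp)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            // Assumes the source document is UTF-8 encoded
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                writer.add(event);
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("book")) {
                    writer.add(events.createStartElement("", "", "index"));
                    writer.add(events.createEndElement("", "", "index"));
                }
            }
            writer.flush();
            writer.close();
            reader.close();
        } catch (IOException | XMLStreamException e) {
            Files.deleteIfExists(tmp); // streams are already closed; original stays untouched
            throw e;
        }
        // Only reached if the whole document was copied successfully
        Files.move(tmp, source, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("books-", ".xml");
        Files.write(file, "<books><book>A</book></books>".getBytes(StandardCharsets.UTF_8));
        rewrite(file);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

Creating the temporary file in the same directory keeps the final Files.move on the same file system, where it is usually a cheap rename rather than a second 2 GB copy.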

equalsIgnoreCase("book") is a bad idea because XML is case-sensitive: <book> and <Book> are different elements, so compare with equals() instead.
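A small illustration of the point above, using only javax.xml.namespace.QName: match the name exactly with equals(), and prefer getLocalPart() over getName().toString(), because toString() renders a namespaced name as {uri}local. The class NameMatch, the helper isBook and the namespace URI urn:example:library are made-up examples. If your document does use namespaces, you may also want to check getNamespaceURI().

```java
import javax.xml.namespace.QName;

public class NameMatch {

    // XML names are case-sensitive: match "book" exactly. Comparing the local
    // part avoids QName.toString(), which prefixes "{namespaceURI}" when a
    // namespace is present.
    static boolean isBook(QName name) {
        return name.getLocalPart().equals("book");
    }

    public static void main(String[] args) {
        QName plain = new QName("book");
        QName upper = new QName("Book");
        // Hypothetical namespace, for illustration only
        QName namespaced = new QName("urn:example:library", "book");

        System.out.println(isBook(plain));          // true
        System.out.println(isBook(upper));          // false: <Book> is another element
        System.out.println(namespaced.toString());  // {urn:example:library}book
        System.out.println(isBook(namespaced));     // true
    }
}
```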
