Stream parsing a 4 GB XML file in PHP

2022-01-10 00:00:00 xml xml-parsing large-files large-data php

I need some help with the following:

I want to stream parse a large XML file (4 GB) with PHP. I can't use SimpleXML or DOM because they load the entire file into memory, so I need something that can stream the file.

How can I do this in PHP?

What I am trying to do is iterate over a series of <doc> elements and write some of their children to a new XML file.

The XML file I am trying to parse looks like this:

<feed>
    <doc>
        <title>Title of first doc is here</title>
        <url>URL is here</url>
        <abstract>Abstract is here...</abstract>
        <links>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
        </links>
    </doc>
    <doc>
        <title>Title of second doc is here</title>
        <url>URL is here</url>
        <abstract>Abstract is here...</abstract>
        <links>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
            <sublink>Link is here</sublink>
        </links>
    </doc>
</feed>

I'm trying to copy all the children of each <doc> element into a new XML file, except the <links> element and its children.

So I want the new XML file to look like:

<doc>
    <title>Title of first doc is here</title>
    <url>URL is here</url>
    <abstract>Abstract is here...</abstract>
</doc>
<doc>
    <title>Title of second doc is here</title>
    <url>URL is here</url>
    <abstract>Abstract is here...</abstract>
</doc>

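I would really appreciate help with stream parsing / stream reading the original XML file and then writing parts of its contents to a new XML file in PHP.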

Recommended answer

Here's a college try using XMLReader and XMLWriter. It assumes the input is a file and that you want to write the result to a file:

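<?php

// Children of <doc> to copy; anything else (such as <links>) is skipped.
$interestingNodes = array('title', 'url', 'abstract');

$xmlObject = new XMLReader();
$xmlObject->open('bigolfile.xml');

$xmlOutput = new XMLWriter();
$xmlOutput->openURI('destfile.xml');
$xmlOutput->setIndent(true);
$xmlOutput->setIndentString("   ");
$xmlOutput->startDocument('1.0', 'UTF-8');
// Note: as in the question, the <doc> elements are written without a common
// root. Wrap them in one (startElement/endElement) if the output must be a
// well-formed XML document.

while ($xmlObject->read()) {
    $isStart = $xmlObject->nodeType == XMLReader::ELEMENT;
    $isEnd = $xmlObject->nodeType == XMLReader::END_ELEMENT;

    if ($xmlObject->name == 'doc') {
        if ($isStart && !$xmlObject->isEmptyElement) {
            $xmlOutput->startElement('doc');
        } elseif ($isEnd) {
            $xmlOutput->endElement(); // close the doc node
            $xmlOutput->flush();      // write out what has been buffered so far
        }
    } elseif ($isStart && in_array($xmlObject->name, $interestingNodes)) {
        // in_array() avoids the array_search() pitfall where index 0 ('title')
        // is falsy; readString() returns the element's text content.
        $xmlOutput->writeElement($xmlObject->name, $xmlObject->readString());
    }
}

$xmlObject->close();
$xmlOutput->endDocument();
$xmlOutput->flush();

?>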
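
An alternative that is sometimes easier to work with, shown here only as a rough sketch and not tested against a 4 GB file: let XMLReader walk the file, and use XMLReader::expand() to turn one <doc> at a time into a small DOM subtree that can be filtered with the usual DOM methods. Only a single <doc> is held in memory at any point. The function name copyDocs and its parameters are hypothetical, chosen for illustration; the file names are the same placeholders used above.

```php
<?php

// Hypothetical helper: copies the children listed in $keep from every <doc>
// in $src to $dest, one <doc> per line. Returns the number of <doc> elements.
function copyDocs(string $src, string $dest, array $keep): int
{
    $reader = new XMLReader();
    if (!$reader->open($src)) {
        throw new RuntimeException("Cannot open $src");
    }
    $out = fopen($dest, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $dest for writing");
    }

    $dom = new DOMDocument();
    $count = 0;

    // Advance the cursor to the first <doc> element (if there is one).
    while ($reader->read() && $reader->name !== 'doc') {
        // keep reading
    }

    while ($reader->name === 'doc') {
        // expand() builds a DOM subtree for the current <doc> only.
        $expanded = $reader->expand();
        if ($expanded === false) {
            break;
        }
        $node = $dom->importNode($expanded, true);

        // Build a fresh <doc> containing only the wanted children.
        $copy = $dom->createElement('doc');
        foreach ($node->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE
                && in_array($child->nodeName, $keep, true)) {
                $copy->appendChild($child->cloneNode(true));
            }
        }

        fwrite($out, $dom->saveXML($copy) . "\n");
        $count++;

        // next('doc') skips the rest of this subtree and moves to the
        // following <doc>; it returns false when there are no more.
        if (!$reader->next('doc')) {
            break;
        }
    }

    $reader->close();
    fclose($out);

    return $count;
}
```

It could be called as copyDocs('bigolfile.xml', 'destfile.xml', array('title', 'url', 'abstract')). The trade-off compared with the pure XMLReader/XMLWriter loop is a little extra work per <doc> in exchange for being able to use DOM (or XPath) on each record.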
