How can I use PHP to remove tags with an empty text node?

2022-01-18 00:00:00 tags php html textnode

How can I use PHP to remove tags that have an empty text node?

For example,

<div class="box"></div> remove

<a href="#"></a> remove

<p><a href="#"></a></p> remove

<span style="..."></span> remove

But I want to keep tags that have a text node, like this:

<a href="#">link</a> keep

I also want to remove messy markup like this:

<p><strong><a href="http://xx.org.uk/dartmoor-arts"></a></strong></p>
<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
<p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>

I tested both of the regexes below:

$content = preg_replace('!<(.*?)[^>]*>\s*</\1>!', '', $content);
$content = preg_replace('%<(.*?)[^>]*>\s*</\1>%', '', $content);

But they leave something like this:

<p><strong></strong></p>
<p><strong></strong></p>
<p><strong></strong></p>
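
A likely reason for the leftover wrappers is that preg_replace makes a single pass over the input: it removes the innermost empty pair, but the parents that become empty as a result are not scanned again. A minimal sketch of repeating the replacement until nothing changes is shown below. The function name stripEmptyTagsRegex and the tightened pattern (\w+ for the tag name) are assumptions for illustration, not part of the original question, and a regex like this can still misfire on attribute values that contain ">".

```php
<?php
// Hypothetical helper: keep removing empty <tag ...></tag> pairs until none remain.
function stripEmptyTagsRegex(string $content): string
{
    // (\w+) captures the tag name; \1 requires the same name in the closing tag.
    $pattern = '%<(\w+)\b[^>]*>\s*</\1>%';

    do {
        $result = preg_replace($pattern, '', $content, -1, $count);
        if ($result === null) {
            // A PCRE error occurred; return what we have so far.
            return $content;
        }
        $content = $result;
    } while ($count > 0); // another pass is needed only if something was removed

    return $content;
}

echo stripEmptyTagsRegex('<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>');
```

Each pass peels off one nesting level, so the loop runs once per level plus one final pass that finds nothing.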

Recommended answer

One approach might be:

$dom = new DOMDocument();
$dom->loadHTML(
    '<p><strong><a href="http://xx.org.uk/dartmoor-arts">test</a></strong></p>
    <p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
    <p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>'
);

$xpath = new DOMXPath($dom);

// Select elements with no child nodes at all (not(node()) already implies not(text())).
// Removing an empty element can leave its parent empty, so repeat until nothing matches.
while (($nodeList = $xpath->query('//*[not(text()) and not(node())]')) && $nodeList->length > 0) {
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }
}

echo $dom->saveHTML();

You will probably have to adapt this a bit to your needs.
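
As one possible adaptation, the sketch below wraps the same DOM approach in a function that also treats whitespace-only elements as empty and leaves void elements such as <br> and <img> alone. The function name removeEmptyElements, the wrapper div, and the list of tags to keep are assumptions for illustration, not part of the original answer. It also assumes ASCII input, since loadHTML may misread other encodings unless one is declared.

```php
<?php
// Hypothetical helper: strip elements that have no element children and
// only whitespace text, repeating until no such element is left.
function removeEmptyElements(string $html): string
{
    // Void elements are empty by definition, so they are excluded from removal.
    $keep = ['br', 'hr', 'img', 'input'];
    $tests = [];
    foreach ($keep as $name) {
        $tests[] = 'self::' . $name;
    }
    $query = ".//*[not(*) and normalize-space(.) = '' and not(" . implode(' or ', $tests) . ")]";

    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom = new DOMDocument();
    // The wrapper div gives the fragment a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->documentElement;
    $xpath = new DOMXPath($dom);

    do {
        // ".//*" searches below the wrapper, so the wrapper itself is never removed.
        $nodes = $xpath->query($query, $root);
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    } while ($nodes->length > 0);

    // Serialize the wrapper's children, leaving the wrapper out of the result.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeEmptyElements('<p><strong><a href="http://xx.org.uk/depw"> </a></strong></p><a href="#">link</a>');
```

With this input, the nested empty paragraph should disappear and only the link with text should remain. Note that normalize-space() does not treat a non-breaking space as whitespace, so elements containing only &nbsp; would be kept.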
