Find and replace keywords with hyperlinks in an HTML fragment using PHP DOM

2021-12-25 00:00:00 replace php html

I'm trying to use the simple_html_dom PHP class to create a find-and-replace function that looks for keywords and replaces each one with a link to a definition of the keyword, with the keyword as the link text.

How can I find and replace "Dexia" with <a href="info.php?tag=dexia">Dexia</a> using this class, inside a string such as <div><p>The CEO of the Dexia bank has just decided to retire.</p></div>?

Solution

That's somewhat tricky, but you could do it this way:

$html = <<< HTML
<div><p>The CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>
HTML;

I've added an emphasis element just to illustrate that it works with inline elements too.

Setup

$dom = new DOMDocument;
$dom->formatOutput = TRUE;
$dom->loadXML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//text()[contains(., "Dexia")]');

The interesting part above is, of course, the XPath. It queries the loaded DOM for all DOMText nodes containing the needle "Dexia". The result is a DOMNodeList (as usual).
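To see what that query actually matches, here is a minimal sketch that lists each matched text node together with its parent element. It reuses the sample markup from above; the $matches array is only there for illustration.

```php
<?php
$html = '<div><p>The CEO of the Dexia bank <em>has</em> just decided to retire.</p></div>';

$dom = new DOMDocument;
$dom->loadXML($html);
$xpath = new DOMXPath($dom);

// Select every text node whose own content contains the needle.
$nodes = $xpath->query('//text()[contains(., "Dexia")]');

$matches = [];
foreach ($nodes as $node) {
    // var_export() makes the trailing whitespace of the text node visible.
    $matches[] = $node->parentNode->nodeName . ': ' . var_export($node->nodeValue, true);
}
echo implode("\n", $matches), "\n";
// prints: p: 'The CEO of the Dexia bank '
```

Only one node is returned, because the text after the emphasis element is a separate DOMText node that does not contain the needle.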

The replacement

foreach($nodes as $node) {
    $link     = '<a href="info.php?tag=dexia">Dexia</a>';
    $replaced = str_replace('Dexia', $link, $node->wholeText);
    $newNode  = $dom->createDocumentFragment();
    $newNode->appendXML($replaced);
    $node->parentNode->replaceChild($newNode, $node);
}
echo $dom->saveXML($dom->documentElement);

The matched $node holds the string "The CEO of the Dexia bank " in wholeText, not the full text of the P element. That is because the text node ends where its sibling DOMElement, the emphasis after "bank", begins. I create the link as a string rather than a node and replace all occurrences of "Dexia" in wholeText with it (regardless of word boundaries - that would be a good case for a regex). Then I create a DocumentFragment from the resulting string and replace the DOMText node with it.
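If word boundaries matter, the str_replace() call could be swapped for a regex. The following is only a sketch under a few assumptions: linkKeyword() is a hypothetical helper name, the keyword is a plain word without quotes or markup characters, and text already inside an A element should be left alone. It also escapes the text before handing it to appendXML(), since a raw & or < in the text node would otherwise make the fragment invalid XML.

```php
<?php
// Hypothetical helper: wraps whole-word matches of $keyword in a link.
function linkKeyword(DOMDocument $dom, string $keyword, string $href): void
{
    $xpath = new DOMXPath($dom);
    // Assumption: $keyword contains no double quote, as it is embedded in the XPath string.
    $nodes = $xpath->query('//text()[contains(., "' . $keyword . '")][not(ancestor::a)]');

    // \b restricts matches to whole words, so "Dexiabank" is not touched.
    $pattern  = '/\b' . preg_quote($keyword, '/') . '\b/u';
    $hrefAttr = htmlspecialchars($href, ENT_QUOTES);

    foreach ($nodes as $node) {
        // Escape &, < and > so the string stays well-formed for appendXML().
        $escaped  = htmlspecialchars($node->nodeValue, ENT_NOQUOTES);
        $replaced = preg_replace_callback(
            $pattern,
            function (array $m) use ($hrefAttr) {
                return '<a href="' . $hrefAttr . '">' . $m[0] . '</a>';
            },
            $escaped
        );
        if ($replaced === null || $replaced === $escaped) {
            continue; // regex error or no whole-word match in this node
        }
        $fragment = $dom->createDocumentFragment();
        $fragment->appendXML($replaced);
        $node->parentNode->replaceChild($fragment, $node);
    }
}

$dom = new DOMDocument;
$dom->loadXML('<div><p>Dexia and Dexiabank &amp; co: the Dexia bank <em>has</em> retired.</p></div>');
linkKeyword($dom, 'Dexia', 'info.php?tag=dexia');
echo $dom->saveXML($dom->documentElement), "\n";
```

A callback is used instead of a replacement string so that characters such as $ or \ in the link target are never interpreted as backreferences.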

W3C vs PHP

Using DOMDocumentFragment::appendXML() is a non-standard approach, because the method is not part of the W3C DOM specification.

If you want to do the replacement with the standard API, you first have to create the A element as a new DOMElement. Then you have to find the offset of "Dexia" in the nodeValue of the DOMText and split the DOMText node into two nodes at that position. Remove "Dexia" from the start of the returned sibling and insert the link element before that sibling. Repeat this procedure with the sibling node until no more "Dexia" strings are found in the node. Here is how to do it for one occurrence of "Dexia":

foreach($nodes as $node) {
    $link = $dom->createElement('a', 'Dexia');
    $link->setAttribute('href', 'info.php?tag=dexia');
    $offset  = strpos($node->nodeValue, 'Dexia');
    $newNode = $node->splitText($offset);
    $newNode->deleteData(0, strlen('Dexia'));
    $node->parentNode->insertBefore($link, $newNode);
}
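The "repeat until no more occurrences are found" step can be sketched as an inner loop. This is an illustration rather than a drop-in implementation: linkAllOccurrences() is a hypothetical helper name, the mbstring extension is assumed to be available, and the sample string is made up for the demo. The mb_* functions are used on the assumption that splitText() and deleteData() count characters rather than bytes, which matters as soon as non-ASCII text precedes the keyword.

```php
<?php
// Hypothetical helper: links every occurrence of $keyword using only
// splitText(), deleteData() and insertBefore(). Returns the number of links added.
function linkAllOccurrences(DOMDocument $dom, string $keyword, string $href): int
{
    $xpath = new DOMXPath($dom);
    $count = 0;

    // The XPath result is a snapshot, so nodes created below are not revisited.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        // Character offsets (not byte offsets) are passed on to splitText().
        while (($offset = mb_strpos($node->nodeValue, $keyword, 0, 'UTF-8')) !== false) {
            $link = $dom->createElement('a');
            $link->appendChild($dom->createTextNode($keyword));
            $link->setAttribute('href', $href);

            $rest = $node->splitText($offset);                  // $rest starts with the keyword
            $rest->deleteData(0, mb_strlen($keyword, 'UTF-8')); // drop the keyword itself
            $node->parentNode->insertBefore($link, $rest);

            $node = $rest; // continue scanning the remainder of the text
            $count++;
        }
    }
    return $count;
}

$dom = new DOMDocument;
$dom->loadXML('<p>Über Dexia: the Dexia bank <em>has</em> Dexia.</p>');
$added = linkAllOccurrences($dom, 'Dexia', 'info.php?tag=dexia');
echo $dom->saveXML($dom->documentElement), "\n";
```

Because the link text is added through createTextNode(), special characters in the keyword are escaped by the DOM itself and no manual escaping is needed.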

And finally the output

<div>
  <p>The CEO of the <a href="info.php?tag=dexia">Dexia</a> bank <em>has</em> just decided to retire.</p>
</div>
