Regex for nested tags (innermost makes it easier)

2022-01-18 00:00:00 regex nested tags html

I researched this quite a bit, but couldn't find a working example of how to match nested HTML tags with attributes. I know it is possible to match balanced/nested innermost tags without attributes (for example, a regex for <div> and </div> would be #<div[^>]*>(?:(?> [^<]+ ) |<(?!div[^>]*>))*?</div>#x).
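A minimal PHP sketch that runs the innermost-<div> pattern quoted above against a small, made-up snippet. In this run the `<div[^>]*>` part already accepts attributes on the opening tag. What the pattern cannot do is pair an outer `<div>` with its balanced `</div>`: the lookahead `<(?!div[^>]*>)` rejects any nested `<div ...>`, so only divs with no div inside them match.

```php
<?php
// Sketch: run the innermost-<div> pattern quoted in the question.
// The x modifier makes PCRE ignore the literal spaces inside the pattern.
$pattern = '#<div[^>]*>(?:(?> [^<]+ ) |<(?!div[^>]*>))*?</div>#x';

// Hypothetical sample markup; only the inner <div> has no <div> nested in it.
$html = '<div class="a"><div id="b"><p>x</p></div></div>';

preg_match_all($pattern, $html, $matches);
print_r($matches[0]); // one match: <div id="b"><p>x</p></div>
```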

However, I would like to see a regex pattern that finds an HTML tag pair with attributes.

Example: basically, it should match the bolded pair in

<div class="aaa"> **<div class="aaa">** <div> <div> </div> **</div>** </div>

and not the bolded pair in

<div class="aaa"> **<div class="aaa">** <div> <div> **</div>** </div> </div>

Does anybody have any ideas?

For testing purposes we could use: http://www.lumadis.be/regex/test_regex.php


PS: Steven Levithan mentioned a solution on his blog (actually in a comment), but it doesn't work:

http://blog.stevenlevithan.com/archives/match-innermost-html-element

$regex = '/<div[^>]+?id\s*=\s*"MyID"[^>]*>(?:((?:[^<]++|<(?!\/?div[^>]*>))+)|(<div[^>]*>(?>(?1)|(?2))*<\/div>))?<\/div>/i';
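The blog pattern relies on PCRE recursion (`(?1)`, `(?2)`). As a separate sketch, not the blog's pattern and not the solution below, here is a shorter recursive pattern for PHP's PCRE that matches a `<div>` carrying one particular attribute through to the `</div>` that balances it. The attribute `id="target"` and the sample markup are hypothetical.

```php
<?php
// Sketch (not the blog's pattern): use PCRE recursion to match one <div>
// that carries a given attribute, up to its *balanced* closing </div>.
// The attribute id="target" is a hypothetical example.
$pattern = '~<div\s[^>]*\bid="target"[^>]*>'  // opening tag with the attribute
    . '(?:'
    .   '[^<]++'                               // text without any tag
    .   '|<(?!/?div\b)'                        // any other tag, e.g. <p> or </p>
    .   '|(?<inner><div\b[^>]*>'               // a nested <div> ...
    .     '(?:[^<]++|<(?!/?div\b)|(?&inner))*' // ... whose body may recurse
    .   '</div>)'                              // ... and its own close
    . ')*'
    . '</div>~';                               // the close that balances the opener

$html = '<div class="aaa"> <div id="target"> <div> <div> </div> </div> </div> </div>';

if (preg_match($pattern, $html, $m)) {
    echo $m[0], "\n"; // <div id="target"> <div> <div> </div> </div> </div>
}
```

Because the named group `inner` recurses into every nested `<div>`, each inner `</div>` is consumed by its own opener, so the final `</div>` in the match is the one that balances the attributed tag.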

Solution

Matching innermost pairs of <div> and </div> tags, plus their attributes and content:

#<div(?:(?!(<div|</div>)).)*</div>#s

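The key here is that (?:(?!STRING).)* is to strings as [^CHAR]* is to characters: it consumes one character at a time, but only at positions where STRING does not begin. The s modifier makes . match newlines too, so a match can span several lines.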
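To illustrate the analogy, a small PHP comparison on made-up markup: `[^<]*` can refuse only a single character, so it stops at the first `<`, while `(?:(?!</div>).)*` refuses a whole string and runs on until `</div>`.

```php
<?php
// Hypothetical sample markup containing a nested, non-<div> tag.
$html = '<div>a <b>bold</b> c</div>';

// [^<]* can exclude only a single character, so it stops at the first "<".
preg_match('#<div>([^<]*)#', $html, $byChar);

// (?:(?!</div>).)* excludes a whole string, so it runs on up to "</div>".
// The s modifier lets "." match newlines as well.
preg_match('#<div>((?:(?!</div>).)*)#s', $html, $byString);

echo '[', $byChar[1], "]\n";   // [a ]
echo '[', $byString[1], "]\n"; // [a <b>bold</b> c]
```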

Credit: https://stackoverflow.com/a/6996274


Example in PHP:

<?php

$text = <<<'EOD'
<div id="1">
  in 1
  <div id="2">
    in 2
    <div id="3">
      in 3
    </div>
  </div>
</div>
<div id="4">
  in 4
  <div id="5">
    in 5
  </div>
</div>
EOD;

$matches = array();
preg_match_all('#<div(?:(?!(<div|</div>)).)*</div>#s', $text, $matches);

foreach ($matches[0] as $match) {
  // Print a separator line, then the matched innermost <div> element.
  echo "************\n" . $match . "\n";
}

Output:

************
<div id="3">
      in 3
    </div>
************
<div id="5">
    in 5
  </div>
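A possible variation, not part of the original answer: the same tempered-dot pattern with two capture groups, so the attribute string and the inner text of each innermost `<div>` can be read separately. The `\b` after `div` is an addition here so that only the tag name `div` itself is matched; the sample markup is hypothetical.

```php
<?php
// Sketch: innermost <div> elements, with attributes and content captured apart.
$text = <<<'EOD'
<div id="1">
  in 1
  <div id="2" class="aaa">
    in 2
  </div>
</div>
EOD;

// Group 1: attribute string of the opening tag; group 2: inner content.
$pattern = '#<div\b([^>]*)>((?:(?!<div\b|</div>).)*)</div>#s';

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo trim($m[1]), ' => ', trim($m[2]), "\n";
}
// Prints: id="2" class="aaa" => in 2
```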
