Stripping HTML comments with PHP but keeping conditional comments
I'm currently using PHP and a regular expression to strip out all HTML comments from a page. The script works well... a little too well. It strips out every comment, including the conditional comments in my <head>. Here's what I've got:
<?php
function callback($buffer)
{
return preg_replace('/<!--(.|\s)*?-->/', '', $buffer);
}
ob_start("callback");
?>
... HTML source goes here ...
<?php ob_end_flush(); ?>
Since my regex skills aren't too hot, I'm having trouble figuring out how to modify the pattern to exclude conditional comments such as:
<!--[if !IE]><!-->
<link rel="stylesheet" href="/css/screen.css" type="text/css" media="screen" />
<!-- <![endif]-->
<!--[if IE 7]>
<link rel="stylesheet" href="/css/ie7.css" type="text/css" media="screen" />
<![endif]-->
<!--[if IE 6]>
<link rel="stylesheet" href="/css/ie6.css" type="text/css" media="screen" />
<![endif]-->
Cheers
Solution
Since comments cannot be nested in HTML, a regex can do the job, in theory. Still, some kind of parser would be the better choice, especially if your input is not guaranteed to be well-formed.
Here is my attempt at it. To match only normal (non-conditional) comments, this should work. It has become quite a monster, sorry about that. I have tested it fairly extensively and it seems to do the job, but I give no warranty.
<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->
Explanation:
<!-- #01: "<!--"
(?! #02: look-ahead: a position not followed by:
\s* #03: any amount of whitespace
(?: #04: non-capturing group, any of:
\[if [^\]]+] #05: "[if ...]"
|<! #06: or "<!"
|> #07: or ">"
) #08: end non-capturing group
) #09: end look-ahead
(?: #10: non-capturing group:
(?!-->) #11: a position not followed by "-->"
. #12: eat the following char, it's part of the comment
)* #13: end non-capturing group, repeat
--> #14: "-->"
Steps #02 and #11 are crucial. #02 makes sure that the following characters do not indicate a conditional comment. After that, #11 makes sure that the following characters do not indicate the end of the comment, while #12 and #13 cause the actual matching.
Apply with "global" and "dotall" flags.
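In PHP the two flags map as follows: preg_replace() already replaces every match (PHP has no "g" modifier), and the "s" pattern modifier is the dotall flag. Below is a minimal sketch of the question's output-buffer callback using the pattern above; the helper name strip_html_comments and the constant name are hypothetical, chosen for illustration.

```php
<?php
// The answer's "normal comments only" pattern. The "s" modifier is PHP's
// dotall flag, so "." also matches newlines inside multi-line comments.
// preg_replace() replaces all matches by default, so no "global" flag exists.
const NORMAL_COMMENT_PATTERN = '/<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->/s';

// Hypothetical helper: removes ordinary comments, keeps conditional ones.
function strip_html_comments(string $html): string
{
    return preg_replace(NORMAL_COMMENT_PATTERN, '', $html);
}

// Drop-in replacement for the callback passed to ob_start("callback").
function callback(string $buffer): string
{
    return strip_html_comments($buffer);
}
```

With this callback, the ob_start()/ob_end_flush() wrapper from the question stays unchanged; the look-ahead is what lets the "[if ...]", "<!-->" and "<!-- <![endif]-->" pieces survive.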
To do the opposite (match only conditional comments), it would be something like this:
<!(--)?(?=\[)(?:(?!<!\[endif\]\1>).)*<!\[endif\]\1>
Explanation:
<! #01: "<!"
(--)? #02: two dashes, optional
(?=\[) #03: a position followed by "["
(?: #04: non-capturing group:
(?! #05: a position not followed by
<!\[endif\]\1> #06: "<![endif]>" or "<![endif]-->" (depends on #02)
) #07: end of look-ahead
. #08: eat the following char, it's part of the comment
)* #09: end of non-capturing group, repeat
<!\[endif\]\1> #10: "<![endif]>" or "<![endif]-->" (depends on #02)
Again, apply with "global" and "dotall" flags.
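A sketch for PHP, assuming you want to collect the conditional comments with preg_match_all(). It makes one deliberate change to the pattern above: "(--)?" becomes "((?:--)?)". In PCRE, which PHP's preg_* functions use, a backreference to a group that did not take part in the match fails instead of matching an empty string, so with "(--)?" the "\1" in the terminator would never match the "<![if ...]> ... <![endif]>" form. The helper and constant names are hypothetical.

```php
<?php
// Adapted pattern: group 1 is always set, either to "--" or to "".
// That keeps "\1" usable for both the "<!--[if ...]> ... <![endif]-->"
// and the "<![if ...]> ... <![endif]>" syntaxes. "s" = dotall.
const CONDITIONAL_COMMENT_PATTERN =
    '/<!((?:--)?)(?=\[)(?:(?!<!\[endif\]\1>).)*<!\[endif\]\1>/s';

// Hypothetical helper: returns every conditional comment block, in order.
function find_conditional_comments(string $html): array
{
    preg_match_all(CONDITIONAL_COMMENT_PATTERN, $html, $matches);
    return $matches[0]; // full matches only
}
```

If you also need to know which syntax each block used, $matches[1] holds "--" for the downlevel-hidden form and "" for the downlevel-revealed <![if ...]> form.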
Step #02 is needed because of the "downlevel-revealed" syntax; see "MSDN - About Conditional Comments".
I'm not entirely sure where whitespace is allowed or expected. Add \s* to the expression where appropriate.