How to escape a string for use in Boost Regex

2021-12-24 00:00:00 regex escaping c++ boost

I'm just getting my head around regular expressions, and I'm using the Boost Regex library.

I have a need to use a regex that includes a specific URL, and it chokes because obviously there are characters in the URL that are reserved for regex and need to be escaped.

Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.

Alternatively, is there a list of all characters that would need to be escaped?

Recommended answer

. ^ $ | ( ) [ ] { } * + ? \
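Given that list, escaping can also be done by hand without any regex. The following is a minimal sketch, not a Boost facility: the helper name escape_regex_chars is hypothetical, and it assumes the set of special characters is exactly the one listed above, including the backslash itself.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): prefixes every regex
// metacharacter from the list above with a backslash.
std::string escape_regex_chars(const std::string& text) {
    // Assumed metacharacter set: . ^ $ | ( ) [ ] { } * + ? and backslash.
    static const std::string special = ".^$|()[]{}*+?\\";
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character is escaped
    for (char c : text) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

The result can then be passed to a regex constructor or concatenated into a larger pattern.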

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

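// Character class matching every regex metacharacter, including the backslash.
// Inside a C++ string literal each regex backslash must itself be doubled.
const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
// sed format: "\\" emits a literal backslash, "&" emits the matched character.
const std::string rep("\\\\&");
std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                          boost::match_default | boost::format_sed);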

(The flag boost::format_sed selects sed's replacement string format. In sed, an unescaped & in the replacement outputs whatever the whole expression matched; a backslash must itself be escaped to appear literally.)

Or, if you are not comfortable with sed's replacement string format, change the flag to boost::format_perl and use the familiar $& to refer to whatever the whole expression matched.

// Perl format: "\\" emits a literal backslash, "$&" emits the whole match.
const std::string rep("\\\\$&");
std::string result = boost::regex_replace(url_to_escape, esc, rep,
                                          boost::match_default | boost::format_perl);
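For readers without Boost at hand, the same idea can be sketched with the standard library's std::regex. This is a substitute for illustration, not the Boost code from the answer. The function names escape_for_std_regex and contains_url are hypothetical. One difference to keep in mind: in std::regex's default ECMAScript replacement format the backslash is not an escape character, so a single backslash before $& is enough.

```cpp
#include <regex>
#include <string>

// Hypothetical helper using std::regex (not Boost): escapes regex
// metacharacters so that `text` can be embedded literally in a pattern.
std::string escape_for_std_regex(const std::string& text) {
    // Same character class as in the answer, written as a raw string
    // literal so the regex backslashes do not need doubling.
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    // Default (ECMAScript) format: "$&" is the whole match and the
    // leading backslash is copied through as a literal character.
    return std::regex_replace(text, special, R"(\$&)");
}

// Hypothetical usage: reports whether `line` contains `url` verbatim,
// by searching with the escaped URL as the pattern.
bool contains_url(const std::string& line, const std::string& url) {
    const std::regex pattern(escape_for_std_regex(url));
    return std::regex_search(line, pattern);
}
```

Without the escaping step, the dot and question mark in a URL would be interpreted as regex operators and could match unintended text.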
