JavaScript RegExp for exact matching of multiple words that contain special characters

2022-01-19 00:00:00 regex reactjs frontend jquery javascript

I'm using a RegExp to match multiple words. The search terms are dynamic, so when one of them contains a special character such as "(", that character is treated as part of the expression and I get Uncaught SyntaxError: Invalid regular expression.

    let text = 'working text and (not working text'
    let findTerm = ['working text', '(not working text']
    // Throws here: the unescaped "(" in the second term opens a group that is never closed
    let replaceFromRegExp = new RegExp('\\b'+`(${findTerm.join("|")})`+'\\b', 'g')
    text = text.replace(replaceFromRegExp, match => "<mark>" + match + "</mark>")
    console.log(text)

Recommended answer

A word boundary (\b) matches at any of the following three positions:

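Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not.

This means \b placed in front of "(" can only match when a word character comes right before the parenthesis. You need more general word boundaries: ones that require a non-word character or the start of the string before the search term, and a non-word character or the end of the string after it.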
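The three positions above can be checked directly. The sketch below is only an illustration: the sample string and the helper name boundaryPositions are made up for this example and are not part of the original post. It lists every index where \b matches, then shows that \b in front of "(" succeeds only when a word character precedes the parenthesis.

```javascript
// Illustrative input only; not taken from the original question.
const sample = 'a (b) c';

// Hypothetical helper: collect every index at which the zero-width \b matches.
function boundaryPositions(str) {
  const positions = [];
  for (let i = 0; i <= str.length; i++) {
    const re = /\b/y; // sticky flag: try to match only at lastIndex
    re.lastIndex = i;
    if (re.test(str)) positions.push(i);
  }
  return positions;
}

console.log(boundaryPositions(sample)); // [ 0, 1, 3, 4, 6, 7 ]

// Index 2 (between the space and "(") is not a boundary, so \b\( fails there.
const afterSpace = /\b\(b/.test('a (b) c');  // false
// With a word character right before "(", the boundary exists.
const afterLetter = /\b\(b/.test('a(b) c');  // true
console.log(afterSpace, afterLetter);
```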

Note that you also need to sort the findTerm items by length in descending order to avoid issues with overlapping terms.
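To see why the order matters: alternation tries its branches left to right, so a shorter term listed first can win over a longer term that starts with it. A minimal sketch follows, with a hypothetical pair of terms chosen to overlap (they contain no special characters, so escaping is skipped here).

```javascript
const input = 'working text';
const terms = ['working', 'working text']; // hypothetical overlapping terms

// Original order: the shorter alternative is tried first and matches.
const unsorted = new RegExp(String.raw`\b(?:${terms.join('|')})\b`, 'g');

// Longest first: copy the array, then sort by descending length.
const byLength = [...terms].sort((a, b) => b.length - a.length);
const sorted = new RegExp(String.raw`\b(?:${byLength.join('|')})\b`, 'g');

const unsortedMatches = input.match(unsorted);
const sortedMatches = input.match(sorted);

console.log(unsortedMatches); // [ 'working' ]
console.log(sortedMatches);   // [ 'working text' ]
```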

Finally, do not forget to escape the findTerm items before using them in a regex pattern.
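As a small illustration of the escaping step, the sketch below wraps the replacement in a helper. The name escapeRegExp is an assumption made for this example; the character class is the usual set of regex metacharacters, and the pattern is meant for regexes built without the u or v flags.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash.
function escapeRegExp(term) {
  return term.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

const raw = '(not working text';
const escaped = escapeRegExp(raw);
console.log(escaped); // \(not working text

// The raw term is not a valid pattern: the "(" opens a group that never closes.
let rawThrows = false;
try {
  new RegExp(raw);
} catch (e) {
  rawThrows = e instanceof SyntaxError;
}

// The escaped term compiles and matches the literal text.
const literalMatch = new RegExp(escaped).test('some (not working text here');
console.log(rawThrows, literalMatch); // true true
```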

You can use

    let text = 'working text and (not working text'
    let findTerm = ['working text', '(not working text']
    // Longest terms first, so a longer term is tried before a shorter one it contains
    findTerm.sort((a, b) => b.length - a.length);
    // Escape each term, then wrap the alternation in adaptive word boundaries
    let replaceFromRegExp = new RegExp(String.raw`(?:\B(?!\w)|\b(?=\w))(?:${findTerm.map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join("|")})(?:(?<=\w)\b|(?<!\w)\B)`, 'g')
    // If the boundaries for special chars should not be checked, remove \B:
    // let replaceFromRegExp = new RegExp(String.raw`(?:(?!\w)|\b(?=\w))(?:${findTerm.map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join("|")})(?:(?<=\w)\b|(?<!\w))`, 'g')
    console.log(replaceFromRegExp)
    text = text.replace(replaceFromRegExp, "<mark>$&</mark>")
    console.log(text)

Note that "<mark>$&</mark>" is a shorter way of writing match => "<mark>" + match + "</mark>", as $& is a backreference to the whole match value in a string replacement pattern.
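The equivalence of the two replacement styles can be checked with a short sketch. The input string and pattern here are illustrative only.

```javascript
const source = 'working text and more';
const pattern = /working text/g;

// String replacement pattern: $& stands for the whole match.
const viaPattern = source.replace(pattern, '<mark>$&</mark>');

// Callback form: the first argument is the whole match.
const viaCallback = source.replace(pattern, match => '<mark>' + match + '</mark>');

console.log(viaPattern);                 // <mark>working text</mark> and more
console.log(viaPattern === viaCallback); // true
```

The callback form is still the better fit when the replacement depends on logic that a replacement pattern cannot express, such as looking the match up in a table.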

The regex is

/(?:\B(?!\w)|\b(?=\w))(?:\(not working text|working text)(?:(?<=\w)\b|(?<!\w)\B)/g

or

/(?:(?!\w)|\b(?=\w))(?:\(not working text|working text)(?:(?<=\w)\b|(?<!\w))/g

See the regex #1 demo and regex #2 demo. Details:

(?:\B(?!\w)|\b(?=\w)) - either a non-word boundary if the next char is not a word char, or a word boundary if the next char is a word char

(?:(?!\w)|\b(?=\w)) - either the next char is not a word char, or there is no word char immediately to the left of the current location and the next char is a word char (if the term starts with a special char, no boundary is required)

(?:\(not working text|working text) - a non-capturing group matching one of the alternatives built from the findTerm array

(?:(?<=\w)\b|(?<!\w)\B) - either a word boundary if the preceding char is a word char, or a non-word boundary if the preceding char is not a word char

(?:(?<=\w)\b|(?<!\w)) - either the previous char is a word char and the next one is not, or the previous char is not a word char (if the term ends with a special char, no boundary is required)
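The difference between the two variants shows up when a term that starts with a special character is glued to a preceding word. The sketch below is one way to compare them; buildPattern, escapeTerm, and the test strings are names and inputs invented for this example, and it assumes a JavaScript engine with lookbehind support.

```javascript
// Hypothetical helpers for this illustration.
const escapeTerm = t => t.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');

function buildPattern(terms, strict) {
  const body = [...terms]
    .sort((a, b) => b.length - a.length) // longest first
    .map(escapeTerm)
    .join('|');
  // strict = variant #1 (checks \B around special chars); otherwise variant #2.
  const left = strict
    ? String.raw`(?:\B(?!\w)|\b(?=\w))`
    : String.raw`(?:(?!\w)|\b(?=\w))`;
  const right = strict
    ? String.raw`(?:(?<=\w)\b|(?<!\w)\B)`
    : String.raw`(?:(?<=\w)\b|(?<!\w))`;
  return new RegExp(`${left}(?:${body})${right}`, 'g');
}

const terms = ['(not working text'];
const spaced = 'and (not working text'; // "(" preceded by a space
const glued = 'x(not working text';     // "(" preceded by a word char

const strictSpaced = spaced.match(buildPattern(terms, true));
const strictGlued = glued.match(buildPattern(terms, true));
const relaxedGlued = glued.match(buildPattern(terms, false));

console.log(strictSpaced); // [ '(not working text' ]
console.log(strictGlued);  // null
console.log(relaxedGlued); // [ '(not working text' ]
```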
