PHP regular expression for a range of Unicode code points

2021-12-26 00:00:00 unicode regex php preg-replace

I'm trying to strip all characters from a string except:

  • Alphanumeric characters
  • Dollar sign ($)
  • Underscore (_)
  • Unicode characters between code points U+0080 and U+FFFF

I've got the first three conditions by doing this:

preg_replace('/[^a-zA-Z\d$_]+/', '', $foo);

How do I go about matching the fourth condition? I looked at using \X, but there has to be a better way than listing out 65000+ characters.

Recommended answer

You can use:

$foo = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);

  • \w is equivalent to [a-zA-Z0-9_] for ASCII text (with /u, PHP also lets it match non-ASCII letters and digits)
  • \x{0080}-\x{FFFF} matches characters between code points U+0080 and U+FFFF
  • /u enables Unicode (UTF-8) support in the regex
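
To see how a pattern like the one above behaves on real input, here is a minimal sketch. It assumes the input is valid UTF-8 and that the source file itself is saved as UTF-8. The function name keepIdentifierChars is hypothetical and not part of the original answer. The sketch spells out the ASCII class instead of using \w, on the assumption that only ASCII letters and digits should be kept outside the U+0080 to U+FFFF range; under /u, \w may also match letters and digits beyond U+FFFF.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original answer.
function keepIdentifierChars(string $input): string
{
    // Keep ASCII letters, ASCII digits, "_", "$", and code points U+0080..U+FFFF.
    // Everything else is removed. The /u modifier treats pattern and subject as UTF-8.
    $result = preg_replace('/[^a-zA-Z0-9_$\x{0080}-\x{FFFF}]+/u', '', $input);

    // preg_replace() returns null on failure, for example when $input is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace failed with error code ' . preg_last_error()
        );
    }

    return $result;
}

// Spaces, "!", "=" and the emoji (U+1F600, above U+FFFF) are stripped;
// "é", "ö" and "€" fall inside U+0080..U+FFFF and are kept.
echo keepIdentifierChars('héllo wörld! $price_1 = 42€ 😀'), "\n";
// prints: héllowörld$price_142€
```

Checking for null matters here because, with /u, an invalid UTF-8 subject makes preg_replace() fail rather than return a partially cleaned string.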
