Regex for a range of Unicode code points in PHP

2021-12-26 00:00:00 unicode regex php preg-replace

I'm trying to strip all characters from a string except:

Alphanumeric characters

Dollar sign ($)

Underscore (_)

Unicode characters between code points U+0080 and U+FFFF
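Read literally, the four rules are a per-code-point test. As a cross-check for whatever regex ends up being used, here is a non-regex sketch that applies them one character at a time. The helper names keepCodePoint, utf8CodePoint and filterByRules are hypothetical. The sketch assumes valid UTF-8 input and PHP 7.4+ (for the arrow function), and it decodes UTF-8 by hand so it does not depend on the mbstring extension.

```php
<?php
// Hypothetical helper: true if a code point satisfies one of the four rules.
function keepCodePoint(int $cp): bool
{
    return ($cp >= 0x30 && $cp <= 0x39)      // 0-9
        || ($cp >= 0x41 && $cp <= 0x5A)      // A-Z
        || ($cp >= 0x61 && $cp <= 0x7A)      // a-z
        || $cp === 0x24                      // dollar sign
        || $cp === 0x5F                      // underscore
        || ($cp >= 0x80 && $cp <= 0xFFFF);   // U+0080 .. U+FFFF
}

// Hypothetical helper: decode one UTF-8 encoded character (1 to 4 bytes) to its code point.
function utf8CodePoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Keep only the characters whose code point passes keepCodePoint().
function filterByRules(string $s): string
{
    // An empty pattern with /u splits the string into individual code points.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $kept = array_filter($chars, fn (string $c): bool => keepCodePoint(utf8CodePoint($c)));
    return implode('', $kept);
}

echo filterByRules("Caf\u{00E9} \$x_1! \u{1F600} \u{00A9}"), "\n"; // prints Café$x_1©
```

The emoji U+1F600 is dropped because it lies above U+FFFF, while é (U+00E9) and © (U+00A9) are kept because they fall inside the fourth range.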

I've got the first three conditions by doing this:

preg_replace('/[^a-zA-Z\d$_]+/', '', $foo);
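A quick check of that three-condition pattern on a sample string shows why the fourth condition is needed. Without the /u flag, PCRE works on bytes, and every byte of a multibyte UTF-8 character is at or above 0x80, so an accented letter such as é (bytes 0xC3 0xA9) is removed entirely. The wrapper name stripAsciiOnly is hypothetical.

```php
<?php
// Hypothetical wrapper around the three-condition pattern from the question.
function stripAsciiOnly(string $foo): string
{
    // No /u flag: the character class is tested byte by byte,
    // so both bytes of a multibyte UTF-8 character are removed.
    return preg_replace('/[^a-zA-Z\d$_]+/', '', $foo);
}

echo stripAsciiOnly("Caf\u{00E9} \$x_1!"), "\n"; // prints Caf$x_1
```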

How do I go about matching the fourth condition? I looked at using \X, but there has to be a better way than listing out 65,000+ characters.

Recommended answer

You can use:

$foo = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);

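\w matches word characters, equivalent to [a-zA-Z0-9_] for ASCII input

$ inside a character class is a literal dollar sign

\x{0080}-\x{FFFF} matches characters between code points U+0080 and U+FFFF

/u enables Unicode (UTF-8) mode, so \x{...} refers to code points rather than single bytes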
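A minimal sketch of the answer's pattern in use. The wrapper name stripToAllowed is hypothetical, and the sample string uses \u{...} escapes so the result does not depend on the encoding of the source file. The emoji U+1F600 lies above U+FFFF and is not a word character, so it is removed, while é and © fall inside the \x{0080}-\x{FFFF} range and are kept.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function stripToAllowed(string $foo): string
{
    // /u makes PCRE treat the subject as UTF-8, so the \x{...} range refers to code points.
    $result = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);
    if ($result === null) {
        // preg_replace() returns null on error, e.g. when the input is not valid UTF-8.
        throw new InvalidArgumentException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo stripToAllowed("Caf\u{00E9} \$x_1! \u{1F600} \u{00A9}"), "\n"; // prints Café$x_1©
```

Because /u rejects malformed UTF-8, the sketch checks for a null result instead of silently passing it along.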
