PHP regex - Remove all non-alphanumeric characters

2021-12-25 00:00:00 regex utf-8 replace php

I use PHP. My string can look like this:


This is a string-test width åäö and some über+strange characters: _like this?



Is there a way to remove non-alphanumeric characters and replace them with a space? Here are some non-alphanumeric characters:

  • -
  • +
  • :
  • _
  • ?


I've read many threads about it but they don't support other languages, like this one:

preg_replace("/[^A-Za-z0-9 ]/", '', $string);


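  • My list of non-alphanumeric characters might not be complete.
  • My content contains characters from different languages, like åäöü. There may be many more.
  • Non-alphanumeric characters should be replaced with a space. Otherwise the words would be glued together.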
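To make the problem concrete, here is a minimal sketch of what the ASCII-only pattern quoted above does to such a string. The sample input is modeled on the string in the question and is assumed to be UTF-8 encoded; the variable names are only illustrative.

```php
<?php
// Sample input modeled on the question's string (assumed UTF-8).
$string = 'This is a string-test width åäö and some über+strange characters: _like this?';

// ASCII-only pattern from the question: without the u modifier it works on bytes,
// so every byte of å, ä, ö and ü is removed along with the punctuation.
$ascii = preg_replace("/[^A-Za-z0-9 ]/", '', $string);

echo $ascii, PHP_EOL;
// This is a stringtest width  and some berstrange characters like this
```

The accented letters disappear and "string-test" and "über+strange" collapse into single words, which is exactly what the requirements rule out.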



preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

\p{L} stands for all alphabetic characters (whatever the alphabet).

\p{N} stands for numbers.

With the u modifier, the pattern and the subject string are treated as UTF-8, so matching works on Unicode characters rather than on bytes.
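As a rough illustration, the pattern with \p{L} and \p{N} can be applied like this. The input is a sample string modeled on the question's and assumed to be UTF-8 encoded; note that the backslashes in \p{L} and \p{N} are required.

```php
<?php
// Sample input modeled on the question's string (assumed UTF-8).
$string = 'This is a string-test width åäö and some über+strange characters: _like this?';

// Replace each run of characters that are neither letters nor digits with one space.
$clean = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

echo $clean, PHP_EOL;
// This is a string test width åäö and some über strange characters like this
// (a trailing space remains where the "?" was; trim() removes it if unwanted)
```

Because the quantifier matches whole runs, ": _" becomes a single space rather than three.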


preg_replace('~\P{Xan}++~u', ' ', $string);

\p{Xan} contains Unicode letters and digits.

\P{Xan} contains everything that is not a Unicode letter or digit. (Be careful: that includes whitespace too, which you can preserve with ~[^\p{Xan}\s]++~u )
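The difference between the two \p{Xan} variants can be sketched as follows. The input string, with a tab in it, is a made-up example and is assumed to be UTF-8 encoded.

```php
<?php
// Hypothetical input containing a tab, spaces and punctuation (assumed UTF-8).
$string = "über+strange\tcharacters: _like this?";

// \P{Xan} also matches whitespace, so the tab and the spaces are replaced too.
$collapsed = preg_replace('~\P{Xan}++~u', ' ', $string);

// Adding \s to the negated class leaves existing whitespace untouched.
$keepWhitespace = preg_replace('~[^\p{Xan}\s]++~u', ' ', $string);

var_dump($collapsed);      // "über strange characters like this "
var_dump($keepWhitespace); // "über strange\tcharacters   like this " (tab kept)
```

With whitespace preserved, ": _" turns into three spaces (the original space plus one for ":" and one for "_"), so the first form is usually closer to what the question asks for.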

If you want a more specific set of allowed letters, you must replace \p{L} with ranges from the Unicode table.


preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);
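A small sketch of how the range-based pattern behaves, using a made-up input in which one letter (the Greek Ω) falls outside the allowed ranges. The source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical input: Ü and ï are inside the allowed ranges, Ω is not.
$string = 'Ünïcode Ωmega 42!';

// Allow only a-z, the listed Latin ranges and digits; i makes the ranges case-insensitive.
$clean = preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);

var_dump($clean); // "Ünïcode mega 42 "
```

Unlike \p{L}, this treats Ω as a character to replace, so the space and the Ω before "mega" collapse into one space.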

Why use a possessive quantifier (++) here?

~\P{Xan}+~u will give you the same result as ~\P{Xan}++~u. The difference is that with the first pattern the engine records each backtracking position (which we don't need), whereas with the second it doesn't (as in an atomic group). The result is a small performance gain.
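The equivalence of the results can be checked with a short sketch like the one below. The input is a made-up sample, and the atomic-group form is added here only for comparison; it is not one of the patterns given above.

```php
<?php
// Hypothetical sample input (assumed UTF-8).
$string = 'über+strange characters: _like this?';

$greedy     = preg_replace('~\P{Xan}+~u', ' ', $string);     // ordinary greedy quantifier
$possessive = preg_replace('~\P{Xan}++~u', ' ', $string);    // possessive quantifier
$atomic     = preg_replace('~(?>\P{Xan}+)~u', ' ', $string); // same idea as an atomic group

// All three produce the same string; only the engine's bookkeeping differs.
var_dump($greedy === $possessive && $possessive === $atomic); // bool(true)
```

Any speed difference on inputs this small is unlikely to be measurable, so treat the possessive form as a habit rather than an optimization you can observe here.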


I think it's good practice to use possessive quantifiers and atomic groups whenever possible.

However, the PCRE regex engine automatically makes a quantifier possessive in obvious situations (example: a+b => a++b), unless this optimization has been disabled with the PCRE_NO_AUTO_POSSESS option.

More information about possessive quantifiers and atomic groups here (possessive quantifiers) and here (atomic groups) or here
