PHP regex - Remove all non-alphanumeric characters
I use PHP.
My string looks like this:
This is a string-test width åäö and some über+strange characters: _like this?
Question
Is there a way to remove non-alphanumeric characters and replace them with a space? Here are some non-alphanumeric characters:
- -
- +
- :
- _
- ?
I've read many threads about this, but their solutions don't support other languages, like this one:
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
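Requirements
- My list of non-alphanumeric characters might not be complete.
- My content contains characters from different languages, such as åäöü. There might be many more.
- Non-alphanumeric characters should be replaced with a space. Otherwise the words would be glued together.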
Recommended answer
You can try this:
preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);
\p{L} stands for all alphabetic characters (whatever the alphabet).
\p{N} stands for numbers.
With the u modifier, the characters of the subject string are treated as Unicode characters.
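As a quick check of the pattern above, here is a minimal sketch that runs it on a sample adapted from the question. The variable names are illustrative, and the script assumes the source file is saved as UTF-8. Note that a trailing non-alphanumeric character leaves a trailing space, so the sketch trims the result.

```php
<?php
// Sample input adapted from the question (file must be UTF-8 encoded).
$string = 'This is a string-test width åäö and some über+strange characters: _like this?';

// Replace every run of characters that are neither letters (\p{L})
// nor numbers (\p{N}) with a single space.
$clean = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

// The final "?" became a space, so trim the edges.
$clean = trim($clean);

echo $clean, PHP_EOL;
// This is a string test width åäö and some über strange characters like this
```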
Or this:
preg_replace('~\P{Xan}++~u', ' ', $string);
\p{Xan} contains Unicode letters and digits.
\P{Xan} contains everything that is not a Unicode letter or digit. (Be careful: it contains whitespace too, which you can preserve with ~[^\p{Xan}\s]++~u)
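The whitespace caveat is easiest to see side by side. The following sketch uses a made-up input string with a double space and a tab; the variable names are illustrative only.

```php
<?php
// Hypothetical input: an underscore, a double space and a tab.
$string = "foo_bar  baz\tqux";

// \P{Xan} also matches whitespace, so each run of spaces or tabs
// collapses into one space, just like the underscore.
$collapsed = preg_replace('~\P{Xan}++~u', ' ', $string);

// Adding \s to the negated class leaves whitespace untouched;
// only the underscore is replaced.
$preserved = preg_replace('~[^\p{Xan}\s]++~u', ' ', $string);

var_dump($collapsed === 'foo bar baz qux');    // bool(true)
var_dump($preserved === "foo bar  baz\tqux");  // bool(true)
```

Which variant fits depends on whether the original spacing (tabs, newlines, repeated spaces) carries meaning in your content.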
If you want a more specific set of allowed letters, you must replace \p{L} with ranges from the Unicode table.
Example:
preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);
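To illustrate what restricting the allowed set changes, this sketch compares the range-based pattern with the \p{L}\p{N} version on a made-up string that mixes Latin and Greek letters. The input and variable names are assumptions for illustration, and the file is assumed to be UTF-8.

```php
<?php
// Hypothetical input mixing accented Latin letters, digits and Greek.
$string = 'Ünïcode-Zoë 2024 Ελλάδα';

// Only a-z, the listed accented Latin ranges and digits are allowed
// (case-insensitively), so the Greek word is replaced as well.
$latinOnly = preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);

// The generic version keeps letters from every alphabet.
$anyLetter = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

var_dump($latinOnly === 'Ünïcode Zoë 2024 ');       // bool(true)
var_dump($anyLetter === 'Ünïcode Zoë 2024 Ελλάδα'); // bool(true)
```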
Why use a possessive quantifier (++) here?
~\P{Xan}+~u will give you the same result as ~\P{Xan}++~u. The difference is that with the first pattern the engine records each backtracking position (which we don't need), whereas with the second it doesn't (as in an atomic group). The result is a small performance gain.
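The equivalence of the results can be checked directly. This sketch compares the greedy, possessive and atomic-group forms on a fragment of the question's sample; it only verifies that the outputs match and does not measure the performance difference. Variable names are illustrative.

```php
<?php
// Fragment of the sample string from the question (UTF-8 file assumed).
$string = 'über+strange: _like this?';

// Greedy quantifier: backtracking positions are recorded.
$greedy = preg_replace('~\P{Xan}+~u', ' ', $string);

// Possessive quantifier: nothing is given back once matched.
$possessive = preg_replace('~\P{Xan}++~u', ' ', $string);

// Atomic group: the explicit form of the same behaviour.
$atomic = preg_replace('~(?>\P{Xan}+)~u', ' ', $string);

var_dump($greedy === $possessive); // bool(true)
var_dump($possessive === $atomic); // bool(true)
```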
I think it's good practice to use possessive quantifiers and atomic groups whenever possible.
However, the PCRE regex engine automatically makes a quantifier possessive in obvious situations (example: a+b => a++b), unless this optimization has been disabled with the PCRE_NO_AUTO_POSSESS option. (http://www.pcre.org/pcre.txt)