UTF-8 in PHP regular expressions

2021-12-28 00:00:00 regex utf-8 php

I need help with regular expressions. My string contains Unicode characters, and the code below doesn't work.

The first four characters must be digits, followed by a comma, and then any alphabetic characters or whitespace. I have read that adding the u modifier (/u) at the end of the regular expression should help, but it didn't work for me.

My code works with non-Unicode characters:

$post = '9999,škofja loka';
echo preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+/', $post);




Updated answer:
This is now tested and working

$post = '9999, škofja loka';
echo preg_match('/^\d{4},[\s\p{L}]+$/u', $post);

\w is not a good fit here: depending on the PCRE build and modifiers it may not cover all Unicode letters, and it also matches [0-9_] in addition to letters.
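To make the difference concrete, here is a minimal sketch comparing \w with \p{L}. The helper names are hypothetical and exist only for this illustration; the "\u{0161}" escape (PHP 7+) stands for "š" so the example does not depend on the file encoding.

```php
<?php
// Hypothetical helpers, named only for this illustration.

// True when the whole string consists of "word" characters (\w).
function matchesWordClass(string $s): bool {
    return preg_match('/^\w+$/u', $s) === 1;
}

// True when the whole string consists of Unicode letters only (\p{L}).
function matchesLetterClass(string $s): bool {
    return preg_match('/^\p{L}+$/u', $s) === 1;
}

var_dump(matchesWordClass('abc_123'));         // bool(true): \w accepts digits and underscore
var_dump(matchesLetterClass('abc_123'));       // bool(false): \p{L} accepts letters only
var_dump(matchesLetterClass("\u{0161}kofja")); // bool(true): "š" is a Unicode letter
```

If digits or underscores must be rejected after the comma, \p{L} is the stricter choice.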

The u modifier is also important: it switches the pattern and the subject string to Unicode (UTF-8) mode.
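As a rough illustration of what the u modifier changes, the sketch below matches a single "š" with and without it. Without /u the engine works on bytes, and "š" is two bytes in UTF-8. The function name is hypothetical.

```php
<?php
// "š" written as an escape (PHP 7+) so the example does not depend on file encoding.
$char = "\u{0161}";

// Hypothetical helper: does the string consist of exactly one "character"
// as seen by the regex engine?
function matchesSingleChar(string $s, bool $unicode): bool {
    $pattern = $unicode ? '/^.$/u' : '/^.$/';
    return preg_match($pattern, $s) === 1;
}

var_dump(strlen($char));                    // int(2): two bytes in UTF-8
var_dump(matchesSingleChar($char, false));  // bool(false): byte mode sees two units
var_dump(matchesSingleChar($char, true));   // bool(true): UTF-8 mode sees one character
```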

If letters or whitespace can appear in any order after the comma, put them into the same character class. Your regex allows zero or more whitespace characters after the comma and then only letters, so a space between two words is rejected.
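The difference between the two layouts can be sketched as follows. The first pattern is an assumed Unicode-aware variant of the question's structure (whitespace first, then letters), not the asker's exact code; the function names are hypothetical.

```php
<?php
// Sample input with a space between two words; "\u{0161}" is "š" (PHP 7+ escape).
$post = "9999, \u{0161}kofja loka";

// Whitespace first, then letters only: the space inside "škofja loka" breaks the match.
function matchesSeparateClasses(string $s): bool {
    return preg_match('/^\d{4},\s*\p{L}+$/u', $s) === 1;
}

// Letters and whitespace in one class: they may appear in any order.
function matchesCombinedClass(string $s): bool {
    return preg_match('/^\d{4},[\s\p{L}]+$/u', $s) === 1;
}

var_dump(matchesSeparateClasses($post)); // bool(false)
var_dump(matchesCombinedClass($post));   // bool(true)
```

Note that the combined class also accepts input that contains only whitespace after the comma; whether that is acceptable depends on the validation rules.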

See the PHP regex documentation for details.

\p{L} is the Unicode letter property: it matches any character that Unicode classifies as a letter.

The end-of-string anchor $ is also important, because it ensures that the complete string is validated. Without it, the pattern succeeds as soon as a valid prefix is found (for example, after the first whitespace character) and ignores whatever follows.
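A small sketch of the effect of the anchor, using an input with invalid trailing characters. The function names and the sample input are made up for this illustration.

```php
<?php
// Valid prefix followed by characters that should be rejected.
$input = "9999, \u{0161}kofja 123!";

// No end anchor: succeeds as soon as a valid prefix has been matched.
function matchesPrefixOnly(string $s): bool {
    return preg_match('/^\d{4},[\s\p{L}]+/u', $s) === 1;
}

// With $: the whole string has to satisfy the pattern.
function matchesWholeString(string $s): bool {
    return preg_match('/^\d{4},[\s\p{L}]+$/u', $s) === 1;
}

var_dump(matchesPrefixOnly($input));  // bool(true): trailing "123!" is ignored
var_dump(matchesWholeString($input)); // bool(false): trailing "123!" is rejected
```

In PCRE, $ also matches just before a single trailing newline; if that matters, \z (or the D modifier) is the stricter alternative.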
