preg_match and UTF-8 in PHP
I'm trying to search a UTF-8-encoded string using preg_match.
preg_match('/H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);
echo $a_matches[0][1];
This should print 1, since "H" is at index 1 in the string "¡Hola!". But it prints 2. So it seems that the subject is not being treated as a UTF-8-encoded string, even though I'm passing the "u" modifier in the regular expression.
I have the following settings in my php.ini, and other UTF-8 functions are working:
mbstring.func_overload = 7
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off
Any ideas?
Recommended answer
Looks like this is a "feature"; see http://bugs.php.net/bug.php?id=37391
The 'u' switch only makes sense for PCRE; PHP itself is unaware of it.
From PHP's point of view, strings are byte sequences, so returning a byte offset seems logical (I don't say "correct").
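If a character index is what you need, one possible workaround is to convert the byte offset yourself by counting the characters in the part of the subject that precedes the match. The sketch below assumes the subject is valid UTF-8 and that the mbstring extension is available; byte_offset_to_char_offset is a hypothetical helper name, not something PHP provides. It uses mb_strcut, which slices by bytes, rather than substr, because substr would be remapped to a character-based function under the mbstring.func_overload setting shown in the question.

```php
<?php
// Hypothetical helper (not part of PHP): convert a byte offset reported by
// preg_match into a character offset. Assumes $subject is valid UTF-8.
function byte_offset_to_char_offset(string $subject, int $byteOffset): int
{
    // mb_strcut takes the first $byteOffset bytes of the subject;
    // mb_strlen then counts how many UTF-8 characters that prefix contains.
    return mb_strlen(mb_strcut($subject, 0, $byteOffset, 'UTF-8'), 'UTF-8');
}

$subject = "\xC2\xA1Hola!"; // "¡Hola!", where "¡" occupies two bytes

if (preg_match('/H/u', $subject, $matches, PREG_OFFSET_CAPTURE)) {
    $byteOffset = $matches[0][1]; // byte offset, as explained above
    echo byte_offset_to_char_offset($subject, $byteOffset), "\n"; // prints 1
}
```

The conversion is only reliable when the offset falls on a character boundary, which is the case for offsets returned by a successful match with the 'u' modifier.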