preg_match and UTF-8 in PHP

2021-12-26 00:00:00 | tags: unicode, utf-8, pcre, php

I'm trying to search a UTF-8-encoded string using preg_match.

preg_match('/H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);
echo $a_matches[0][1];

This should print 1, since "H" is at index 1 in the string "¡Hola!". But it prints 2. So it seems it's not treating the subject as a UTF-8-encoded string, even though I'm passing the "u" modifier in the regular expression.
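To see where the 2 comes from, here is a minimal sketch that runs the same match and compares the byte length of "¡" with its character length. It assumes PHP 8+ (or mbstring.func_overload disabled): with func_overload = 7, as in the php.ini below, strlen() would itself be overloaded to count characters. mb_strlen() requires the mbstring extension.

```php
<?php
// "\xC2\xA1" is the two-byte UTF-8 encoding of "¡" (U+00A1).
$subject = "\xC2\xA1Hola!";

preg_match('/H/u', $subject, $matches, PREG_OFFSET_CAPTURE);

echo $matches[0][1], "\n";                  // offset of "H" as reported by preg_match: 2
echo strlen("\xC2\xA1"), "\n";              // "¡" occupies 2 bytes
echo mb_strlen("\xC2\xA1", 'UTF-8'), "\n";  // but it is only 1 character
```

The reported offset equals the number of bytes before "H", not the number of characters.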

I have the following settings in my php.ini, and other UTF-8 functions are working:

mbstring.func_overload = 7
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off

Any ideas?

Recommended answer

Looks like this is a "feature"; see http://bugs.php.net/bug.php?id=37391

The 'u' modifier only matters to PCRE; PHP itself is unaware of it.
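A minimal sketch of what 'u' does change: it makes PCRE treat the pattern and subject as UTF-8 code points during matching, so `.` consumes the whole two-byte "¡" instead of a single byte. It does not change how PHP reports offsets.

```php
<?php
$subject = "\xC2\xA1Hola!";

// Without /u, "." matches one byte (\xC2), so "H" is compared with \xA1 and the match fails.
var_dump(preg_match('/^.H/', $subject));   // int(0)

// With /u, "." matches the complete code point U+00A1, which "H" follows.
var_dump(preg_match('/^.H/u', $subject));  // int(1)
```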

From PHP's point of view, strings are byte sequences, so returning a byte offset seems logical (I don't say "correct").
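If a character offset is what you need, one common workaround (not taken from the bug report) is to convert the byte offset after matching: count the characters in the bytes that precede the match. The helper name byte_to_char_offset is hypothetical. The sketch assumes a valid UTF-8 subject, the mbstring extension, and PHP 8+ (or func_overload disabled) so that substr() cuts by bytes rather than being overloaded to mb_substr().

```php
<?php
/**
 * Hypothetical helper: convert a byte offset returned by
 * preg_match(..., PREG_OFFSET_CAPTURE) into a character offset.
 * Assumes $subject is valid UTF-8 and that substr() counts bytes
 * (i.e. mbstring.func_overload is not active).
 */
function byte_to_char_offset(string $subject, int $byteOffset): int
{
    // Take the bytes before the match, then count them as UTF-8 characters.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

$subject = "\xC2\xA1Hola!";
if (preg_match('/H/u', $subject, $matches, PREG_OFFSET_CAPTURE) === 1) {
    $byteOffset = $matches[0][1];                          // 2
    echo byte_to_char_offset($subject, $byteOffset), "\n"; // 1
}
```

Converting after the match keeps preg_match itself unchanged; for ASCII-only subjects the two offsets coincide.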

相关文章