How to check whether a letter is uppercase or lowercase in PHP?
I have UTF-8 texts that also contain characters with diacritics, and I would like to check whether the first letter of a text is uppercase or lowercase. How can I do this?
Recommended answer
In my opinion, a preg_match() call is the most direct, concise, and reliable approach compared with the other solutions posted here.
echo preg_match('~^\p{Lu}~u', $string) ? 'upper' : 'lower';
My pattern breakdown:
~      # starting pattern delimiter
^      # match from the start of the input string
\p{Lu} # match exactly one uppercase letter (unicode safe)
~      # ending pattern delimiter
u      # enable unicode matching
Please note where ctype_upper() and the < 'a' comparison fail in this battery of tests.
Code:
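$tests = ['âa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];
foreach ($tests as $test) {
    echo "\n{$test}:";
    // Unicode-aware regex: is the first character in category Lu?
    echo "\nPREG: " , preg_match('~^\p{Lu}~u', $test) ? 'upper' : 'lower';
    // ctype_upper() works on single bytes, so multibyte letters fail
    echo "\nCTYPE: " , ctype_upper(mb_substr($test, 0, 1)) ? 'upper' : 'lower';
    // Byte-wise string comparison against 'a'
    echo "\n< a: " , mb_substr($test, 0, 1) < 'a' ? 'upper' : 'lower';
    // Compare the first character with its uppercased form
    $chr = mb_substr($test, 0, 1, "UTF-8");
    echo "\nMB: " , mb_strtoupper($chr, "UTF-8") == $chr ? 'upper' : 'lower';
}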
Output:
âa:
PREG: lower
CTYPE: lower
< a: lower
MB: lower
Bbbbb:
PREG: upper
CTYPE: upper
< a: upper
MB: upper
Éé: <-- trouble
PREG: upper
CTYPE: lower <-- uh oh
< a: lower <-- uh oh
MB: upper
iou:
PREG: lower
CTYPE: lower
< a: lower
MB: lower
Δδ: <-- extended beyond question scope
PREG: upper <-- still holding up
CTYPE: lower
< a: lower
MB: upper <-- still holding up
If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters, see this post.
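As a rough illustration of that three-way distinction, here is a minimal sketch that reuses the same \p{Lu} idea and adds \p{Ll} for lowercase. The function name firstCharCase() is hypothetical and is not taken from the linked post; anything that is neither Lu nor Ll (digits, punctuation, caseless scripts, titlecase letters, empty or invalid input) is simply reported as 'other'.

```php
<?php
// Hypothetical helper (not from the original answer): classifies the first
// character of a UTF-8 string as 'upper', 'lower', or 'other'.
function firstCharCase(string $text): string
{
    if (preg_match('~^\p{Lu}~u', $text)) {
        return 'upper';
    }
    if (preg_match('~^\p{Ll}~u', $text)) {
        return 'lower';
    }
    // Digits, punctuation, caseless letters, titlecase letters, the empty
    // string, and invalid UTF-8 (preg_match() returns false) all land here.
    return 'other';
}

foreach (['Éé', 'âa', '1abc', ''] as $sample) {
    echo "'{$sample}': ", firstCharCase($sample), "\n";
}
```

Unlike the mb_strtoupper() comparison, this does not report a leading digit or punctuation mark as uppercase.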
It may be extending the scope of this question too far, but if your input characters are especially squirrelly (they might not exist in a category that \p{Lu} can handle), you may want to check whether the first character has case variants:
\p{L&} or \p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
- Source: https://www.regular-expressions.info/unicode.html
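A minimal sketch of such a check in PHP, assuming the short form \p{L&} (which PCRE documents as matching Lu, Ll, or Lt). The helper name startsWithCasedLetter() is hypothetical, and the long name \p{Cased_Letter} is deliberately not used here because it may not be accepted by every PCRE version.

```php
<?php
// Hypothetical helper: true when the first character is a cased letter
// (Lu, Ll, or Lt), so asking "upper or lower?" is meaningful for it.
function startsWithCasedLetter(string $text): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error
    // (for example invalid UTF-8), so compare strictly against 1.
    return preg_match('~^\p{L&}~u', $text) === 1;
}

foreach (['Δδ', '中文', '9 lives'] as $sample) {
    echo "'{$sample}': ", startsWithCasedLetter($sample) ? 'cased' : 'not cased', "\n";
}
```

A caller could run this first and only then apply the \p{Lu} test, so that caseless input is not silently reported as 'lower'.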
To include Roman numerals ("Number Letters") that have SMALL variants, you can add that extra range to the pattern if necessary.
https://www.fileformat.info/info/unicode/category/Nl/list.htm
Code:
echo preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $test) ? 'upper' : 'not upper';
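To see the extended character class in action, the following sketch wraps it in a hypothetical function, romanAwareCase(). The lowercase branch, which adds the range U+2170 to U+217F (the SMALL ROMAN NUMERAL code points), is an assumption added for symmetry and is not part of the pattern above.

```php
<?php
// Hypothetical helper: like a plain \p{Lu} / \p{Ll} check, but also treats
// Roman numeral "number letters" as upper (U+2160-216F) or lower (U+2170-217F).
function romanAwareCase(string $text): string
{
    if (preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $text)) {
        return 'upper';
    }
    // Assumption: the small Roman numerals are handled symmetrically.
    if (preg_match('~^[\p{Ll}\x{2170}-\x{217F}]~u', $text)) {
        return 'lower';
    }
    return 'other';
}

// "\u{216B}" is ROMAN NUMERAL TWELVE, "\u{217B}" is SMALL ROMAN NUMERAL TWELVE.
foreach (["\u{216B}", "\u{217B}", 'Éé', '42'] as $sample) {
    echo "'{$sample}': ", romanAwareCase($sample), "\n";
}
```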