How do I check whether a letter is uppercase or lowercase in PHP?

2021-12-28 · Tags: string, utf-8, php

I have UTF-8 texts that also contain diacritic characters, and I would like to check whether the first letter of a text is uppercase or lowercase. How can I do this?

Recommended answer

In my opinion, a `preg_` call is the most direct, concise, and reliable approach compared with the other solutions posted here.

echo preg_match('~^\p{Lu}~u', $string) ? 'upper' : 'lower';

My pattern breakdown:

~       # starting pattern delimiter
^       # match from the start of the input string
\p{Lu}  # match exactly one uppercase letter (Unicode-safe)
~       # ending pattern delimiter
u       # enable Unicode (UTF-8) matching
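For repeated use, the one-liner can be wrapped in a small function. This is a minimal sketch; the name `startsWithUppercase()` is hypothetical. One detail worth knowing: with the `u` modifier, `preg_match()` returns `false` rather than `0` when the subject is not valid UTF-8, so the sketch compares strictly against `1`. Like the one-liner, it treats anything that does not start with an uppercase letter (digits, punctuation, an empty string) as "not upper".

```php
<?php
// Hypothetical helper name; it reuses the answer's ~^\p{Lu}~u pattern.
function startsWithUppercase(string $text): bool
{
    // preg_match() returns 1 (match), 0 (no match) or false (error,
    // e.g. invalid UTF-8 under the u modifier); only 1 means "uppercase".
    return preg_match('~^\p{Lu}~u', $text) === 1;
}

foreach (['Éé', 'âa', 'Δδ', '1abc', ''] as $s) {
    echo var_export($s, true), ' => ', startsWithUppercase($s) ? 'upper' : 'not upper', "\n";
}
```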

Notice where `ctype_upper()` and the `< 'a'` comparison fail in this battery of tests.

Code: (Demo)

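$tests = ['âa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];

foreach ($tests as $test) {
    echo "\n{$test}:";
    // PCRE with the u modifier: Unicode-aware uppercase-letter check
    echo "\n\tPREG:  ", preg_match('~^\p{Lu}~u', $test) ? 'upper' : 'lower';
    // ctype_upper() inspects bytes, not characters
    echo "\n\tCTYPE: ", ctype_upper(mb_substr($test, 0, 1)) ? 'upper' : 'lower';
    // plain byte comparison against ASCII 'a'
    echo "\n\t< a:   ", mb_substr($test, 0, 1) < 'a' ? 'upper' : 'lower';

    // mbstring: a character is "upper" if uppercasing it leaves it unchanged
    $chr = mb_substr($test, 0, 1, 'UTF-8');
    echo "\n\tMB:    ", mb_strtoupper($chr, 'UTF-8') === $chr ? 'upper' : 'lower';
}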

Output:

âa:
    PREG:  lower
    CTYPE: lower
    < a:   lower
    MB:    lower
Bbbbb:
    PREG:  upper
    CTYPE: upper
    < a:   upper
    MB:    upper
Éé:               <-- trouble
    PREG:  upper
    CTYPE: lower  <-- uh oh
    < a:   lower  <-- uh oh
    MB:    upper
iou:
    PREG:  lower
    CTYPE: lower
    < a:   lower
    MB:    lower
Δδ:               <-- extended beyond question scope
    PREG:  upper  <-- still holding up
    CTYPE: lower
    < a:   lower
    MB:    upper  <-- still holding up
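The byte-level reason for the two "uh oh" rows can be checked directly. A small sketch, assuming the `mbstring` extension is loaded (the answer's own test code already relies on it): `É` (U+00C9) is encoded in UTF-8 as the two bytes `0xC3 0x89`. `ctype_upper()` checks individual bytes, and neither byte is an ASCII `A` to `Z`; `< 'a'` is a plain byte-wise string comparison in which `0xC3` sorts after `'a'` (`0x61`). PCRE with the `u` modifier, by contrast, decodes the whole code point.

```php
<?php
// First character of 'Éé' (U+00C9, LATIN CAPITAL LETTER E WITH ACUTE).
$chr = mb_substr('Éé', 0, 1, 'UTF-8');

echo bin2hex($chr), "\n";                  // c389: two bytes, 0xC3 0x89
var_dump(ctype_upper($chr));               // bool(false): the bytes are not A-Z
var_dump($chr < 'a');                      // bool(false): byte 0xC3 sorts after 'a' (0x61)
var_dump(preg_match('~^\p{Lu}~u', $chr));  // int(1): PCRE sees one uppercase letter
```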

If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters, see this post.
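The linked post is not reproduced here, but a three-way check can be sketched with the same technique by adding the lowercase-letter category `\p{Ll}`. The function name `firstCharCase()` and its return labels are hypothetical. Anything that is neither `\p{Lu}` nor `\p{Ll}` (digits, punctuation, uncased scripts such as Chinese, titlecase letters, an empty string, invalid UTF-8) falls through to `'other'`.

```php
<?php
// Hypothetical helper: classifies the first character of $text as
// 'upper' (\p{Lu}), 'lower' (\p{Ll}) or 'other' (everything else).
function firstCharCase(string $text): string
{
    if (preg_match('~^\p{Lu}~u', $text) === 1) {
        return 'upper';
    }
    if (preg_match('~^\p{Ll}~u', $text) === 1) {
        return 'lower';
    }
    return 'other';
}

foreach (['Éé', 'âa', '1abc', '¿Qué?', '中文', ''] as $s) {
    echo var_export($s, true), ' => ', firstCharCase($s), "\n";
}
```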

This may be extending the scope of this question too far, but if your input characters are especially squirrelly (they might not belong to a category that `\p{Lu}` handles), you may want to check whether the first character has case variants at all:

`\p{L&}` or `\p{Cased_Letter}`: a letter that exists in both lowercase and uppercase variants (combination of Ll, Lu and Lt).

  • Source: https://www.regular-expressions.info/unicode.html
  • To include Roman numerals ("Number Letters") that have SMALL variants, you can add that extra range to the pattern if necessary.

    https://www.fileformat.info/info/unicode/category/Nl/list.htm

    Code: (Demo)

    echo preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $test) ? 'upper' : 'not upper';
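Combining the `\p{L&}` idea with the Roman-numeral range, a check for "does the first character have case variants at all" might look like the sketch below. The function name `startsWithCasedChar()` and the `$includeRomanNumerals` flag are hypothetical. The range U+2160 to U+217F covers both the capital (U+2160 to U+216F) and small (U+2170 to U+217F) Roman numerals, which are in category Nl and therefore not matched by `\p{L&}` on its own.

```php
<?php
// Hypothetical helper: true if the first character is a cased letter
// (\p{L&} = Lu, Ll or Lt), optionally also accepting Roman numerals
// U+2160 to U+217F (category Nl, capital and small forms).
function startsWithCasedChar(string $text, bool $includeRomanNumerals = false): bool
{
    $pattern = $includeRomanNumerals
        ? '~^[\p{L&}\x{2160}-\x{217F}]~u'
        : '~^\p{L&}~u';
    return preg_match($pattern, $text) === 1;
}

// U+01C5 is a titlecase letter (Lt); U+216B is ROMAN NUMERAL TWELVE (Nl).
foreach (['Éé', "\u{01C5}a", '中文', "\u{216B}"] as $s) {
    echo $s, ': ', startsWithCasedChar($s) ? 'cased' : 'not cased',
        ' / with Roman numerals: ', startsWithCasedChar($s, true) ? 'cased' : 'not cased', "\n";
}
```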
    
