Match any Unicode whitespace character in a string with a PHP regex

2021-12-28 00:00:00 string regex split php


I want to split a text message into an array at every space. It had been working fine until I received this text message. Here are the few lines of code that process the text string:

    $str = 'T bw4  05/09/19 07:51 am BW6N 499.803';
    $cleanStr = iconv("UTF-8", "ISO-8859-1", $str);
    $strArr = preg_split('/[\s\t]/', $cleanStr);

var_dump yields this result:

array:6 [
 0 => "T"
 1 => b"bw4  05/09/19"
 2 => "07:51"
 3 => "am"
 4 => "BW6N"
 5 => "499.803"
]

Item #1 in the array, 1 => b"bw4 05/09/19", is not correct, and I cannot figure out what the letter "b" in front of the array value is. Also, the space(s) between "bw4" and "05/09/19" were not split. Any suggestions on how to better achieve the string splitting are greatly appreciated. Here is the original string: https://3v4l.org/2L35M and here is an image of the result from my localhost: http://prntscr.com/jjbvny


To match one or more Unicode whitespace characters, you may use the pattern '/\s+/u'.


Your '/[\s\t]/' pattern matches only a single whitespace character (\s) or a tab (\t), which is redundant because \s already matches tabs. More importantly, because the u modifier is missing, \s cannot match the \u00A0 characters (hard spaces, also called no-break spaces) that follow bw4.
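To make the hidden characters visible, a small diagnostic along these lines may help. It is only a sketch: it assumes the hard spaces are U+00A0 encoded as UTF-8, and the variable names are made up for illustration.

```php
<?php
// Assumption: the "hard spaces" are U+00A0, which UTF-8 encodes as the bytes C2 A0.
$nbsp = "\xC2\xA0";
$sample = "bw4" . $nbsp . "05";

// bin2hex exposes the otherwise invisible bytes between "bw4" and "05".
echo bin2hex($sample), "\n"; // 627734c2a03035

// With the u modifier, \s matches the no-break space.
$matchesWithU = preg_match('/\s/u', $nbsp);
var_dump($matchesWithU); // int(1)
```

Without the u modifier, whether \s matches the individual bytes can depend on the PCRE build and locale, so that case is left out here.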


$str = 'T bw4  05/09/19 07:51 am BW6N 499.803';
$strArr = preg_split('/\s+/u', $str);

See the PHP demo. Output:

    [0] => T
    [1] => bw4
    [2] => 05/09/19
    [3] => 07:51
    [4] => am
    [5] => BW6N
    [6] => 499.803
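
If the split is needed in several places, it could be wrapped in a small helper. The sketch below is not part of the original answer: the function name splitOnUnicodeWhitespace is hypothetical, and the sample string assumes the two characters after bw4 are U+00A0 no-break spaces.

```php
<?php
// Hypothetical helper (not from the original answer): split on runs of
// Unicode whitespace and drop empty pieces.
function splitOnUnicodeWhitespace(string $text): array
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \s also matches U+00A0.
    $parts = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure, e.g. when $text is not valid UTF-8.
    return $parts === false ? [] : $parts;
}

// "\xC2\xA0" is the UTF-8 byte sequence for U+00A0 (no-break space).
$str = "T bw4\xC2\xA0\xC2\xA005/09/19 07:51 am BW6N 499.803";
print_r(splitOnUnicodeWhitespace($str));
```

Note that the helper works on the UTF-8 string directly. Converting to ISO-8859-1 with iconv first, as in the question, would turn the input into a non-UTF-8 byte string, which the u modifier rejects.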
