Multi-byte safe wordwrap() function for UTF-8

2021-12-28 00:00:00 string utf-8 word-wrap php multibyte

PHP's wordwrap() function doesn't work correctly for multi-byte strings like UTF-8.
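A minimal sketch of the failure, assuming PHP 7+ (for the "\u{...}" escape) with the mbstring extension loaded. wordwrap() measures $width in bytes, so with $cut = true it can split a multi-byte character in two and leave invalid UTF-8 behind. The variable names are illustrative only.

```php
<?php
// Five copies of U+00E9 (é): 5 characters, but 10 bytes in UTF-8.
$word = str_repeat("\u{00E9}", 5);

$byteLength = strlen($word);             // counts bytes: 10
$charLength = mb_strlen($word, 'UTF-8'); // counts characters: 5

// Cutting every 3 bytes lands in the middle of a two-byte sequence.
$wrapped = wordwrap($word, 3, "\n", true);

// false: the wrapped string is no longer valid UTF-8.
$stillValid = mb_check_encoding($wrapped, 'UTF-8');

var_dump($byteLength, $charLength, $stillValid);
```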

There are a few examples of mb safe functions in the comments, but with some different test data they all seem to have some problems.

The function should take the exact same parameters as wordwrap().

In particular, make sure it will:

  • cut mid-word if $cut = true, and not cut mid-word otherwise
  • not insert extra spaces in words if $break = ' '
  • also work for $break = "\n"
  • work for ASCII and all valid UTF-8
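For plain ASCII input, the built-in wordwrap() already meets these requirements, so its output can serve as a baseline that a multi-byte version should reproduce. The snippet below is only a sketch of such baseline checks; the variable names are illustrative.

```php
<?php
$sentence = 'The quick brown fox sat over the lazy dog';
// A 16-character word, longer than the width of 8 used below.
$longWord = 'A very long w' . str_repeat('o', 12) . 'rd.';

// Breaks only at spaces.
$wrappedSentence = wordwrap($sentence, 20, "\n");

// $cut = true: the long word is chopped into chunks of 8.
$wrappedCut = wordwrap($longWord, 8, "\n", true);

// $cut = false: the long word is kept whole on its own line.
$wrappedNoCut = wordwrap($longWord, 8, "\n", false);

echo $wrappedSentence, "\n\n", $wrappedCut, "\n\n", $wrappedNoCut, "\n";
```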

Recommended answer

This one seems to work well...

function mb_wordwrap($str, $width = 75, $break = "\n", $cut = false, $charset = null) {
    if ($charset === null) $charset = mb_internal_encoding();

    // Split on existing breaks; each piece is handled on its own.
    $pieces = explode($break, $str);
    $result = array();
    foreach ($pieces as $piece) {
      $current = $piece;
      // With $cut, chop pieces longer than $width characters (not bytes).
      while ($cut && mb_strlen($current, $charset) > $width) {
        $result[] = mb_substr($current, 0, $width, $charset);
        // A null length takes the rest of the string, however long it is.
        $current = mb_substr($current, $width, null, $charset);
      }
      $result[] = $current;
    }
    return implode($break, $result);
}
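Note that the function above only splits where $break already occurs and, with $cut = true, chops over-long pieces; it does not move words onto a new line at spaces. The following is a rough sketch of a greedy variant that does, not the answerer's code. The name mb_wordwrap_sketch is hypothetical, and the sketch assumes PHP 7.1+ with mbstring, $width of at least 1, words separated by single ASCII spaces, and a $break that does not itself contain a space.

```php
<?php
function mb_wordwrap_sketch(string $str, int $width = 75, string $break = "\n",
                            bool $cut = false, ?string $charset = null): string
{
    if ($width < 1) {
        throw new InvalidArgumentException('$width must be at least 1');
    }
    if ($charset === null) {
        $charset = mb_internal_encoding();
    }

    $lines = [];
    // Existing breaks start a fresh line, so wrap each segment on its own.
    foreach (explode($break, $str) as $segment) {
        $current = []; // words collected for the line being built
        foreach (explode(' ', $segment) as $word) {
            // With $cut, chop an over-long word into chunks of $width characters.
            while ($cut && mb_strlen($word, $charset) > $width) {
                if ($current !== []) {
                    $lines[] = implode(' ', $current);
                    $current = [];
                }
                $lines[] = mb_substr($word, 0, $width, $charset);
                $word = mb_substr($word, $width, null, $charset);
            }
            // Start a new line if adding this word would exceed $width.
            $candidate = implode(' ', array_merge($current, [$word]));
            if ($current !== [] && mb_strlen($candidate, $charset) > $width) {
                $lines[] = implode(' ', $current);
                $current = [$word];
            } else {
                $current[] = $word;
            }
        }
        $lines[] = implode(' ', $current);
    }
    return implode($break, $lines);
}

// Usage: widths are counted in characters, so accented letters count once.
echo mb_wordwrap_sketch("h\u{00E9}llo w\u{00F6}rld \u{00FC}nicode", 11, "\n", false, 'UTF-8'), "\n";
```

Joining the collected words with a single space means no extra spaces are inserted inside words, and measuring with mb_strlen() keeps multi-byte characters intact when $cut is true.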
