PHP 替换特殊字符,如 à->a、è->e

2021-12-28 00:00:00 utf-8 decode php preg-replace

我有 php 文档 signup.php,它将表单(在 form.php 文档中)的内容保存到 MySQL 基础.当我想重新格式化输入内容时会出现问题.我想解码 UTF-8 字符,如 à->a.

I have php document signup.php which save the content from form (in form.php document) to MySQL base. The problem arises when I want to reformat the input content. I want do decode UTF-8 charachters like à->a.

  $first_name=$_POST['first_name'];
  $last_name=$_POST['last_name'];
  $course=$_POST['course'];

  $chain="prêt-à-porter";

$pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");

$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C'); 

$chain = preg_replace($pattern, $replace, $chain);

echo $chain; // print pret-a-porter

$first_name =  preg_replace($pattern, $replace, $first_name);

echo $first_name; // does not change the input!?!

为什么它对 $chain 非常有效,但对 $first_name 或 $last_name 不起作用?

Why it works perfectly for $chain, but for $first_name or $last_name doesnt work?

我也试试

echo $first_name; // print áááááábéééééébšššš
$trans = array("á" => "a", "é" => "e", "š" => "s");
echo strtr("áááááábéééééébšššš", $trans); // print aaaaaabeeeeeebssss
echo strtr($first_name,$trans);  // print áááááábéééééébšššš

但正如你所见,问题是一样的!

but the problem, as you can see, is same!

推荐答案

有一个更简单的方法,使用 iconv - 从用户注释来看,这似乎是你想要做的: 字符音译

There's a much easier way to do this, using iconv - from the user notes, this seems to be what you want to do: characters transliteration

// PHP.net User notes
<?php
    $string = "ʿABBĀSĀBĀD";

    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
    // output: [nothing, and you get a notice]

    echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
    // output: ABBSBD

    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
    // output: ABBASABAD
    // Yay! That's what I wanted!
?>

要非常认真地处理您的字符编码,以便在流程的所有阶段保持相同的编码 - 前端、表单提交、源文件的编码.PHP 和表单中的默认编码是 ISO-8859-1,在 PHP 5.4 之前它更改为 UTF8(终于!).

Be very conscientious with your character encodings, so you are keeping the same encoding at all stages in the process - front end, form submission, encoding of the source files. Default encoding in PHP and in forms is ISO-8859-1, before PHP 5.4 where it changed to be UTF8 (finally!).

有几个函数可以让您发挥创意.第一个来自 CakePHP 的 inflector 类,叫做 slug:

There's a couple of functions you can play around with for ideas. First is from CakePHP's inflector class, called slug:

public static function slug($string, $replacement = '_') {
    $quotedReplacement = preg_quote($replacement, '/');

    $merge = array(
        '/[^sp{Ll}p{Lm}p{Lo}p{Lt}p{Lu}p{Nd}]/mu' => ' ',
        '/\s+/' => $replacement,
        sprintf('/^[%s]+|[%s]+$/', $quotedReplacement, $quotedReplacement) => '',
    );

    $map = self::$_transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}

这取决于 self::$_transliteration 数组,它与您在问题中所做的类似 - 您可以 查看 github 上的 inflector 源.

It depends on a self::$_transliteration array which is similar to what you were doing in your question - you can see the source for inflector on github.

另一个是我个人使用的一个函数,它来自这里.

Another is a function I use personally, which comes from here.

function slugify($text,$strict = false) {
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // replace non letter or digits by -
    $text = preg_replace('~[^\pLd.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');
    setlocale(LC_CTYPE, 'en_GB.utf8');
    // transliterate
    if (function_exists('iconv')) {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);
    // remove unwanted characters
    $text = preg_replace('~[^-w.]+~', '', $text);
    if (empty($text)) {
        return 'empty_$';
    }
    if ($strict) {
        $text = str_replace(".", "_", $text);
    }
    return $text;
}

这些函数的作用是从任意文本音译和创建slugs"输入,这是制作 Web 应用程序时工具箱中非常有用的东西.希望这会有所帮助!

What those functions do is transliterate and create 'slugs' from arbitrary text input, which is a very very useful thing to have in your toolchest when making web apps. Hope this helps!

相关文章