htmlentities in PHP but preserving html tags

2021-12-25 00:00:00 string replace php html html-entities

I want to convert all text in a string into HTML entities while preserving the HTML tags. For example, this:

<p><font style="color:#FF0000">Camión español</font></p>

should be translated into this:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

Any ideas?

Solution

You can get the list of character => entity correspondences used by htmlentities with the function get_html_translation_table. Consider this code:

$list = get_html_translation_table(HTML_ENTITIES);
var_dump($list);

(You might want to check the second parameter of that function in the manual; you may need to set it to something other than the default.)

It will get you something like this :

array
  ' ' => string '&nbsp;' (length=6)
  '¡' => string '&iexcl;' (length=7)
  '¢' => string '&cent;' (length=6)
  '£' => string '&pound;' (length=7)
  '¤' => string '&curren;' (length=8)
  ....
  ....
  ....
  'ÿ' => string '&yuml;' (length=6)
  '"' => string '&quot;' (length=6)
  '<' => string '&lt;' (length=4)
  '>' => string '&gt;' (length=4)
  '&' => string '&amp;' (length=5)
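
On the parameters mentioned above: a minimal sketch, assuming a PHP version where get_html_translation_table accepts a flags argument and an encoding argument. Passing both explicitly avoids depending on defaults that differ between PHP versions. The choice of ENT_NOQUOTES and 'UTF-8' is an assumption made for this example, not something the original answer prescribes.

```php
<?php
// Flags and encoding passed explicitly instead of relying on version-dependent defaults.
// ENT_NOQUOTES: leave both quote characters out of the table.
// 'UTF-8': return the keys as UTF-8 byte sequences.
$list = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

var_dump(isset($list['"']));  // bool(false): the double quote is not in the table
var_dump(isset($list['<']));  // bool(true): < is still there and has to be removed separately
var_dump($list["\xC3\xB3"]);  // "\xC3\xB3" is ó in UTF-8; maps to &oacute;
```

With these flags the quotes never enter the table, so only <, > and & remain to be removed by hand.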

Now, remove the correspondences you don't want:

unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

Your list now has all the character => entity correspondences used by htmlentities, except for the few characters you don't want to encode.

And now, you just have to extract the list of keys and values :

$search = array_keys($list);
$values = array_values($list);

And, finally, you can use str_replace to do the replacement :

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_out);

And you get :

string '<p><font style="color:#FF0000">Cami&Atilde;&sup3;n espa&Atilde;&plusmn;ol</font></p>' (length=84)

Which looks like what you wanted ;-)


Edit : well, except for the encoding problem (damn UTF-8, I suppose -- I'm trying to find a solution for that, and will edit again)

Second edit, a couple of minutes later: it seems you'll have to use utf8_encode on the $search list before calling str_replace :-(

Which means using something like this :

$search = array_map('utf8_encode', $search);

Between the call to array_keys and the call to str_replace.

And, this time, you should really get what you wanted :

string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)
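
To see where the &Atilde;&sup3; in the first attempt came from, it can help to compare the raw bytes of the table keys. The sketch below assumes a PHP version that accepts the encoding argument, and requests the table in both ISO-8859-1 and UTF-8 purely for comparison. The variable names are made up for this illustration.

```php
<?php
// Same table, requested once in ISO-8859-1 and once in UTF-8.
$latin1 = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'ISO-8859-1');
$utf8   = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

// Find the key that maps to &oacute; in each table and show its raw bytes.
$latin1Key = array_search('&oacute;', $latin1, true);
$utf8Key   = array_search('&oacute;', $utf8, true);

echo bin2hex($latin1Key), "\n"; // f3   (one byte)
echo bin2hex($utf8Key), "\n";   // c3b3 (two bytes, as in a UTF-8 input string)

// Read as ISO-8859-1, those two UTF-8 bytes are the characters Ã and ³,
// so a single-byte table replaces each of them separately.
echo $latin1["\xC3"], $latin1["\xB3"], "\n"; // &Atilde;&sup3;
```

The conversion step is only needed when the table keys are single-byte and the input is UTF-8. If the table is already returned in UTF-8 (the default since PHP 5.4), applying utf8_encode to the keys would double-encode them and nothing would match. utf8_encode is also deprecated as of PHP 8.2.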


And here is the full portion of code :

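// Full translation table used by htmlentities().
$list = get_html_translation_table(HTML_ENTITIES);
// Drop the characters that make up the markup so the tags stay intact.
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$search = array_keys($list);
$values = array_values($list);
// Needed only when the table keys are ISO-8859-1 (the default before PHP 5.4)
// and the input is UTF-8; with a UTF-8 table, skip this line.
$search = array_map('utf8_encode', $search);

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_in, $str_out);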

And the full output :

string '<p><font style="color:#FF0000">Camión español</font></p>' (length=58)
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)

This time, it should be ok ^^
It doesn't really fit on one line and might not be the most optimized solution, but it should work fine, and it has the advantage of letting you add or remove any character => entity correspondence as needed.
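
The steps above could be wrapped in a small function. This is only a sketch under stated assumptions: PHP 7 or later, UTF-8 input, and a table requested in UTF-8 so that no key conversion is needed. The name entities_keep_tags and its $keep parameter are hypothetical, and strtr is swapped in for str_replace because it takes the table directly.

```php
<?php
/**
 * Hypothetical helper: encode characters as named entities while leaving
 * the characters listed in $keep (and both quotes) untouched, so markup survives.
 */
function entities_keep_tags(string $html, array $keep = ['<', '>', '&']): string
{
    // ENT_NOQUOTES keeps " and ' out of the table; 'UTF-8' makes the keys match UTF-8 input.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

    // Remove the characters that should pass through unchanged.
    foreach ($keep as $char) {
        unset($table[$char]);
    }

    // strtr() with an array replaces every key in a single pass over the string.
    return strtr($html, $table);
}

// "\xC3\xB3" is ó and "\xC3\xB1" is ñ in UTF-8.
$in = "<p><font style=\"color:#FF0000\">Cami\xC3\xB3n espa\xC3\xB1ol</font></p>";
echo entities_keep_tags($in), "\n";
// <p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>
```

Like the original approach, this leaves every <, > and & alone, including any that appear in the text rather than in tags, so it is only suitable when the input is already valid HTML.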

Have fun !
