Replacing Unicode characters

2021-12-25 00:00:00 unicode replace php

I am trying to replace a certain character in a string with another. They are quite obscure Latin characters. I want to replace the character U+0259 (hex 259) with U+04D9 (hex 4d9), so I tried this:

str_replace("\x02\x59", "\x04\xd9", $string);

This didn't work. How do I go about this?

Additional information.

Thanks bobince, that has done the trick. Although, I want to replace the uppercase schwa also and it is not working for some reason. I calculated U+018F (Ə) as UTF-8 0xC68F and this is to be replaced with U+04D8 (0xD398):

$string = str_replace("\xC9\x99", "\xD3\x99", $_POST['string_with_schwa']); // lc 259->4d9
$string = str_replace("\xC68F", "\xD3\x98", $string); // uc 18f->4d8

I am copying the 'Ə' into a textbox and posting it. The first str_replace works fine on the lowercase, but does not detect the uppercase in the second str_replace, strange. It remains as U+018F. Guess I could run the string through strtolower but this should work though.

Recommended answer

U+0259 Latin Small Letter Schwa is only encoded as the byte sequence 0x02,0x59 in the UTF-16BE encoding. It is very unlikely you will be working with byte strings in the UTF-16BE encoding as it's not an ASCII-compatible encoding and almost no-one uses it.

The encoding you want to be working with (the only ASCII-superset encoding to support both Latin Schwa and Cyrillic Schwa, as it supports all Unicode characters) is UTF-8. Ensure your input is in UTF-8 format (if it is coming from form data, serve the page containing the form as UTF-8). Then, in UTF-8, the character U+0259 is represented using the byte sequence 0xC9,0x99.

str_replace("\xC9\x99", "\xD3\x99", $string);
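
To check the byte sequences rather than take them on trust, something like the following sketch may help. It assumes PHP 7.0 or later (for the "\u{...}" escape, which emits the UTF-8 encoding of a code point); the variable names and the sample input are made up for illustration.

```php
<?php
// "\u{...}" (PHP 7.0+) produces the UTF-8 bytes for a Unicode code point.
$latinSchwa    = "\u{0259}"; // ə, Latin small letter schwa
$cyrillicSchwa = "\u{04D9}"; // ә, Cyrillic small letter schwa

echo bin2hex($latinSchwa), "\n";    // c999
echo bin2hex($cyrillicSchwa), "\n"; // d399

// The bytes 0x02,0x59 from the question are a control byte followed by "Y",
// which is why they never match a UTF-8 encoded schwa.
var_dump("\x02\x59" === "\x02Y"); // bool(true)

// Hypothetical sample input: the replacement works on the UTF-8 bytes.
$string = "sch{$latinSchwa}wa";
$result = str_replace("\xC9\x99", "\xD3\x99", $string);
var_dump($result === "sch{$cyrillicSchwa}wa"); // bool(true)
```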

If you make sure to save your .php file as UTF-8-no-BOM in the text editor, you can skip the escaping and just directly say:

str_replace('ə', 'ә', $string);
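
On the uppercase follow-up in the question: in a PHP double-quoted string, "\x" consumes at most two hex digits, so "\xC68F" is the byte 0xC6 followed by the literal characters "8F", not the two bytes 0xC6,0x8F. A minimal sketch of one possible fix, handling both cases in a single call, is below; the function name latinToCyrillicSchwa is hypothetical, and the "\u{...}" escape in the last line assumes PHP 7.0 or later.

```php
<?php
// "\xC68F" parses as byte 0xC6 plus the literal text "8F" (0x38, 0x46).
var_dump(bin2hex("\xC68F"));   // string(6) "c63846"
var_dump(bin2hex("\xC6\x8F")); // string(4) "c68f"

// Hypothetical helper: maps both Latin schwas to their Cyrillic counterparts.
function latinToCyrillicSchwa(string $text): string
{
    return str_replace(
        ["\xC9\x99", "\xC6\x8F"], // U+0259 ə, U+018F Ə (UTF-8 bytes)
        ["\xD3\x99", "\xD3\x98"], // U+04D9 ә, U+04D8 Ә (UTF-8 bytes)
        $text
    );
}

echo bin2hex(latinToCyrillicSchwa("\u{018F}\u{0259}")), "\n"; // d398d399
```

Each escape needs its own "\x" prefix, one per byte. Passing arrays to str_replace keeps the lowercase and uppercase mappings next to each other.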
