Finding and removing non-ASCII characters from an Oracle Varchar2

2021-12-06 00:00:00 regex ascii oracle

We are currently migrating one of our Oracle databases to UTF8, and we have found a few records that are near the 4000-byte varchar limit. When we try to migrate these records they fail, because they contain characters that become multibyte UTF8 characters. What I want to do within PL/SQL is locate these characters to see what they are, and then either change them or remove them.

I would like to do:

SELECT REGEXP_REPLACE(COLUMN,'[^[:ascii:]]','')

but Oracle does not implement the [:ascii:] character class.

Is there a simple way to do what I want?

Recommended answer

In a single-byte ASCII-compatible encoding (e.g. Latin-1), ASCII characters are simply bytes in the range 0 to 127. So you can use something like [\x80-\xFF] to detect non-ASCII characters.
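As a rough sketch of how that byte-range idea might look in Oracle SQL: the table and column names (my_table, id, my_column) are hypothetical, and the bracket range is built with CHR() rather than hex escapes, on the assumption that the database uses a single-byte character set and binary comparison (NLS_SORT = BINARY) so that the range follows character codes. Treat it as a starting point to test on a copy of the data, not a definitive solution.

```sql
-- Hypothetical names: my_table, id, my_column.
-- Assumes a single-byte database character set and NLS_SORT = BINARY.

-- 1. Locate rows that contain any character outside codes 1..127.
--    ASCIISTR renders each non-ASCII character as a \XXXX escape so it
--    can be identified (note that it also escapes a literal backslash).
SELECT id,
       ASCIISTR(my_column) AS escaped_text
FROM   my_table
WHERE  REGEXP_LIKE(my_column, '[^' || CHR(1) || '-' || CHR(127) || ']');

-- 2. Preview the value with those characters removed.
SELECT id,
       REGEXP_REPLACE(my_column,
                      '[^' || CHR(1) || '-' || CHR(127) || ']',
                      '') AS ascii_only
FROM   my_table
WHERE  REGEXP_LIKE(my_column, '[^' || CHR(1) || '-' || CHR(127) || ']');
```

Running the first query before the second makes it possible to decide, character by character, whether a replacement (for example a plain quote for a typographic one) is preferable to outright removal.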
