Normalizing a string in ColdFusion

2022-01-09 00:00:00 class string normalization coldfusion java

I'm trying to normalize a string in ColdFusion.

I want to use the Java class java.text.Normalizer for this, as CF doesn't have any similar functions as far as I know.

This is my current code:

<cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
<cfset string = "äéöè" />
<cfset string = normalizer.normalize(string, createObject( "java", "java.text.Normalizer$Form" ).NFD) />
<cfset string = ReReplace(string, "\\p{InCombiningDiacriticalMarks}+", "") />
<cfoutput>#string#</cfoutput>

Any ideas why it always outputs äéöè and not a normalized string?

Recommended answer

In ColdFusion, unlike in Java, you don't need to escape backslashes in string literals. Your current regex will not match anything that does not start with a backslash, so no replacement happens.
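
For comparison, here is a minimal Java sketch of the same two steps: decompose to NFD, then remove the combining marks. The class and method names (`StripDiacritics`, `strip`) are made up for illustration. Note that the doubled backslash is only needed because this is a Java string literal; in CFML the pattern would be written with a single backslash.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class StripDiacritics {

    // Decompose the input to NFD, then delete every run of combining marks.
    // The backslash is doubled because Java string literals require escaping.
    static String strip(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "äéöè" written with Unicode escapes so the source encoding does not matter
        String original = "\u00e4\u00e9\u00f6\u00e8";
        System.out.println(strip(original)); // prints "aeoe"
    }
}
```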

Other than that, your code is perfectly correct and you can see that the length of the string is 8, not 4, at the time of the output. This is an effect of the normalize call.
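
The length change can be checked directly. The following Java sketch (the class name `NfdLength` is hypothetical) assumes the input consists of the four precomposed characters from the question and compares the string length before and after NFD normalization.

```java
import java.text.Normalizer;

// Hypothetical demo class, for illustration only.
class NfdLength {

    // Returns the canonical decomposition (NFD) of the input.
    static String decompose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // "äéöè" as four precomposed code points
        String original = "\u00e4\u00e9\u00f6\u00e8";
        String decomposed = decompose(original);

        System.out.println(original.length());   // prints 4
        System.out.println(decomposed.length()); // prints 8: base letter plus combining mark, four times
    }
}
```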

However, remember that it is still an equivalent representation of the original string, and so it is not surprising that you cannot tell the difference visually. This is correct Unicode rendering in action.
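
To make the equivalence concrete, this Java sketch (with the hypothetical class name `CanonicalEquivalence`) shows that the decomposed string differs from the original code unit by code unit, yet recomposing it with NFC gives back exactly the original. That round trip is what canonical equivalence means here.

```java
import java.text.Normalizer;

// Hypothetical demo class, for illustration only.
class CanonicalEquivalence {

    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String original = "\u00e4\u00e9\u00f6\u00e8"; // "äéöè", precomposed
        String decomposed = toNfd(original);

        // Different sequences of code units...
        System.out.println(original.equals(decomposed));        // prints false
        // ...but the same text once recomposed.
        System.out.println(original.equals(toNfc(decomposed))); // prints true
    }
}
```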
