Is it better to compare strings using toLowerCase or toUpperCase in JavaScript?

2022-01-18 00:00:00 internationalization javascript string-comparison

I'm going through a code review and I'm curious whether it's better to convert strings to upper or lower case in JavaScript when comparing them while ignoring case.

A simple example:

    var firstString = "I might be A different CASE";
    var secondString = "i might be a different case";
    var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();

Or should I do this:

    var firstString = "I might be A different CASE";
    var secondString = "i might be a different case";
    var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();

It seems like either should work with limited character sets such as English letters only, so is one more robust than the other?

As a note, MSDN recommends normalizing strings to uppercase, but that is for managed code (presumably C# and F#, which have fancy StringComparers and base libraries):

http://msdn.microsoft.com/en-us/library/bb386042.aspx

Recommended answer

Revised answer

It has been quite a while since I answered this question. While the cultural issues still hold true (and I don't think they will ever go away), the development of the ECMA-402 standard has made my original answer... outdated (or obsolete?).

The best solution for comparing localized strings seems to be the localeCompare() function with appropriate locales and options:

    var locale = 'en'; // that should be somehow detected and passed on to JS
    var firstString = "I might be A different CASE";
    var secondString = "i might be a different case";
    if (firstString.localeCompare(secondString, locale, {sensitivity: 'accent'}) === 0) {
        // do something when equal
    }

This compares the two strings case-insensitively but accent-sensitively (for example, ą != a). If this is not sufficient for performance reasons, you may want to use either toLocaleUpperCase() or toLocaleLowerCase(), passing the locale as a parameter:

    if (firstString.toLocaleUpperCase(locale) === secondString.toLocaleUpperCase(locale)) {
        // do something when equal
    }

In theory there should be no difference between the two. In practice, subtle implementation details (or a missing implementation in a given browser) may yield different results...
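To make the effect of the sensitivity option concrete, here is a minimal sketch. The helper names equalsIgnoringCase and makeCaseInsensitiveEquals are hypothetical, introduced only for illustration; localeCompare() and Intl.Collator are the standard APIs. The results in the comments assume a runtime that ships ECMA-402 locale data (current browsers and recent Node.js builds do).

```javascript
// Hypothetical helper: case-insensitive, accent-sensitive equality.
function equalsIgnoringCase(a, b, locale) {
  // sensitivity 'accent' ignores case but still tells "ą" and "a" apart
  return a.localeCompare(b, locale, { sensitivity: 'accent' }) === 0;
}

// Hypothetical helper: build the collator once and reuse it,
// which may help when many comparisons are made with the same locale.
function makeCaseInsensitiveEquals(locale) {
  const collator = new Intl.Collator(locale, { sensitivity: 'accent' });
  return (a, b) => collator.compare(a, b) === 0;
}

const equalsEn = makeCaseInsensitiveEquals('en');

const sameSentence = equalsIgnoringCase(
  'I might be A different CASE',
  'i might be a different case',
  'en'
);
const accentDiffers = equalsIgnoringCase('ą', 'a', 'en');

console.log(sameSentence);                   // true: only the case differs
console.log(accentDiffers);                  // false: the accent still counts
console.log(equalsEn('Résumé', 'RÉSUMÉ'));   // true
console.log(equalsEn('Résumé', 'resume'));   // false
```

Passing sensitivity: 'base' instead would also treat ą and a as equal, which is usually not what a case-insensitive comparison is meant to do.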

I am not sure whether you really meant to ask this question under the Internationalization (i18n) tag, but since you did...
Probably the most unexpected answer is: neither.

There are tons of problems with case conversion, and they inevitably lead to functional issues if you convert character case without indicating the language (as is the case with JavaScript's toLowerCase() and toUpperCase()). For instance:

There are many natural languages that have no concept of upper- and lowercase characters. There is no point in trying to convert them (although the conversion will run without error).

There are language-specific rules for converting strings. The German sharp S character (ß) is bound to be converted into two uppercase S letters (SS).

Turkish and Azerbaijani (or Azeri, if you prefer) have a "very strange" concept of two i characters: dotless ı (which converts to uppercase I) and dotted i (which converts to uppercase İ; some fonts do not display it correctly, but it really is a different glyph).

Greek has many "strange" conversion rules. One particular rule concerns the uppercase letter sigma (Σ), which has two lowercase counterparts depending on its position in a word: regular sigma (σ) and final sigma (ς). There are also other conversion rules for "accented" characters, but they are commonly omitted when conversion functions are implemented.

Some languages have title-case letters, e.g. ǈ, which should be converted to something like Ǉ or, less appropriately, LJ. The same may apply to ligatures.

Finally, there are many compatibility characters that may mean the same as what you are trying to compare against but are composed of completely different characters. To make it worse, something like "ae" may be the equivalent of "ä" in German and Finnish, but the equivalent of "æ" in Danish.
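A few of the cases above can be reproduced directly in JavaScript. This is only an illustrative sketch: the variable names are made up for the example, and the Turkish results assume a runtime with ECMA-402 locale data available for 'tr'.

```javascript
// German sharp s: uppercasing expands ß to SS, lowercasing leaves ß as it is,
// so the two strategies disagree on whether these strings are equal.
const germanUpperEqual = 'straße'.toUpperCase() === 'STRASSE'.toUpperCase();
const germanLowerEqual = 'straße'.toLowerCase() === 'STRASSE'.toLowerCase();
console.log(germanUpperEqual); // true
console.log(germanLowerEqual); // false

// Greek sigma: σ and final ς share the uppercase form Σ,
// but they stay distinct after lowercasing.
const greekUpperEqual = 'οδοσ'.toUpperCase() === 'οδος'.toUpperCase();
const greekLowerEqual = 'οδοσ'.toLowerCase() === 'οδος'.toLowerCase();
console.log(greekUpperEqual); // true
console.log(greekLowerEqual); // false

// Turkish i: the locale-independent conversion gives "I",
// while the Turkish-aware conversion gives dotted capital İ (U+0130).
const plainUpperI = 'i'.toUpperCase();
const turkishUpperI = 'i'.toLocaleUpperCase('tr');
const turkishLowerI = 'I'.toLocaleLowerCase('tr'); // dotless ı (U+0131)
console.log(plainUpperI === 'I');        // true
console.log(turkishUpperI === '\u0130'); // true
console.log(turkishLowerI === '\u0131'); // true

// Compatibility characters: the ligature ﬁ (U+FB01) is a different code point
// from the two letters "fi" until the string is normalized with NFKC.
const ligatureRawEqual = '\uFB01' === 'fi';
const ligatureNormalizedEqual = '\uFB01'.normalize('NFKC') === 'fi';
console.log(ligatureRawEqual);        // false
console.log(ligatureNormalizedEqual); // true
```

The German and Greek lines show that toUpperCase() and toLowerCase() are not interchangeable for comparison purposes: the same pair of strings can match under one and differ under the other.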

I am trying to convince you that it is really better to compare user input literally rather than converting it. If the strings are not user-related, it probably doesn't matter, but case conversion always takes time. Why bother?
