JavaScript 中的排序:对于比较函数来说，返回布尔值不应该足够吗?

2022-01-29 00:00:00 sorting comparison javascript

我总是像这样成功地对我的数组进行排序(当我不想要标准的字典顺序时):

var arr = […]//一些数字左右arr.sort(函数(a，b){返回一个>乙;});

现在，有人告诉我这是错误的，我需要 return a-b 代替.这是真的吗?如果是，为什么?我已经测试了我的比较功能，它有效！另外，为什么我的解决方案如此普遍如果它是错误的?

解决方案

TL;DR

<块引用>

我总是像这样成功地对我的数组进行排序

不，你没有.并且没有注意到.一个快速的反例:

<代码>>[1,1,0,2].sort(function(a, b){ return a>b })数组 [0, 1, 2, 1]//在 Opera 12 中.结果可能因排序算法实现而异

<块引用>

为什么?

因为您的比较函数确实返回 false(或等效的 0)，即使 b 大于 a.但是 0 意味着这两个元素被认为是相等的——并且排序算法相信这一点.

深入讲解

JavaScript 中的比较函数

<块引用>

比较函数是如何工作的?

Array::sort 方法可以将可选的自定义比较函数作为其参数.该函数需要两个参数(通常称为 a 和 b)，它应该比较它们，并且应该返回一个 number

<代码>>0 当 a 被认为大于 b 并且应该排在它之后
== 0 当 a 被认为等于 b 并且哪个先出现并不重要
<代码><0 当 a 被认为小于 b 并且应该排在它之前

如果它不返回数字，则结果将被转换为数字(这对于布尔值很方便).返回的数字不需要完全是 -1 或 0 或 1(尽管通常是这样).

一致的排序

为了保持一致，比较函数需要满足等式

comp(a, b) == -1 * comp(b, a)//或者，如果考虑 -1、0 和 1 以外的值:补偿(a，b)*补偿(b，a)<= 0

如果该要求被打破，排序将表现为未定义.

引用 ES5.1 规范关于 sort(在 ES6 规范中也是如此):

<块引用>

如果 comparefn 是 [...] 不是该数组元素的一致比较函数，则排序的行为是实现定义的.

如果所有值 a<都满足以下所有要求，则函数 comparefn 是一组值 S 的一致比较函数S 集合中的/code>、b 和 c(可能相同的值):符号 a <CF b 表示 comparefn(a,b) <0;a =CF b 表示 comparefn(a,b) = 0 (任一符号)；a >CF b 表示 comparefn(a,b) >0.

当给定一对特定的值 a 和 b 作为它的两个参数.此外，Type(v) 是 Number，而 v 不是 NaN.请注意，这意味着 a <CF b、a =CF b 和 a >CF b 中的一个对于一对给定的 a 和 b.

调用 comparefn(a,b) 不会修改 this 对象.
a =CF a(reflexivity)
如果 a =CF b，则 b =CF a (对称)
如果 a =CF b 和 b =CF c，则 a =CF c (传递性 of =CF)
如果 a 和 b ，则 a (<代码>)
如果 a >CF b 和 b >CF c，则 a >CF c(<代码>>CF)

注意:上述条件是必要且充分的，以确保 comparefn 将集合 S 划分为等价类，并且这些等价类是完全有序的.

<块引用>呃，这是什么意思?我为什么要关心?
排序算法需要将数组中的项目相互比较.要做好高效的工作，它不一定需要将每个项目相互比较，但需要能够推理它们的顺序.为了使其正常工作，自定义比较函数需要遵守一些规则.一个微不足道的问题是，一个项目 a 等于它自己(compare(a, a) == 0)——这是上面列表中的第一项(自反性).是的，这有点数学，但效果不错.
最重要的是传递性.它表示当算法比较了两个值 a 和 b，以及 b 与 c 时，并且有通过应用比较函数发现，例如a = b 和 b <c，然后可以预期 a <c 也成立.这似乎是合乎逻辑的，并且是定义明确、一致的排序所必需的.
但是您的比较功能确实失败了.让我们看看这个例子:
 function compare(a, b) { return Number(a > b);}compare(0, 2) == 0//啊，2和0相等compare(1, 0) == 1//啊，1大于0//让我们得出结论:1 也大于 2
哎呀.这就是排序算法在使用不一致的比较函数调用时可能失败的原因(在规范中，这是依赖于实现的行为" - 即不可预测的结果).p><块引用>
为什么错误的解决方案如此普遍?
因为在许多其他语言中，存在不期望 三向比较的排序算法 但仅仅是一个布尔小于运算符.C++ std::sort 是一个很好的例子那.如果需要确定相等性，它将简单地使用交换的参数应用两次.诚然，这可能更有效且不易出错，但如果无法内联运算符，则需要更多调用比较函数.
反例
<块引用>我已经测试了我的比较功能，它有效！
如果您尝试了一些随机示例，则只能靠运气.或者因为您的测试套件有缺陷 - 不正确和/或不完整.
这是我用来查找上述最小反例的小脚本:
函数 perms(n, i, arr, cb) {//使用所有可能的长度为 n 的数组调用回调如果 (i >= n) 返回 cb(arr);for (var j=0; j
什么比较函数是正确的?
当您需要字典排序时，根本不使用比较函数.如有必要，数组中的项目将被字符串化.
类似于关系运算符的通用比较函数可以实现为
函数(a, b) {如果 (a > b) 返回 1；如果 (a < b) 返回 -1；/* 否则 */返回 0;}
通过一些技巧，可以将其缩小为等效的 function(a,b){return +(a>b)||-(a<b)}.
对于数字，您可以简单地返回它们的差值，这符合上述所有法律:
函数(a, b) {返回 a - b;//但确保只传递数字(以避免 NaN)}
如果要反向排序，只需取适当的一个并将 a 与 b 交换即可.
如果您想对复合类型(对象等)进行排序，请将每个 a 和每个 b 替换为对相关属性的访问，或方法调用或其他你想排序.
I have always successfully sorted my arrays like this (when I did not want the standard lexicographic ordering):
var arr = […] // some numbers or so
arr.sort(function(a, b) {
    return a > b;
});
Now, someone told me this was wrong, and that I would need to return a-b instead. Is that true, and if yes why? I have tested my comparison function, and it works! Also, why would my solution be so common when it is wrong?
 解决方案 
TL;DR


  I have always successfully sorted my arrays like this
No, you have not. And didn't notice it. A quick counterexample:
> [1,1,0,2].sort(function(a, b){ return a>b })
Array [0, 1, 2, 1]
// in Opera 12. Results may vary between sorting algorithm implementations



  why?
Because your comparison function does return false (or 0, equivalently) even when b is larger than a. But 0 implies that the two elements are considered equal - and the sorting algorithm believes that.

In-Depth Explanation

Comparison functions in JavaScript


  How do comparison functions work?
The Array::sort method can take an optional, custom comparison function as its argument. That function takes two arguments (commonly referred to as a and b) which it should compare, and is supposed to return a number


> 0 when a is considered larger than b and should be sorted after it
== 0 when a is considered equal to b and it doesn't matter which comes first
< 0 when a is considered smaller than b and should be sorted before it


If it does not return a number, the result will be cast to a number (which is handy for booleans). The returned number does not need to be exactly -1 or 0 or 1 (though it typically is).

Consistent ordering

To be consistent, the compare function would need to fulfill the equation
comp(a, b) == -1 * comp(b, a)
// or, if values other than -1, 0 and 1 are considered:
comp(a, b) * comp(b, a) <= 0
If that requirement is broken, the sort will behave undefined.

Citing the ES5.1 spec on sort (same thing in the ES6 spec):

  If comparefn is […] not a consistent comparison function for the elements of this array, the behaviour of sort is implementation-defined.
  
  A function comparefn is a consistent comparison function for a set of values S if all of the requirements below are met for all values a, b, and c (possibly the same value) in the set S: The notation a <CF b means comparefn(a,b) < 0; a =CF b means comparefn(a,b) = 0 (of either sign); and a >CF b means comparefn(a,b) > 0.
  
  Calling comparefn(a,b) always returns the same value v when given a specific pair of values a and b as its two arguments. Furthermore, Type(v) is Number, and v is not NaN. Note that this implies that exactly one of a <CF b, a =CF b, and a >CF b will be true for a given pair of a and b.
  
  
  Calling comparefn(a,b) does not modify the this object.
  a =CF a (reflexivity)
  If a =CF b, then b =CF a (symmetry)
  If a =CF b and b =CF c, then a =CF c (transitivity of =CF)
  If a <CF b and b <CF c, then a <CF c (transitivity of <CF)
  If a >CF b and b >CF c, then a >CF c (transitivity of >CF)
  
  
  NOTE: The above conditions are necessary and sufficient to ensure that comparefn divides the set S into equivalence classes and that these equivalence classes are totally ordered.

  Uh, what does this mean? Why should I care?
A sorting algorithm needs to compare items of the array with each other. To do a good and efficient job, it must not need to compare each item to every other, but needs to be able to reason about their ordering. For that to work well, there are a few rules that a custom comparison function needs to abide. A trivial one is that an item a is equal to itself (compare(a, a) == 0) - that's the first item in the list above (reflexivity). Yes, this is a bit mathematical, but pays of well.

The most important one is transitivity. It says that when the algorithm has compared two values a and b, and also b with c, and has found out by applying the comparison function that e.g. a = b and b < c, then it can expect that a < c holds as well. This seems only logical, and is required for a well-defined, consistent ordering.

But your comparison function does fail this. Lets look at this example:
 function compare(a, b) { return Number(a > b); }
 compare(0, 2) == 0 // ah, 2 and 0 are equal
 compare(1, 0) == 1 // ah, 1 is larger than 0
 // let's conclude: 1 is also larger than 2
Ooops. And that is why a sorting algorithm can fail (in the spec, this is be "implementation-dependent behaviour" - i.e. unpredictable results) when it is invoked with a comparison function that is not consistent.

  Why is the wrong solution so common?
Because in many other languages, there are sorting algorithms that don't expect a three-way comparison but merely a boolean smaller-than operator. C++ std::sort is a good example of that. It will simply be applied twice with swapped arguments if an equality needs to be determined. Admittedly, this can be more efficient and is less error-prone, but needs more calls to the comparison function if the operator cannot be inlined.

CounterExamples


  I have tested my comparison function, and it works!
Only by sheer luck, if you tried some random example. Or because your test suite is flawed - incorrect and/or incomplete.

Here is the small script I used to find the above minimal counterexample:
function perms(n, i, arr, cb) {
// calls callback with all possible arrays of length n
    if (i >= n) return cb(arr);
    for (var j=0; j<n; j++) {
        arr[i] = j;
        perms(n, i+1, arr, cb);
    }
}
for (var i=2; ; i++) // infinite loop
    perms(i, 0, [], function(a) {
        if (    a.slice().sort(function(a,b){ return a>b }).toString()
             != a.slice().sort(function(a,b){ return a-b }).toString() )
            // you can also console.log() all of them, but remove the loop!
            throw a.toString();
    });


What comparison function is correct?

Use no comparison function at all, when you want a lexicographic sort. Items in the array will be stringified if necessary.

A generic comparison function that works like the relational operators can be implemented as
function(a, b) {
    if (a > b) return 1;
    if (a < b) return -1;
    /* else */ return 0;
}
With a few tricks, this can be minified to the equivalent function(a,b){return +(a>b)||-(a<b)}.

For numbers, you can simply return their difference, which abides all the laws above:
function(a, b) {
    return a - b; // but make sure only numbers are passed (to avoid NaN)
}
If you want to sort in reverse, just take the appropriate one and swap a with b.

If you want to sort composite types (objects etc), replace each a and each b with an access of the properties in question, or a method call or whatever you want to sort by.


	
		相关文章