Order-independent hash algorithm
I am currently working on a collection library for my custom programming language. I already have several data types (Collection, List, Map, Set) and implementations for them (mutable and immutable), but what I was missing so far was hashCode and equals. While these are no problem for Lists, as they are ordered collections, they play a special role for Sets and Maps. Two Sets are considered equal if they have the same size and the same elements, and the order in which the Sets maintain them should not make a difference to their equality. Because of the equals-hashCode contract, the hashCode implementation also has to reflect this behavior, meaning that two sets with the same elements but different ordering must have the same hash code. (The same applies to Maps, which are technically a Set of key-value pairs.)
Example (pseudocode):
let set1: Set<String> = [ "a", "b", "c" ]
let set2: Set<String> = [ "b", "c", "a" ]
set1 == set2 // should return true
set1.hashCode == set2.hashCode // should also return true
How would I implement a reasonably good hash algorithm for which the hashCode values in the example above are equal?
Recommended answer
The JDK itself proposes the following solution to this problem. The contract of the hashCode() method of the java.util.Set interface states:
Returns the hash code value for this set. The hash code of a set is defined to be the sum of the hash codes of the elements in the set, where the hash code of a null element is defined to be zero. This ensures that s1.equals(s2) implies that s1.hashCode()==s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode().
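A minimal Java sketch of the contract quoted above, using only the standard library. The class name SumHash is hypothetical; java.util.AbstractSet already implements Set.hashCode() this way, so the sketch only makes the rule explicit. Because a Map is a set of key-value pairs (as the question notes), the same idea carries over: java.util.Map defines its hash code as the sum of its entries' hash codes, where each entry hashes as the XOR of its key's and its value's hash codes.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical helper illustrating the java.util.Set and java.util.Map hashCode contracts.
final class SumHash {
    private SumHash() {}

    // Set contract: sum of the element hash codes, with null counting as 0.
    // int overflow wraps around silently, and addition stays commutative and
    // associative under wrapping, so the iteration order cannot change the result.
    static int setHash(Iterable<?> elements) {
        int h = 0;
        for (Object e : elements) {
            h += Objects.hashCode(e); // Objects.hashCode(null) returns 0
        }
        return h;
    }

    // Map contract: sum of the entry hash codes, where one entry hashes as
    // hash(key) ^ hash(value), again treating null as 0.
    static int mapHash(Map<?, ?> map) {
        int h = 0;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            h += Objects.hashCode(e.getKey()) ^ Objects.hashCode(e.getValue());
        }
        return h;
    }
}
```

For example, SumHash.setHash(List.of("a", "b", "c")) and SumHash.setHash(List.of("b", "c", "a")) are equal, and both match new HashSet<>(List.of("a", "b", "c")).hashCode(). Likewise, SumHash.mapHash(m) should equal m.hashCode() for a java.util.HashMap m.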
An alternative to summing the elements' hash codes would be to combine them with, for example, the ^ (XOR) operator.
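A sketch of the XOR variant in Java, with a hypothetical class name XorHash. XOR is also commutative and associative, so, as with a sum, the iteration order does not affect the result.

```java
import java.util.Objects;

// Hypothetical helper: order-independent set hash that combines element hashes with XOR.
final class XorHash {
    private XorHash() {}

    static int setHash(Iterable<?> elements) {
        int h = 0;
        for (Object e : elements) {
            h ^= Objects.hashCode(e); // null contributes 0, as in the Set contract
        }
        return h;
    }
}
```

One trade-off is worth knowing: any two elements with equal hash codes cancel each other out. In Java, "Aa" and "BB" both have the hash code 2112, so XorHash.setHash(Set.of("Aa", "BB")) returns 0, the same value as for an empty set. A sum does not cancel in this way (it gives 4224 for that set).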
The Scala language uses an ordering-invariant version of the MurmurHash algorithm (cf. the private scala.util.hashing.MurmurHash3 class) to implement the hashCode (or ##) method of its immutable sets and similar collections.
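The following Java sketch is modeled on the structure of Scala's MurmurHash3.unorderedHash as it appears in the Scala 2 standard library source. It accumulates three order-independent values over the element hashes (their sum, their XOR, and the product of the non-zero ones), then passes them through MurmurHash3's 32-bit mixing and finalization steps together with the element count. The class name UnorderedMurmur and the explicit seed parameter are assumptions made for illustration. Element hashes come from Java's hashCode rather than Scala's ##, so the values are not expected to match Scala's.

```java
import java.util.Objects;

// Hypothetical helper: a Java sketch modeled on Scala's MurmurHash3.unorderedHash.
// Not intended to reproduce Scala's exact hash values.
final class UnorderedMurmur {
    private UnorderedMurmur() {}

    static int unorderedHash(Iterable<?> elements, int seed) {
        int sum = 0, xor = 0, product = 1, count = 0;
        for (Object e : elements) {
            int eh = Objects.hashCode(e);
            sum += eh;                  // commutative, so order-independent
            xor ^= eh;                  // commutative, so order-independent
            if (eh != 0) product *= eh; // skip zeros so a single 0 hash does not zero the product
            count++;
        }
        // Mix the order-independent accumulators with MurmurHash3's block steps.
        int h = seed;
        h = mix(h, sum);
        h = mix(h, xor);
        h = mixLast(h, product);
        return finalizeHash(h, count);
    }

    // Scrambles one 32-bit block and XORs it into the running hash.
    private static int mixLast(int hash, int data) {
        int k = data * 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;
        return hash ^ k;
    }

    // mixLast followed by MurmurHash3's rotate-multiply-add step.
    private static int mix(int hash, int data) {
        int h = mixLast(hash, data);
        h = Integer.rotateLeft(h, 13);
        return h * 5 + 0xe6546b64;
    }

    // Folds in the element count, then applies MurmurHash3's fmix32 avalanche.
    private static int finalizeHash(int hash, int length) {
        int h = hash ^ length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

The idea behind this design is that each accumulator alone is a weak, order-independent summary. Combining them and running the result through the avalanche steps spreads the bits far better than a plain sum or XOR, while still giving the same value for every iteration order.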