如果我们只覆盖类中的 hashCode() 并在 Set 中使用它会发生什么?

2022-01-17 00:00:00 set collections java

这可能不是真实世界的场景,只是想知道会发生什么,下面是代码.

This may not be the real world scenario but just curious to know what happens, below is the code.

我正在创建一组 UsingSet 类的对象.根据Java中的哈希概念,当我第一次添加包含a"的对象时,它将创建一个哈希码为97的桶并将对象放入其中.再次,当它遇到一个带有a"的对象时,它会调用类 UsingSet 中重写的 hashcode 方法,它会得到 hashcode 97 那么接下来是什么?

I am creating a set of object of class UsingSet. According to hashing concept in Java, when I first add object which contains "a", it will create a bucket with hashcode 97 and put the object inside it. Again when it encounters an object with "a", it will call the overridden hashcode method in the class UsingSet and it will get hashcode 97 so what is next?

由于我没有重写 equals 方法,默认实现将返回 false.那么值a"的对象将保存在哪个桶中,与之前的哈希码为 97 的对象保存在同一个桶中?还是会创建新的存储桶?有人知道它将如何在内部存储吗?

As I have not overridden equals method, the default implementation will return false. So where will be the Object with value "a" be kept, in the same bucket where the previous object with hashcode 97 kept? or will it create new bucket? anybody know how it will be stored internally?

/* package whatever; // don't place package name! */

import java.util.*;
import java.lang.*;
import java.io.*;

class UsingSet {  

  String value;  

  public UsingSet(String value){  
    this.value = value;  
  }  

  public String toString() {  
    return value;  
  }  

  public int hashCode() {  
    int hash = value.hashCode();  
    System.out.println("hashcode called" + hash);  
    return hash;  
  }  

  public static void main(String args[]) {  

    java.util.Set s = new java.util.HashSet();  

    s.add(new UsingSet("A"));  
    s.add(new UsingSet("b"));  
    s.add(new UsingSet("a"));  
    s.add(new UsingSet("b"));   
    s.add(new UsingSet("a"));  

    s.add(new Integer(1));  
    s.add(new Integer(1));  

    System.out.println("s = " + s); 

  }  
}  

输出是:

hashcode called65
hashcode called98
hashcode called97
hashcode called98
hashcode called97
s = [1, b, b, A, a, a]

推荐答案

James Large 的答案不正确,或者说具有误导性(部分也不正确).我会解释的.

James Large answer is incorrect, or rather misleading (and part incorrect as well). I will explain.

如果两个对象根据它们的equals()方法相等,那么它们也必须具有相同的哈希码.如果两个对象具有相同的哈希码,它们也不必相等.

If two objects are equal according to their equals() method, they must also have the same hash code. If two objects have the same hash code, they do NOT have to be equal too.

以下是 java.util.Object 文档中的实际措辞:

Here is the actual wording from the java.util.Object documentation:

  • 如果根据 equals(Object) 方法,两个对象相等,则对两个对象中的每一个调用 hashCode 方法必须产生相同的整数结果.
  • 不要求如果两个对象根据equals(java.lang.Object)方法不相等,那么对两个对象中的每一个调用hashCode方法必须产生不同的整数结果.但是,程序员应该意识到,为不相等的对象生成不同的整数结果可能会提高哈希表的性能.

确实,如果两个对象没有相同的哈希值,那么它们就不相等.但是,散列不是检查相等性的方法 - 因此说它是检查相等性的更快方法是非常不正确的.

It is true, that if two objects don't have the same hash then they are not equal. However, hashing is not a way to check equality - so it is wildly incorrect to say that it is a faster way to check equality.

此外,说 hashCode 函数是做任何事情的有效方法也是非常不正确的.这完全取决于实现,但是字符串的 hashCode 的默认实现非常低效,因为字符串变大了.它将根据字符串的每个字符执行计算,因此如果您使用大字符串作为键,那么这将变得非常低效;如果您有大量存储桶,则更是如此.

Also, it is also wildly incorrect to say the hashCode function is an efficient way to do anything. This is all up to implementation, but the default implementation for hashCode of a string is very inefficient as the String gets large. It will perform a calculation based on each char of the String, so if you are using large Strings as keys, then this becomes very inefficient; moreso if you have a large number of buckets.

在一个 Map 中(HashSet 内部使用了一个 HashMap),有桶,每个桶中有一个链表.Java 使用 hashCode() 函数来找出它属于哪个桶(它实际上会修改哈希,取决于存在多少桶).由于两个对象可能共享相同的哈希,接下来会依次遍历链表,检查 equals() 方法以查看对象是否重复.根据 java.util.Set 文档:

In a Map (HashSet uses a HashMap internally), there are buckets and in each bucket is a linked list. Java uses the hashCode() function to find out which bucket it belongs in (it actually will modify the hash, depending on how many buckets exist). Since two objects may share the same hash, it will iterate through the linked list sequentially next, checking the equals() method to see if the object is a duplicate. Per the java.util.Set documenation:

一个不包含重复元素的集合.

A collection that contains no duplicate elements.

因此,如果它的 hashCode() 将其引导到一个存储桶,其中该存储桶包含一个 .equals() 计算结果为 true 的 Object,那么之前的 Object 将被新的 Object 覆盖.您可能可以在此处查看更多信息:Java HashMap 如何处理具有相同哈希码的不同对象?

So, if its hashCode() leads it to a bucket, in which that bucket contains an Object where the .equals() evaluates to true, then the previous Object is overwritten with the new Object. You can probably view here for more information: How does a Java HashMap handle different objects with the same hash code?

一般来说,如果你覆盖了 hashCode 函数,你也覆盖了 equals 函数(如果我没记错的话,如果你选择不这样做,这会违反合同),这是一个很好的做法.

Generally speaking though, it is good practice that if you overwrite the hashCode function, you also overwrite the equals function (if I'm not mistaken, this breaks the contract if you choose not to).

相关文章