Will .hashCode() return a different int due to compaction of the tenured space?

2022-01-16 00:00:00 garbage-collection java

If I call the Object.hashCode() method on some object, it returns the internal address of the object (default implementation). Is this address a logical or a physical address?

During garbage collection, memory compaction moves objects around in memory. If I call hashCode() before and after a GC, will it return the same hash code? It does in practice, but why, given that compaction may change the object's address?

Recommended answer

@erickson is more or less correct. The hash code returned by java.lang.Object.hashCode() does not change for the lifetime of the object.
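This is easy to check empirically. The sketch below is only an illustration: the class name `IdentityHashStability` is made up for this example, and `System.gc()` is merely a hint, so there is no guarantee that the object is actually relocated between the two calls. The hash should match either way.

```java
final class IdentityHashStability {

    /** Returns true if the identity hash of an object is the same before and after a requested GC. */
    static boolean stableAcrossGc() {
        Object o = new Object();
        int before = System.identityHashCode(o);

        // Create some short-lived garbage so a collection has something to do.
        for (int i = 0; i < 10_000; i++) {
            byte[] junk = new byte[1024];
        }
        System.gc(); // a request only; the JVM may or may not move 'o'

        int after = System.identityHashCode(o);
        // For a plain Object, hashCode() and identityHashCode() agree.
        return before == after && after == o.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("stable across GC: " + stableAcrossGc());
    }
}
```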

The way this is (typically) implemented is rather clever. When an object is relocated by the garbage collector, its original hash code has to be stored somewhere in case it is used again. The obvious way to implement this would be to add a 32-bit field to the object header to hold the hash code. But that would add one word of overhead to every object, and would waste space in the most common case, where an object's hashCode method is never called.

The solution is to add two flag bits to the object's flag word, and use them (roughly) as follows. The first flag is set when the hashCode method is called. The second flag tells the hashCode method whether to use the object's current address as the hash code, or to use a stored value. When the GC runs and relocates an object, it tests these flags. If the first flag is set and the second one is unset, the GC allocates one extra word at the end of the object and stores the original object location in that word. Then it sets both flags. From then on, the hashCode method gets the hash code value from the word at the end of the object.
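The two-flag scheme can be modelled in a few lines of plain Java. This is a toy simulation, not code from any real JVM: `ToyObject`, `ToyCollector`, the field names, and the simulated addresses are all invented for illustration, and the "extra word" is just an ordinary field here.

```java
final class ToyObject {
    long address;        // simulated current location in the heap
    boolean hashed;      // flag 1: hashCode has been requested at least once
    boolean hashStored;  // flag 2: read the hash from the stored word, not the address
    long storedWord;     // stands in for the extra word appended on relocation

    ToyObject(long address) {
        this.address = address;
    }

    int identityHash() {
        hashed = true;
        long source = hashStored ? storedWord : address;
        return (int) source; // toy model: truncate the simulated address
    }
}

final class ToyCollector {
    /** Simulates the GC moving an object to a new address. */
    static void relocate(ToyObject obj, long newAddress) {
        if (obj.hashed && !obj.hashStored) {
            // The hash was already handed out: preserve the original location.
            obj.storedWord = obj.address;
            obj.hashStored = true;
        }
        obj.address = newAddress;
    }

    public static void main(String[] args) {
        ToyObject a = new ToyObject(0x1000L);
        int before = a.identityHash();
        relocate(a, 0x2000L);
        System.out.println(before == a.identityHash()); // hash survives the move

        ToyObject b = new ToyObject(0x3000L);
        relocate(b, 0x4000L);             // never hashed, so no extra word is needed
        System.out.println(b.hashStored); // still unset
    }
}
```

Note that an object which is never hashed pays nothing beyond the two flag bits, which is the point of the design.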

In fact, an identityHashCode implementation has to behave this way to satisfy the following part of the general hashCode contract:

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."

A hypothetical implementation of identityHashCode() that simply returned the current machine address of an object would violate the requirement to consistently return the same integer if/when the GC moved the object to a different address. The only way around this would be for the (hypothetical) JVM to guarantee that an object never moves once hashCode has been called on it. And that would lead to serious and intractable problems with heap fragmentation.
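To see why the contract matters in practice, consider what would happen to hash-based collections if the hash followed the address. The sketch below fakes this with a hypothetical `MovingKey` class whose hashCode deliberately returns a mutable "address" field; the class names and address values are assumptions made up for the illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class MovingKey {
    long address; // simulated machine address

    MovingKey(long address) {
        this.address = address;
    }

    // Deliberately breaks the hashCode contract: the value follows the address.
    @Override
    public int hashCode() {
        return (int) address;
    }
    // equals is inherited from Object (reference identity).
}

final class BrokenHashDemo {

    /** Returns whether the set can still find the key after a simulated GC move. */
    static boolean foundAfterMove() {
        Set<MovingKey> set = new HashSet<>();
        MovingKey key = new MovingKey(0x1000L);
        set.add(key);

        key.address = 0x2000L; // simulate the GC relocating the object

        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println("found after move: " + foundAfterMove()); // prints "found after move: false"
    }
}
```

The very same object is no longer found in the set it was added to, because the set recorded the old hash value.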
