Python - class __hash__ method and set

2022-01-17 00:00:00 python python-3.x set hash python-datamodel

Question

I'm using set() together with a class's __hash__ method to prevent objects with the same hash from being added to a set more than once. According to the Python data model documentation, set() treats objects with the same hash as the same object and adds them only once.

But it behaves as shown below:

class MyClass(object):

    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())

print(len(result)) # len = 2

With a string value, however, it works as expected:

result = set()  # start from a fresh set so the count below holds
result.add('aida')
result.add('aida')

print(len(result)) # len = 1

My question is: why are objects with the same hash not treated as the same object in a set?


Answer

Your reading is incorrect. The __eq__ method is used for equality checks. The documentation only states that the __hash__ value must be the same for any two objects a and b for which a == b (i.e. a.__eq__(b)) is true.

This is a common logic mistake. a == b being true implies that hash(a) == hash(b) is also true. However, an implication is not an equivalence: the converse, that hash(a) == hash(b) would imply a == b, does not follow.
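To make the direction of the implication concrete, here is a minimal sketch. The class name ConstantHash is hypothetical, and the example relies on the default identity-based equality that Python gives to classes that do not define __eq__ themselves.

```python
class ConstantHash:
    """Hypothetical class: every instance hashes to 0; equality is the default (identity)."""

    def __hash__(self):
        return 0


a = ConstantHash()
b = ConstantHash()

# Equal hashes do NOT imply equality.
same_hash = hash(a) == hash(b)   # True: both hash to 0
equal = a == b                   # False: default __eq__ compares identity

# Equality DOES imply equal hashes, e.g. for the int 1 and the float 1.0.
numbers_equal = 1 == 1.0
numbers_same_hash = hash(1) == hash(1.0)

# A set only merges two entries when they compare equal.
distinct_count = len({a, b})     # both instances are kept
merged_count = len({1, 1.0})     # collapses to a single element
```

The hash only narrows down where the set looks; whether two entries count as the same element is still decided by the equality check.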

To make all instances of MyClass compare equal to each other, you need to give them an __eq__ method; otherwise Python compares their identities instead. This might do:

class MyClass(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        # another object is equal to self, iff 
        # it is an instance of MyClass
        return isinstance(other, MyClass)

Now:

>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
>>> len(result)
1
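A related point that the original answer does not cover, added here as a side note: in Python 3, a class that defines __eq__ without also defining __hash__ has its __hash__ set to None, so its instances cannot be put in a set at all. The class name OnlyEq below is hypothetical.

```python
class OnlyEq:
    """Hypothetical class that overrides __eq__ but not __hash__."""

    def __eq__(self, other):
        return isinstance(other, OnlyEq)


try:
    {OnlyEq()}          # building a set requires hashing the element
    hashable = True
except TypeError:
    # Python 3 sets __hash__ to None when a class defines only __eq__
    hashable = False
```

This is one reason the two methods are normally defined together.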


In reality you would base __hash__ on the same attributes of your object that __eq__ uses for comparison, for example:

class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn

    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q}))  # 1
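When equality depends on more than one attribute, one common approach is to hash a tuple of exactly those attributes. The sketch below assumes a hypothetical Point class and a hypothetical _key helper; neither comes from the original answer.

```python
class Point:
    """Hypothetical example: equality and hash both derive from (x, y)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # Single source of truth for both __eq__ and __hash__
        return (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


points = {Point(1, 2), Point(1, 2), Point(3, 4)}
unique_points = len(points)  # the two Point(1, 2) instances collapse into one
```

Deriving both methods from the same key keeps them from drifting apart when attributes are added later.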
