Python and NumPy: nan and set
Problem description
I ran into unexpected behavior with Python's NumPy, set and NaN (not-a-number):
>>> import numpy as np
>>> set([np.float64('nan'), np.float64('nan')])
set([nan, nan])
>>> set([np.float32('nan'), np.float32('nan')])
set([nan, nan])
>>> set([np.float('nan'), np.float('nan')])  # np.float was an alias for the builtin float; removed in NumPy 1.24
set([nan, nan])
>>> set([np.nan, np.nan])
set([nan])
>>> set([float('nan'), float('nan')])
set([nan, nan])
Here np.nan yields a single-element set, while the nans built with np.float64, np.float32 and np.float each yield multiple nans in a set. So does float('nan')! And note that:
>>> type(float('nan')) == type(np.nan)
True
I wonder how this difference comes about and what the rationale is behind the different behaviors.
Solution
One of the properties of NaN is that NaN != NaN, unlike every other number. However, the set implementation (in CPython) first checks whether the object being inserted is the very same object (same id) as the existing member at a hash index, and only falls back to == if it is not. If you have two objects with different ids that both hold NaN, you'll get two entries in the set. If they are the same object, they collapse into a single entry.
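The identity-before-equality rule can be seen with plain Python floats, no NumPy needed. This is a small sketch assuming CPython; the names a and b are only for illustration:

```python
# Two separately created NaN objects.
a = float('nan')
b = float('nan')

print(a == b)        # False: NaN never compares equal by value
print(a is b)        # False: two distinct objects with different ids
print(len({a, b}))   # 2: neither the identity check nor == matches

# The same object inserted twice.
print(a == a)        # False: still unequal by value
print(len({a, a}))   # 1: same id, so the set treats it as already present
print(a in {a})      # True, for the same reason
```

So the set never claims that NaN equals NaN; it simply skips the == comparison when both operands are the same object.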
As others have pointed out, np.nan is a single object, so it always has the same id.
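A rough sketch of what this means in practice, again assuming CPython. The name shared_nan and the helper dedupe_nans are hypothetical, not part of NumPy; math.nan is used as a stand-in when NumPy is not installed, since it is likewise one float object stored on a module:

```python
import math

try:
    import numpy as np
    shared_nan = np.nan      # one float object stored as a module attribute
except ImportError:
    shared_nan = math.nan    # stdlib stand-in: also a single shared object

same_object = {shared_nan, shared_nan}
fresh_objects = {float('nan'), float('nan')}
print(len(same_object))      # 1: the same object twice
print(len(fresh_objects))    # 2: a new object per float('nan') call


def dedupe_nans(values):
    """Build a set in which every NaN is replaced by one shared NaN object."""
    # v != v is true only for NaN among ordinary scalar values.
    return {shared_nan if v != v else v for v in values}


print(len(dedupe_nans([float('nan'), float('nan'), 1.0])))  # 2
```

If you need at most one NaN in a set, normalizing every NaN to a single object like this is one option; filtering NaNs out before building the set is another.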