Grouping / clustering numbers in Python
Question
I've googled, I've tested, and this has me at my wits' end. I have a list of numbers I need to group by similarity. For instance, given the list [1, 6, 9, 100, 102, 105, 109, 134, 139], the numbers 1, 6, and 9 would go into one list; 100, 102, 105, and 109 into another; and 134 and 139 into a third. I'm terrible at math, and I've tried and tried, but I can't get it to work. To be as explicit as possible, I want to group numbers that are within 10 of one another. Can anyone help? Thanks.
Answer
There are many ways to do cluster analysis. One simple approach is to look at the gap size between successive data elements:
def cluster(data, maxgap):
    '''Arrange data into groups where successive elements
       differ by no more than *maxgap*

        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    '''
    data = sorted(data)  # sort a copy so the caller's list is not modified
    if not data:
        return []        # nothing to group
    groups = [[data[0]]]
    for x in data[1:]:
        # Compare x with the last element of the most recent group.
        if abs(x - groups[-1][-1]) <= maxgap:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
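One thing to be aware of with the gap approach: groups can chain. With maxgap=10, the list [1, 8, 15, 22] ends up as a single group because each neighbour is only 7 away, even though 1 and 22 are 21 apart. If "within 10 of one another" is meant to hold for every pair in a group, a possible variant is to compare each value against the first element of the current group instead of the last. The sketch below is only an illustration of that idea; the name cluster_by_span and its maxspan parameter are made up here and are not part of the original answer.

```python
def cluster_by_span(data, maxspan):
    """Group values so that each group spans at most *maxspan*.

    Hypothetical variant of the gap-based approach: a new group starts
    when a value is more than *maxspan* above the group's smallest value.
    """
    groups = []
    for x in sorted(data):
        # groups[-1][0] is the smallest value in the current group,
        # because the input is processed in ascending order.
        if groups and x - groups[-1][0] <= maxspan:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups


print(cluster_by_span([1, 8, 15, 22], maxspan=10))
# [[1, 8], [15, 22]]
print(cluster_by_span([1, 6, 9, 100, 102, 105, 109, 134, 139], maxspan=10))
# [[1, 6, 9], [100, 102, 105, 109], [134, 139]]
```

For the list in the question both approaches give the same three groups; they only differ when values are spread out evenly enough to chain.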