集合并集查找算法
问题描述
我有数千行 1 到 100 个数字,每行定义一组数字以及它们之间的关系.我需要获取相关数字的集合.
I have thousands of lines of 1 to 100 numbers, every line define a group of numbers and a relationship among them. I need to get the sets of related numbers.
小例子:如果我有这 7 行数据
Little Example: If I have this 7 lines of data
T1 T2
T3
T4
T5
T6 T1
T5 T4
T3 T4 T7
我需要一个不那么慢的算法来知道这里的集合是:
I need a not so slow algorithm to know that the sets here are:
T1 T2 T6 (because T1 is related with T2 in the first line and T1 related with T6 in the line 5)
T3 T4 T5 T7 (because T5 is with T4 in line 6 and T3 is with T4 and T7 in line 7)
但是当你有非常大的集合时,在每个大集合中搜索一个 T(x) 并进行集合并集......等等.
but when you have very big sets is painfully slow to do a search of a T(x) in every big set, and do unions of sets... etc.
您是否有提示以不那么蛮力的方式执行此操作?
Do you have a hint to do this in a not so brute force manner?
我正在尝试在 Python 中执行此操作.
I'm trying to do this in Python.
解决方案
一旦你构建了数据结构,你到底想对它运行什么查询?向我们展示您现有的代码.什么是 T(x)?您谈论数字组",但您的样本数据显示 T1、T2 等;请解释一下.
Once you have built the data structure, exactly what queries do you want to run against it? Show us your existing code. What is a T(x)? You talk about "groups of numbers" but your sample data shows T1, T2, etc; please explain.
你读过这个吗:http://en.wikipedia.org/wiki/Disjoint-set_data_structure
尝试查看以下 Python 实现:http://code.activestate.com/recipes/215912-union-find-data-structure/
Try looking at this Python implementation: http://code.activestate.com/recipes/215912-union-find-data-structure/
或者你可以自己写一些相当简单易懂的东西,例如
OR you can lash up something rather simple and understandable yourself, e.g.
[更新:全新代码]
class DisjointSet(object):
def __init__(self):
self.leader = {} # maps a member to the group's leader
self.group = {} # maps a group leader to the group (which is a set)
def add(self, a, b):
leadera = self.leader.get(a)
leaderb = self.leader.get(b)
if leadera is not None:
if leaderb is not None:
if leadera == leaderb: return # nothing to do
groupa = self.group[leadera]
groupb = self.group[leaderb]
if len(groupa) < len(groupb):
a, leadera, groupa, b, leaderb, groupb = b, leaderb, groupb, a, leadera, groupa
groupa |= groupb
del self.group[leaderb]
for k in groupb:
self.leader[k] = leadera
else:
self.group[leadera].add(b)
self.leader[b] = leadera
else:
if leaderb is not None:
self.group[leaderb].add(a)
self.leader[a] = leaderb
else:
self.leader[a] = self.leader[b] = a
self.group[a] = set([a, b])
data = """T1 T2
T3 T4
T5 T1
T3 T6
T7 T8
T3 T7
T9 TA
T1 T9"""
# data is chosen to demonstrate each of 5 paths in the code
from pprint import pprint as pp
ds = DisjointSet()
for line in data.splitlines():
x, y = line.split()
ds.add(x, y)
print
print x, y
pp(ds.leader)
pp(ds.group)
这是最后一步的输出:
T1 T9
{'T1': 'T1',
'T2': 'T1',
'T3': 'T3',
'T4': 'T3',
'T5': 'T1',
'T6': 'T3',
'T7': 'T3',
'T8': 'T3',
'T9': 'T1',
'TA': 'T1'}
{'T1': set(['T1', 'T2', 'T5', 'T9', 'TA']),
'T3': set(['T3', 'T4', 'T6', 'T7', 'T8'])}
相关文章