Comparing the first element of consecutive tuples in a Python list

2022-01-22 00:00:00 python python-2.7 list append compare

Question

I have a list of tuples, each containing two elements. Several of the tuples share the same first element. I want to compare the first elements and, for each distinct value, collect the corresponding second elements into one group. Here is my list:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

I would like to build a new list from it that looks something like this:

NewList=[(2,3,4,5),(6,7,8),(9,10)]

Is there an efficient way to do this?


Solution

You can use an OrderedDict to group the elements by the first subelement of each tuple:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

from collections import OrderedDict

od  = OrderedDict()

for a,b in myList:
    od.setdefault(a,[]).append(b)

print(list(od.values()))
[[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple,od.values())))
[(2, 3, 4, 5), (6, 7, 8), (9, 10)]
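
On Python 3.7 and later the built-in dict also preserves insertion order, so the same grouping can be sketched without OrderedDict. This is only a sketch under that version assumption (the question is tagged python-2.7, where it does not apply), and the function name group_by_first is hypothetical:

```python
def group_by_first(pairs):
    """Group second elements by first element, keeping first-seen key order."""
    groups = {}  # assumption: Python 3.7+, where dict keeps insertion order
    for key, value in pairs:
        # setdefault returns the existing list, or stores and returns a new one
        groups.setdefault(key, []).append(value)
    return [tuple(values) for values in groups.values()]


myList = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8), (3, 9), (3, 10)]
print(group_by_first(myList))
# [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
```

Because the grouping is keyed on the first element, the input does not need to be sorted; groups come out in the order their keys first appear.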

If you do not care about the order in which the elements appear and just want the most efficient way to group, you can use a collections.defaultdict:

from collections import defaultdict

od  = defaultdict(list)

for a,b in myList:
    od[a].append(b)

print(list(od.values()))
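
As a small illustration of why no setdefault call is needed here: a defaultdict(list) calls list() to create the value the first time a missing key is looked up. The keys "a", "b" and "c" below are made up for the example:

```python
from collections import defaultdict

groups = defaultdict(list)
groups["a"].append(1)  # "a" is missing, so list() is called to create []
groups["a"].append(2)  # "a" now exists, the same list is reused
groups["b"].append(3)

print(dict(groups))
# {'a': [1, 2], 'b': [3]}

# Side effect worth knowing: merely reading a missing key inserts it.
groups["c"]
print("c" in groups)
# True
```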

Lastly, if your data is already sorted, as in your input example, you can simply use itertools.groupby to group by the first subelement of each tuple and extract the second element from each grouped tuple:

from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

Again, groupby only works if your data is sorted by at least the first element, because it groups only consecutive items that share a key.
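
To make the sorting requirement concrete, here is a minimal sketch of what happens on unsorted input. The helper name group_consecutive and the sample data are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(pairs):
    """Group second elements of consecutive pairs that share a first element."""
    return [tuple(t[1] for t in grp) for _, grp in groupby(pairs, key=itemgetter(0))]


unsorted_pairs = [(1, 2), (2, 6), (1, 3)]

# The two tuples with key 1 are not adjacent, so they end up in separate groups.
print(group_consecutive(unsorted_pairs))
# [(2,), (6,), (3,)]

# Sorting by the first element first makes equal keys adjacent.
print(group_consecutive(sorted(unsorted_pairs, key=itemgetter(0))))
# [(2, 3), (6,)]
```

Sorting costs O(n log n), so on unsorted data a dict-based approach avoids that extra step.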

Some timings on a reasonably sized list:

In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit
od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
   ....: 
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
   ....: 
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit   
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
   ....: 
10 loops, best of 3: 80.3 ms per loop
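
The session above relies on IPython's timeit magic. A rough way to repeat the comparison as a plain script is sketched below using the standard timeit module; the function names are hypothetical, and the absolute numbers (and possibly the ranking) will depend on your interpreter version and machine:

```python
import timeit
from collections import OrderedDict, defaultdict
from itertools import groupby
from operator import itemgetter
from random import randint


def with_groupby(pairs):
    # Requires pairs to be sorted by the first element.
    return [tuple(t[1] for t in v) for _, v in groupby(pairs, key=itemgetter(0))]


def with_defaultdict(pairs):
    d = defaultdict(list)
    for a, b in pairs:
        d[a].append(b)
    return d


def with_ordereddict(pairs):
    od = OrderedDict()
    for a, b in pairs:
        od.setdefault(a, []).append(b)
    return od


# Same shape of data as in the session above: 100000 random pairs, sorted.
pairs = sorted((randint(1, 10000), randint(1, 10000)) for _ in range(100000))

for fn in (with_groupby, with_defaultdict, with_ordereddict):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(pairs), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1000:.1f} ms per call")
```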

If order matters and the data is sorted, go with groupby. Its timing gets even closer to the defaultdict approach if the lists in the defaultdict also have to be converted to tuples.

If the data is not sorted, or you don't care about order, you won't find a faster way to group than the defaultdict approach.
