交错不同长度的列表,消除重复,并保持顺序

2022-01-10 00:00:00 python list iterator

问题描述

我有两个列表,比如说:

I have two lists, let's say:

keys1 = ['A', 'B', 'C', 'D', 'E',           'H', 'I']
keys2 = ['A', 'B',           'E', 'F', 'G', 'H',      'J', 'K']

如何创建一个没有重复的合并列表,保留两个列表的顺序,在它们所属的位置插入缺失的元素?像这样:

How do I create a merged list without duplicates that preserve the order of both lists, inserting the missing elements where they belong? Like so:

merged = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

请注意,元素可以根据相等性进行比较,但不是有序的(它们是复杂的字符串).元素不能通过比较来排序,但它们有一个基于它们在原始列表中出现的顺序.

Note that the elements can be compared against equality but not ordered (they are complex strings). The elements can't be ordered by comparing them, but they have an order based on their occurrence in the original lists.

如果出现矛盾(两个输入列表中的顺序不同),任何包含所有元素的输出都是有效的.当然,如果解决方案在保留大部分订单方面显示出常识",则可以加分.

In case of contradiction (different order in both input lists), any output containing all elements is valid. Of course with bonus points if the solution shows 'common sense' in preserving most of the order.

再一次(正如一些评论仍然争论的那样),列表通常不会在公共元素的顺序方面相互矛盾.如果他们这样做了,算法需要优雅地处理该错误.

Again (as some comments still argue about it), the lists normally don't contradict each other in terms of the order of the common elements. In case they do, the algorithm needs to handle that error gracefully.

我从一个版本开始,它使用 .next() 遍历列表以仅推进包含不匹配元素的列表,但 .next() 只是不知道何时停止.

I started with a version that iterates over the lists with .next() to advance just the list containing the unmatched elements, but .next() just doesn't know when to stop.

merged = []
L = iter(keys1)
H = iter(keys2)
l = L.next()
h = H.next()

for i in range(max(len(keys1, keys2))):
  if l == h:
    if l not in merged:
      merged.append(l)
    l = L.next()
    h = H.next()

  elif l not in keys2:
    if l not in merged:
      merged.append(l)
    l = L.next()

  elif h not in keys1:
    if h not in merged:
      merged.append(h)
    h = H.next()

  else: # just in case the input is badly ordered
    if l not in merged:
      merged.append(l)
    l = L.next()
    if h not in merged:
      merged.append(h)
    h = H.next()   

print merged

这显然不起作用,因为 .next() 会导致最短列表异常.现在我可以更新我的代码以在每次调用 .next() 时捕获该异常.但是代码已经很不符合pythonic了,这显然会破灭泡沫.

This obviously doesn't work, as .next() will cause an exception for the shortest list. Now I could update my code to catch that exception every time I call .next(). But the code already is quite un-pythonic and this would clearly burst the bubble.

有没有人更好地了解如何遍历这些列表以组合元素?

Does anyone have a better idea of how to iterate over those lists to combine the elements?

如果我可以一次完成三个列表,则可以获得奖励积分.

Bonus points if I can do it for three lists in one go.


解决方案

您需要的基本上是任何合并实用程序所做的:它尝试合并两个序列,同时保持每个序列的相对顺序.您可以使用 Python 的 difflib 模块来区分这两个序列,并将它们合并:

What you need is basically what any merge utility does: It tries to merge two sequences, while keeping the relative order of each sequence. You can use Python's difflib module to diff the two sequences, and merge them:

from difflib import SequenceMatcher

def merge_sequences(seq1,seq2):
    sm=SequenceMatcher(a=seq1,b=seq2)
    res = []
    for (op, start1, end1, start2, end2) in sm.get_opcodes():
        if op == 'equal' or op=='delete':
            #This range appears in both sequences, or only in the first one.
            res += seq1[start1:end1]
        elif op == 'insert':
            #This range appears in only the second sequence.
            res += seq2[start2:end2]
        elif op == 'replace':
            #There are different ranges in each sequence - add both.
            res += seq1[start1:end1]
            res += seq2[start2:end2]
    return res

例子:

>>> keys1 = ['A', 'B', 'C', 'D', 'E',           'H', 'I']
>>> keys2 = ['A', 'B',           'E', 'F', 'G', 'H',      'J', 'K']
>>> merge_sequences(keys1, keys2)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']

请注意,您期望的答案不一定是唯一可能的答案.例如,如果我们在这里改变序列的顺序,我们会得到另一个同样有效的答案:

Note that the answer you expect is not necessarily the only possible one. For example, if we change the order of sequences here, we get another answer which is just as valid:

>>> merge_sequences(keys2, keys1)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'I']

相关文章