在python中每隔n个项目拆分一个生成器/可迭代(splitEvery)

2022-01-10 00:00:00 python split iterator

问题描述

我正在尝试用 Python 编写 Haskell 函数splitEvery".这是它的定义:

I'm trying to write the Haskell function 'splitEvery' in Python. Here is it's definition:

splitEvery :: Int -> [e] -> [[e]]
    @'splitEvery' n@ splits a list into length-n pieces.  The last
    piece will be shorter if @n@ does not evenly divide the length of
    the list.

它的基本版本可以正常工作,但我想要一个可以与生成器表达式、列表和迭代器一起使用的版本.而且,如果有一个生成器作为输入,它应该返回一个生成器作为输出!

The basic version of this works fine, but I want a version that works with generator expressions, lists, and iterators. And, if there is a generator as an input it should return a generator as an output!

# should not enter infinite loop with generators or lists
splitEvery(itertools.count(), 10)
splitEvery(range(1000), 10)

# last piece must be shorter if n does not evenly divide
assert splitEvery(5, range(9)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

# should give same correct results with generators
tmp = itertools.islice(itertools.count(), 10)
assert list(splitEvery(5, tmp)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

当前实施

这是我目前拥有的代码,但它不适用于简单的列表.

Current Implementation

Here is the code I currently have but it doesn't work with a simple list.

def splitEvery_1(n, iterable):
    res = list(itertools.islice(iterable, n))
    while len(res) != 0:
        yield res
        res = list(itertools.islice(iterable, n))

这个不适用于生成器表达式(感谢 jellybean 修复它):

This one doesn't work with a generator expression (thanks to jellybean for fixing it):

def splitEvery_2(n, iterable): 
    return [iterable[i:i+n] for i in range(0, len(iterable), n)]

必须有一段简单的代码来进行拆分.我知道我可以拥有不同的功能,但它似乎应该很容易做到.我可能陷入了一个不重要的问题,但这真的让我很烦恼.

There has to be a simple piece of code that does the splitting. I know I could just have different functions but it seems like it should be and easy thing to do. I'm probably getting stuck on an unimportant problem but it's really bugging me.

类似于 http://docs.python.org 的 grouper/library/itertools.html#itertools.groupby 但我不希望它填充额外的值.

It is similar to grouper from http://docs.python.org/library/itertools.html#itertools.groupby but I don't want it to fill extra values.

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

它确实提到了一种截断最后一个值的方法.这也不是我想要的.

It does mention a method that truncates the last value. This isn't what I want either.

保证迭代的从左到右的评估顺序.这使得使用 izip(*[iter(s)]*n) 将数据系列聚类为 n 长度组成为可能.

The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using izip(*[iter(s)]*n).

list(izip(*[iter(range(9))]*5)) == [[0, 1, 2, 3, 4]]
# should be [[0, 1, 2, 3, 4], [5, 6, 7, 8]]


解决方案

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

一些测试:

>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]

>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']

>>> list(split_every(100, []))
[]

相关文章