Elegant way to skip elements in an iterable

2022-01-10 00:00:00 python iterable iterator

Problem description

I've got a large iterable, specifically the one given by:

itertools.permutations(range(10))

I would like to access the millionth element. I have already solved the problem in a few different ways:

  1. Casting the iterable to a list and getting the 1000000th element:

    return list(permutations(range(10)))[999999]

  2. Manually skipping elements up to 999999:

    p = permutations(range(10))
    for i in xrange(999999): p.next()
    return p.next()
    

  3. Manually skipping elements, v2:

    p = permutations(range(10))
    for i, element in enumerate(p):
        if i == 999999:
            return element
    

  4. Using islice from itertools:

    return islice(permutations(range(10)), 999999, 1000000).next()
    

But I still don't feel that any of them is the elegant, Pythonic way to do it. The first option is just too expensive: it needs to compute the whole iterable just to access a single element. If I'm not wrong, islice internally does the same computation I did in method 2, and is almost exactly the same as method 3; it may even have more redundant operations.

So I'm just curious whether Python has some other way to access a specific element of an iterable, or at least to skip the first elements in a more elegant way, or whether I just need to use one of the above.


Solution

Use the itertools consume recipe to skip n elements:

    import collections
    from itertools import islice

    def consume(iterator, n):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
    

Note the islice() call there: it uses n, n as start and stop, so the slice is empty and yields nothing, and the next() function falls back to its default.
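
As a small illustration of how the recipe behaves, the sketch below restates it in Python 3 and runs it on a short iterator. The `n=None` default and the name `letters` are assumptions added here for convenience; they are not part of the snippet above.

```python
import collections
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # islice(iterator, n, n) is an empty slice, but asking it for an
        # item still advances the underlying iterator by n items.
        next(islice(iterator, n, n), None)


letters = iter("abcdefg")
consume(letters, 3)             # skips 'a', 'b', 'c'
first_after_skip = next(letters)
print(first_after_skip)         # d

consume(letters)                # exhausts whatever is left
leftover = next(letters, "done")
print(leftover)                 # done
```

Because consume() works on the iterator in place, the caller keeps the same object and simply continues from the new position.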

Simplified to your example, where you want to skip 999999 elements and then return element 1000000:

    return next(islice(permutations(range(10)), 999999, 1000000))
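
The same expression can be wrapped in a small reusable helper. The sketch below follows the `nth` recipe from the itertools documentation; it takes a zero-based index, so the millionth element corresponds to index 999999. The usage lines use smaller inputs so they run instantly.

```python
from itertools import islice, permutations


def nth(iterable, n, default=None):
    """Return the item at zero-based index n, or default if there is none."""
    return next(islice(iterable, n, None), default)


# The 1000th permutation (index 999) of range(7), found without building a list.
print(nth(permutations(range(7)), 999))

# Asking past the end returns the default instead of raising StopIteration.
print(nth(permutations(range(3)), 100, default="missing"))  # missing
```

Unlike the bare next(islice(...)) call, this variant does not raise StopIteration when the iterable is too short, which may or may not be what you want.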
    

islice() processes the iterator in C, something that Python loops cannot beat.

To illustrate, here are the timings for just 10 repeats of each method:

    >>> from itertools import islice, permutations
    >>> from timeit import timeit
    >>> def list_index():
    ...     return list(permutations(range(10)))[999999]
    ... 
    >>> def for_loop():
    ...     p = permutations(range(10))
    ...     for i in xrange(999999): p.next()
    ...     return p.next()
    ... 
    >>> def enumerate_loop():
    ...     p = permutations(range(10))
    ...     for i, element in enumerate(p):
    ...         if i == 999999:
    ...             return element
    ... 
    >>> def islice_next():
    ...     return next(islice(permutations(range(10)), 999999, 1000000))
    ... 
    >>> timeit('f()', 'from __main__ import list_index as f', number=10)
    5.550895929336548
    >>> timeit('f()', 'from __main__ import for_loop as f', number=10)
    1.6166789531707764
    >>> timeit('f()', 'from __main__ import enumerate_loop as f', number=10)
    1.2498459815979004
    >>> timeit('f()', 'from __main__ import islice_next as f', number=10)
    0.18969106674194336
    

The islice() method is nearly 7 times faster than the next fastest method.
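
The session above is written for Python 2 (`xrange`, `p.next()`). A rough Python 3 version of the same comparison might look like the sketch below. The `size`, `index` and `number` parameters and the `compare` function are additions made here so that a smaller case can be tried first. No timings are shown, since they depend on the machine and the interpreter.

```python
from itertools import islice, permutations
from timeit import timeit


def list_index(size, index):
    # Materializes every permutation before indexing (memory-hungry).
    return list(permutations(range(size)))[index]


def for_loop(size, index):
    p = permutations(range(size))
    for _ in range(index):
        next(p)
    return next(p)


def enumerate_loop(size, index):
    for i, element in enumerate(permutations(range(size))):
        if i == index:
            return element


def islice_next(size, index):
    return next(islice(permutations(range(size)), index, index + 1))


def compare(size=10, index=999999, number=10):
    """Time each approach and return a {function name: seconds} dict."""
    results = {}
    for func in (list_index, for_loop, enumerate_loop, islice_next):
        results[func.__name__] = timeit(lambda: func(size, index), number=number)
    return results


# A reduced case so the sketch finishes quickly; calling compare() with its
# defaults repeats the original experiment.
for name, seconds in compare(size=8, index=9999, number=3).items():
    print(f"{name}: {seconds:.4f}s")
```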
