pandas: iterate over multiple rows at a time with overlap

2022-01-24 00:00:00 python pandas iteration

Problem description

I have a pandas DataFrame that needs to be fed in chunks of n rows into downstream functions (print in the example). The chunks may have overlapping rows.

Let's start from a dummy DataFrame:

    import pandas as pd

    d = {'A': list(range(1000)), 'B': list(range(1000))}
    df = pd.DataFrame(d)

For 2-row chunks with a 1-row overlap, I have the following code:

    a = df.index.values[:-1]
    for i in a:
        print(df.iloc[i:i+2])

The output looks like this:

    ...
           A    B
    996  996  996
    997  997  997
           A    B
    997  997  997
    998  998  998
           A    B
    998  998  998
    999  999  999

This is exactly what I want.

Is there a better/faster approach to iterating over chunks of n rows of a pandas.DataFrame?

Solution

Use DataFrame.groupby with a helper 1-D array of the same length as df, integer-divided by the chunk size. The resulting chunks do not overlap:

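    import numpy as np
    import pandas as pd

    d = {'A': list(range(5)), 'B': list(range(5))}
    df = pd.DataFrame(d)

    # Helper array: row position integer-divided by the chunk size (2)
    print(np.arange(len(df)) // 2)
    [0 0 1 1 2]

    for i, g in df.groupby(np.arange(len(df)) // 2):
        print(g)
       A  B
    0  0  0
    1  1  1
       A  B
    2  2  2
    3  3  3
       A  B
    4  4  4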
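The same grouping key should work for any chunk size. Below is a minimal sketch, assuming a hypothetical 7-row frame and a chunk size n = 3 (neither comes from the original question). It also shows that the last chunk is shorter when len(df) is not a multiple of n:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 7 rows, so the last chunk is incomplete.
df = pd.DataFrame({'A': range(7), 'B': range(7)})

n = 3  # assumed chunk size, chosen only for illustration

# Rows 0-2 get key 0, rows 3-5 get key 1, row 6 gets key 2.
keys = np.arange(len(df)) // n

# Collect the number of rows in each non-overlapping chunk.
sizes = [len(g) for _, g in df.groupby(keys)]
print(sizes)
```

If a short trailing chunk is not acceptable downstream, it can be skipped by checking len(g) == n inside the loop.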

For overlapping chunks, this answer can be adapted:

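    def chunker1(seq, size):
        # Slide a window of `size` rows forward one row at a time;
        # stop at the last position where a full window still fits.
        return (seq.iloc[pos:pos + size] for pos in range(len(seq) - size + 1))

    for i in chunker1(df, 2):
        print(i)
       A  B
    0  0  0
    1  1  1
       A  B
    1  1  1
    2  2  2
       A  B
    2  2  2
    3  3  3
       A  B
    3  3  3
    4  4  4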
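The chunker above always advances by one row. If the amount of overlap needs to be configurable, one possible generalization is sketched below. The function name overlapping_chunks and its overlap parameter are hypothetical, not part of the original answer; the step between chunk starts is assumed to be size - overlap.

```python
import pandas as pd

def overlapping_chunks(frame, size, overlap):
    """Yield consecutive `size`-row slices of `frame` that share `overlap` rows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    # Stop at the last start position where a full chunk still fits.
    for pos in range(0, len(frame) - size + 1, step):
        yield frame.iloc[pos:pos + size]

# Hypothetical example data: 5 rows, chunks of 3 rows sharing 1 row.
df = pd.DataFrame({'A': range(5), 'B': range(5)})
chunks = list(overlapping_chunks(df, size=3, overlap=1))
for chunk in chunks:
    print(chunk)
```

With size=2 and overlap=1 this yields the same 2-row windows as the question's loop. Trailing rows that do not fill a complete chunk are dropped in this sketch.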
