Pandas - Duplicate rows based on a condition

2022-01-10 · Tags: python, pandas, group-by, duplicates

Problem description

I'm trying to duplicate a row if it meets a condition. In the table below, I used groupby to create a cumulative count within each Date Completed group (PathID), then a second column holding the maximum of that count for each group (MaxPathID).

    df['PathID'] = df.groupby('Date Completed').cumcount() + 1
    df['MaxPathID'] = df.groupby('Date Completed')['PathID'].transform('max')

    Date Completed  PathID  MaxPathID
    1/31/17         1       3
    1/31/17         2       3
    1/31/17         3       3
    2/1/17          1       1
    2/2/17          1       2
    2/2/17          2       2
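To make the question reproducible, here is a minimal sketch that rebuilds the sample frame from the table. It assumes Date Completed is the only input column (the real data may have more) and computes the two helper columns with the column name quoted:

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumption:
# only the 'Date Completed' column exists before the helper columns).
df = pd.DataFrame({'Date Completed': ['1/31/17', '1/31/17', '1/31/17',
                                      '2/1/17', '2/2/17', '2/2/17']})

# 1-based running position of each row within its date group.
df['PathID'] = df.groupby('Date Completed').cumcount() + 1
# Largest PathID of the group (i.e. the group size), broadcast to every row.
df['MaxPathID'] = df.groupby('Date Completed')['PathID'].transform('max')
print(df)
```

Passing the string 'max' to transform selects the built-in groupby maximum by name.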

In this case, I want to duplicate only the record for 2/1/17, since that date has only one instance (i.e., where MaxPathID == 1).

Desired output:

    Date Completed  PathID  MaxPathID
    1/31/17         1       3
    1/31/17         2       3
    1/31/17         3       3
    2/1/17          1       1
    2/1/17          1       1
    2/2/17          1       2
    2/2/17          2       2
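Because MaxPathID already marks the rows to copy, the desired output can also be built directly from that column. This is a minimal sketch of an alternative (not the accepted answer's method) that repeats every row with MaxPathID == 1 using `Index.repeat`. It assumes the frame holds only the three columns from the table and has the default RangeIndex:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date Completed': ['1/31/17', '1/31/17', '1/31/17', '2/1/17', '2/2/17', '2/2/17'],
    'PathID':    [1, 2, 3, 1, 1, 2],
    'MaxPathID': [3, 3, 3, 1, 2, 2],
})

# Two copies of each row that is alone in its date group, one copy otherwise.
repeats = np.where(df['MaxPathID'].eq(1), 2, 1)
# Repeating index labels and selecting with .loc places each copy next to its
# original; PathID and MaxPathID are copied unchanged, as in the desired output.
out = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
print(out)
```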

Thanks in advance!

Solution

I think you need to select the rows whose Date Completed value occurs only once, and then concat them to the original DataFrame:

    import pandas as pd

    df1 = df.loc[~df['Date Completed'].duplicated(keep=False), ['Date Completed']]
    print(df1)
      Date Completed
    3         2/1/17

    df = pd.concat([df, df1], ignore_index=True).sort_values('Date Completed')
    df['PathID'] = df.groupby('Date Completed').cumcount() + 1
    df['MaxPathID'] = df.groupby('Date Completed')['PathID'].transform('max')
    print(df)
      Date Completed  PathID  MaxPathID
    0        1/31/17       1          3
    1        1/31/17       2          3
    2        1/31/17       3          3
    3         2/1/17       1          2
    6         2/1/17       2          2
    4         2/2/17       1          2
    5         2/2/17       2          2
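The key step is the mask ~df['Date Completed'].duplicated(keep=False). A small sketch of what it selects on the question's dates, with an equivalent `value_counts`-based mask added for comparison (the comparison is not part of the answer):

```python
import pandas as pd

dates = pd.Series(['1/31/17', '1/31/17', '1/31/17', '2/1/17', '2/2/17', '2/2/17'])

# keep=False flags *every* occurrence of a repeated value as True,
# so negating it keeps only the values that occur exactly once.
mask = ~dates.duplicated(keep=False)
print(mask.tolist())  # [False, False, False, True, False, False]

# Same selection expressed through per-value counts.
count_mask = dates.map(dates.value_counts()).eq(1)
print(count_mask.tolist())  # [False, False, False, True, False, False]
```

With the default keep='first', the first 1/31/17 and 2/2/17 rows would not be flagged, and they would wrongly be treated as unique.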

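Recomputing PathID and MaxPathID after the concat numbers the two 2/1/17 rows 1 and 2 (with MaxPathID 2), which differs from the desired output. To copy rows unchanged, including any other columns (here the sample columns a and b), duplicate whole rows instead:

    import pandas as pd

    print(df)
      Date Completed  a  b
    0        1/31/17  4  5
    1        1/31/17  3  5
    2        1/31/17  6  3
    3         2/1/17  7  9
    4         2/2/17  2  0
    5         2/2/17  6  7

    df1 = df[~df['Date Completed'].duplicated(keep=False)]
    # alternative - boolean indexing by numpy array
    # df1 = df[~df['Date Completed'].duplicated(keep=False).values]
    print(df1)
      Date Completed  a  b
    3         2/1/17  7  9

    df = pd.concat([df, df1], ignore_index=True).sort_values('Date Completed')
    print(df)
      Date Completed  a  b
    0        1/31/17  4  5
    1        1/31/17  3  5
    2        1/31/17  6  3
    3         2/1/17  7  9
    6         2/1/17  7  9
    4         2/2/17  2  0
    5         2/2/17  6  7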
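sort_values('Date Completed') compares the dates as strings. That gives the right order for this sample, but it would put '10/1/17' before '2/1/17'. The sketch below wraps the same idea in a hypothetical helper, duplicate_singletons (not from the answer). It avoids sorting on the dates: each copy keeps its original index label, and a stable sort on the index then places the copy directly after its original. It assumes df has a unique index already in the desired row order, such as the default RangeIndex:

```python
import pandas as pd


def duplicate_singletons(df, key):
    """Hypothetical helper: add a copy of every row whose `key` value occurs
    exactly once, placed immediately after the original row.

    Assumes `df` has a unique index in the desired order (e.g. a RangeIndex).
    """
    singles = df[~df[key].duplicated(keep=False)]
    # No ignore_index here: the copies keep their original labels, and the
    # stable mergesort keeps each original ahead of its copy.
    out = pd.concat([df, singles]).sort_index(kind='mergesort')
    return out.reset_index(drop=True)


df = pd.DataFrame({
    'Date Completed': ['1/31/17', '1/31/17', '1/31/17', '2/1/17', '2/2/17', '2/2/17'],
    'a': [4, 3, 6, 7, 2, 6],
    'b': [5, 5, 3, 9, 0, 7],
})
print(duplicate_singletons(df, 'Date Completed'))
```

Applied to the question's frame with PathID and MaxPathID already computed, it copies the 2/1/17 row unchanged and reproduces the desired output.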
