Conditional sums for pandas aggregate

2022-01-13 00:00:00 python pandas r data.table

Problem description

I recently switched from R to Python and have been having some trouble getting used to data frames again, as opposed to R's data.table. I'd like to take a column of strings, check each one for a value, and then count the matches, broken down by user. So I would like to take this data:

   A_id        B    C
1:   a1     "up"  100
2:   a2   "down"  102
3:   a3     "up"  100
3:   a3     "up"  250
4:   a4   "left"  100
5:   a5  "right"  102

and return:

   A_id_grouped  sum_up  sum_down  ...  over_200_up
1:           a1       1         0  ...            0
2:           a2       0         1                 0
3:           a3       2         0  ...            1
4:           a4       0         0                 0
5:           a5       0         0  ...            0

Previously I did this in R (using data.table):

> DT[, list(sum_up = sum(B == "up"),
+           sum_down = sum(B == "down"),
+           ...,
+           over_200_up = sum(B == "up" & C > 200)),
+    by = list(A_id_grouped = A_id)]

However, all of my recent attempts with Python have failed:

DT.agg({"D": [np.sum(DT[DT["B"]=="up"]), np.sum(DT[DT["B"]=="up"])],
        ...
        "C": np.sum(DT[(DT["B"]=="up") & (DT["C"]>200)])
        })

Thank you in advance! It seems like a simple question, but I couldn't find an answer anywhere.

Solution

To complement unutbu's answer, here's an approach using apply on the groupby object.

>>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
...     sum_up=(x.B == 'up').sum(),
...     sum_down=(x.B == 'down').sum(),
...     over_200_up=((x.B == 'up') & (x.C > 200)).sum()
... )))
      over_200_up  sum_down  sum_up
A_id
a1              0         0       1
a2              0         1       0
a3              1         0       2
a4              0         0       0
a5              0         0       0
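For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question. The helper name `conditional_counts` and the variables `flags` and `alt` are my own illustrative names, not part of the original answer. The second half shows a different technique (precomputing indicator columns, then a plain grouped sum), which I'd expect to scale better than `apply` on large frames, though I haven't benchmarked it here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A_id": ["a1", "a2", "a3", "a3", "a4", "a5"],
    "B": ["up", "down", "up", "up", "left", "right"],
    "C": [100, 102, 100, 250, 100, 102],
})


def conditional_counts(group):
    """Count rows matching each condition within one A_id group.

    The function name is illustrative, not from the original answer.
    """
    is_up = group["B"] == "up"
    return pd.Series({
        "sum_up": is_up.sum(),
        "sum_down": (group["B"] == "down").sum(),
        "over_200_up": (is_up & (group["C"] > 200)).sum(),
    })


# Select B and C explicitly so the grouping column is not handed to the helper.
result = df.groupby("A_id")[["B", "C"]].apply(conditional_counts)
print(result)

# Alternative technique: build 0/1 indicator columns once, then sum per group.
flags = pd.DataFrame({
    "A_id": df["A_id"],
    "sum_up": (df["B"] == "up").astype(int),
    "sum_down": (df["B"] == "down").astype(int),
    "over_200_up": ((df["B"] == "up") & (df["C"] > 200)).astype(int),
})
alt = flags.groupby("A_id").sum()
print(alt)
```

Both `result` and `alt` are indexed by `A_id` with one column per condition, so adding another condition means adding one more entry to the dictionary in either version.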
