Pandas:如何根据其他列值的条件对列进行求和?
问题描述
我有以下熊猫数据框.
import pandas as pd
df = pd.read_csv('filename.csv')
print(df)
dog A B C
0 dog1 0.787575 0.159330 0.053095
1 dog10 0.770698 0.169487 0.059815
2 dog11 0.792689 0.152043 0.055268
3 dog12 0.785066 0.160361 0.054573
4 dog13 0.795455 0.150464 0.054081
5 dog14 0.794873 0.150700 0.054426
.. ....
8 dog19 0.811585 0.140207 0.048208
9 dog2 0.797202 0.152033 0.050765
10 dog20 0.801607 0.145137 0.053256
11 dog21 0.792689 0.152043 0.055268
....
我通过汇总列 "A"
、"B"
、"C"
来创建一个新列,如下所示:
I create a new column by summing columns "A"
, "B"
, "C"
as follows:
df['total_ABC'] = df[["A", "B", "B"]].sum(axis=1)
现在我想根据条件执行此操作,即 if "A" <0.78
然后创建一个新的求和列 df['smallA_sum'] = df[["A", "B", "B"]].sum(axis=1)
.否则,该值应为零.
Now I would like to do this based on a conditional, i.e. if "A" < 0.78
then create a new summed column df['smallA_sum'] = df[["A", "B", "B"]].sum(axis=1)
. Otherwise, the value should be zero.
如何创建这样的条件语句?
How does one create conditional statements like this?
我的想法是使用
df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))
但是,这不起作用,我无法指定轴.
However, this doesn't work and I'm not able to specify axis.
如何根据其他列的值创建列?
How do you create a column based on the values of other columns?
您也可以为每个 df['dog'] == 'dog2'
创建列 dog2_sum
,即
You could also do something like for each df['dog'] == 'dog2'
, create column dog2_sum
, i.e.
df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))
但我的方法不正确.
`
解决方案
下面应该可以了,这里我们屏蔽满足条件的df,这会将NaN
设置为条件所在的行不满足,所以我们在新的 col 上调用 fillna
:
The following should work, here we mask the df where the condition is met, this will set NaN
to the rows where the condition isn't met so we call fillna
on the new col:
In [67]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df
Out[67]:
A B C
0 0.197334 0.707852 -0.443475
1 -1.063765 -0.914877 1.585882
2 0.899477 1.064308 1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858 0.777523 -0.453747
In [73]:
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'].fillna(0, inplace=True)
df
Out[73]:
A B C total
0 0.197334 0.707852 -0.443475 0.905186
1 -1.063765 -0.914877 1.585882 0.000000
2 0.899477 1.064308 1.426789 1.963785
3 -0.556486 -0.150080 -0.149494 0.000000
4 -0.035858 0.777523 -0.453747 0.000000
另一种方法是调用 where
在 sum
结果上,当条件不满足时,这需要一个值参数来返回:
Another approach is to call where
on the sum
result, this takes a value param to return when the condition isn't met:
In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df
Out[75]:
A B C total
0 0.197334 0.707852 -0.443475 0.905186
1 -1.063765 -0.914877 1.585882 0.000000
2 0.899477 1.064308 1.426789 1.963785
3 -0.556486 -0.150080 -0.149494 0.000000
4 -0.035858 0.777523 -0.453747 0.000000
相关文章