Pandas - 带条件公式的 Groupby

问题描述

   Survived  SibSp  Parch
0         0      1      0
1         1      1      0
2         1      0      0
3         1      1      0
4         0      0      1

Given the above dataframe, is there an elegant way to groupby with a condition? I want to split the data into two groups based on the following conditions:

(df['SibSp'] > 0) | (df['Parch'] > 0) =   New Group -"Has Family"
 (df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"

then take the means of both of these groups and end up with an output like this:

               SurvivedMean
 Has Family    Mean
 No Family     Mean

Can it be done using groupby or would I have to append a new column using the above conditional statement?

解决方案

An easy way to group that is to use the sum of those two columns. If either of them is positive, the result will be greater than 1. And groupby accepts an arbitrary array as long as the length is the same as the DataFrame's length so you don't need to add a new column.

family = np.where((df['SibSp'] + df['Parch']) >= 1 , 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out: 
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64

相关文章