Get the total of a Pandas column

2022-01-09 00:00:00 python pandas dataframe sum

Problem description

Goal

I have a Pandas data frame, shown below, with multiple columns, and I would like to get the total of the column MyColumn.

Data frame - df:

print df

   X  MyColumn      Y      Z
0  A        84   13.0   69.0
1  B        76   77.0  127.0
2  C        28   69.0   16.0
3  D        28   28.0   31.0
4  E        19   20.0   85.0
5  F        84  193.0   70.0


My attempt:

I tried to get the sum of the column using groupby and .sum():

Total = df.groupby['MyColumn'].sum()

print Total

This causes the following error:

TypeError: 'instancemethod' object has no attribute '__getitem__'


Expected output

I expect the output to be:

319

Alternatively, I would like df to be edited with a new row labelled TOTAL containing the total:

       X  MyColumn      Y      Z
0      A        84   13.0   69.0
1      B        76   77.0  127.0
2      C        28   69.0   16.0
3      D        28   28.0   31.0
4      E        19   20.0   85.0
5      F        84  193.0   70.0
TOTAL          319


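Solution

You do not need groupby here. groupby is a method, so df.groupby['MyColumn'] subscripts the method object itself, which raises the TypeError above. Use sum on the column directly: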

Total = df['MyColumn'].sum()
print (Total)
319
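
As a self-contained sketch, the example below rebuilds the data frame from the question and sums the column. The construction code is an assumption, since the question only shows the printed frame; the values are copied from that printout.

```python
import pandas as pd

# Rebuild the data frame from the question (values copied from the printed df).
df = pd.DataFrame({
    "X": ["A", "B", "C", "D", "E", "F"],
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# Series.sum() adds up the values of a single column; no grouping is involved.
total = df["MyColumn"].sum()
print(total)  # 319
```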

To add the total as a new row, use loc with a Series. The index of the Series should be the name of the column you are summing, so that only that column is filled:

df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index = ['MyColumn'])
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN
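
The row assignment can be checked with a small sketch. The data frame construction is an assumption that mirrors the printed df. The point it illustrates is that the Series is aligned to the columns by its index, so every column not named in it is left as NaN.

```python
import pandas as pd

# Assumed reconstruction of the data frame shown in the question.
df = pd.DataFrame({
    "X": ["A", "B", "C", "D", "E", "F"],
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# A one-element Series whose index is the target column name.
total_row = pd.Series(df["MyColumn"].sum(), index=["MyColumn"])

# Assigning to a label that does not exist yet appends a new row.
df.loc["Total"] = total_row

# Only MyColumn holds the total; the other columns are missing.
total_value = df.loc["Total", "MyColumn"]
others_missing = df.loc["Total", ["X", "Y", "Z"]].isna().all()
print(total_value, others_missing)
```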

The Series is needed because if you pass a scalar, it is assigned to every column of the new row:

df.loc['Total'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A        84   13.0   69.0
1        B        76   77.0  127.0
2        C        28   69.0   16.0
3        D        28   28.0   31.0
4        E        19   20.0   85.0
5        F        84  193.0   70.0
Total  319       319  319.0  319.0

Two other solutions use at and ix; see the examples below:

df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN


df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN

Note: ix has been deprecated since Pandas v0.20 and was removed in Pandas 1.0. Use loc or iloc instead.
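
On pandas versions without ix, the same single-cell assignment can be written with loc. This is a minimal sketch, again assuming a data frame built to match the printed df; loc with a row label and a column label sets one cell and adds the row if the label does not exist yet.

```python
import pandas as pd

# Assumed reconstruction of the data frame shown in the question.
df = pd.DataFrame({
    "X": ["A", "B", "C", "D", "E", "F"],
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# Label-based replacement for df.ix['Total', 'MyColumn'] = ...
df.loc["Total", "MyColumn"] = df["MyColumn"].sum()
print(df)
```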
