Pandas 数据透视表:列顺序和小计

2022-01-22 00:00:00 python pandas dataframe pivot

问题描述

我正在使用 Pandas 0.19.

I'm using Pandas 0.19.

考虑以下数据框:

FID admin0 admin1 admin2 windspeed population 0 cntry1 state1 city1 60km/h 700 1 cntry1 state1 city1 90km/h 210 2 cntry1 state1 city2 60km/h 100 3 cntry1 state2 city3 60km/h 70 4 cntry1 state2 city4 60km/h 180 5 cntry1 state2 city4 90km/h 370 6 cntry2 state3 city5 60km/h 890 7 cntry2 state3 city6 60km/h 120 8 cntry2 state3 city6 90km/h 420 9 cntry2 state3 city6 120km/h 360 10 cntry2 state4 city7 60km/h 740

如何创建这样的表格?

population 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 cntry1 state1 city2 100 0 0 cntry1 state2 city3 70 0 0 cntry1 state2 city4 180 370 0 cntry2 state3 city5 890 0 0 cntry2 state3 city6 120 420 360 cntry2 state4 city7 740 0 0

我已尝试使用以下数据透视表:

I have tried with the following pivot table:

table = pd.pivot_table(df,index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"],fill_value=0)

一般来说效果很好，但不幸的是我无法按正确的顺序对新列进行排序:120km/h 列出现在 60km/h 和 90km/h 列之前.如何指定新列的顺序?

In general it works great, but unfortunately I am not able to sort the new columns in the right order: the 120km/h column appears before the ones for 60km/h and 90km/h. How can I specify the order of the new columns?

此外，作为第二步，我需要为 admin0 和 admin1 添加小计.理想情况下，我需要的表应该是这样的:

Moreover, as a second step I need to add subtotals both for admin0 and admin1. Ideally, the table I need should be like this:

population 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 cntry1 state1 city2 100 0 0 SUM state1 800 210 0 cntry1 state2 city3 70 0 0 cntry1 state2 city4 180 370 0 SUM state2 250 370 0 SUM cntry1 1050 580 0 cntry2 state3 city5 890 0 0 cntry2 state3 city6 120 420 360 SUM state3 1010 420 360 cntry2 state4 city7 740 0 0 SUM state4 740 0 0 SUM cntry2 1750 420 360 SUM ALL 2800 1000 360

解决方案

使用小计和 MultiIndex.from_arrays.最后 concat 和所有数据帧, sort_index 并添加所有 sum:

#replace km/h and convert to int df.windspeed = df.windspeed.str.replace('km/h','').astype(int) print (df) FID admin0 admin1 admin2 windspeed population 0 0 cntry1 state1 city1 60 700 1 1 cntry1 state1 city1 90 210 2 2 cntry1 state1 city2 60 100 3 3 cntry1 state2 city3 60 70 4 4 cntry1 state2 city4 60 180 5 5 cntry1 state2 city4 90 370 6 6 cntry2 state3 city5 60 890 7 7 cntry2 state3 city6 60 120 8 8 cntry2 state3 city6 90 420 9 9 cntry2 state3 city6 120 360 10 10 cntry2 state4 city7 60 740 #pivoting table = pd.pivot_table(df, index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"], fill_value=0) print (table) population windspeed 60 90 120 admin0 admin1 admin2 cntry1 state1 city1 700 210 0 city2 100 0 0 state2 city3 70 0 0 city4 180 370 0 cntry2 state3 city5 890 0 0 city6 120 420 360 state4 city7 740 0 0

#groupby and create sum dataframe by levels 0,1 df1 = table.groupby(level=[0,1]).sum() df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), df1.index.get_level_values(1)+ '_sum', len(df1.index) * ['']]) print (df1) population windspeed 60 90 120 admin0 cntry1 state1_sum 800 210 0 state2_sum 250 370 0 cntry2 state3_sum 1010 420 360 state4_sum 740 0 0 df2 = table.groupby(level=0).sum() df2.index = pd.MultiIndex.from_arrays([df2.index.values + '_sum', len(df2.index) * [''], len(df2.index) * ['']]) print (df2) population windspeed 60 90 120 cntry1_sum 1050 580 0 cntry2_sum 1750 420 360 #concat all dataframes together, sort index df = pd.concat([table, df1, df2]).sort_index(level=[0])

#add km/h to second level in columns df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0), df.columns.get_level_values(1).astype(str) + 'km/h']) #add all sum df.loc[('All_sum','','')] = table.sum().values print (df) population 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 city2 100 0 0 state1_sum 800 210 0 state2 city3 70 0 0 city4 180 370 0 state2_sum 250 370 0 cntry1_sum 1050 580 0 cntry2 state3 city5 890 0 0 city6 120 420 360 state3_sum 1010 420 360 state4 city7 740 0 0 state4_sum 740 0 0 cntry2_sum 1750 420 360 All_sum 2800 1000 360

通过评论

def f(x): print (x) if (len(x) > 1): return x.sum() df1 = table.groupby(level=[0,1]).apply(f).dropna(how='all') df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), df1.index.get_level_values(1)+ '_sum', len(df1.index) * ['']]) print (df1) population windspeed 60 90 120 admin0 cntry1 state1_sum 800.0 210.0 0.0 state2_sum 250.0 370.0 0.0 cntry2 state3_sum 1010.0 420.0 360.0

相关文章