Specify column order after groupby aggregation

2022-01-15 00:00:00 | Tags: python, pandas, format

Problem description

The order of my age, height and weight columns changes every time I run the code. I need the order of the aggregated columns to stay fixed, because I later refer to columns in this output file by their position. How can I make sure age, height and weight are written in the same order every time?

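```python
import numpy as np
import pandas as pd

# input_file and output_file are paths defined elsewhere in the script
df = pd.read_csv(input_file, na_values=[''])  # read_csv already returns a DataFrame
df.index_col = ['name', 'address']            # grouping keys, stored as an attribute
df_out = df.groupby(df.index_col).agg({'age': np.mean, 'height': np.sum, 'weight': np.sum})
df_out.to_csv(output_file, sep=',')
```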

Solution

I think you can select a subset of the aggregated columns, listed in the order you need:

```python
df_out = df.groupby(df.index_col).agg({'age': np.mean, 'height': np.sum, 'weight': np.sum})
df_out = df_out[['age', 'height', 'weight']]  # select the columns in a fixed order
```
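A self-contained sketch of the subset idea, assuming only that pandas is installed. The names group_keys and column_order are hypothetical, introduced here for illustration (a plain list is used instead of the df.index_col attribute), and string aggregator names are used for brevity. The aggregation dict is deliberately written in a different order to show that the trailing column selection, not the dict, decides the layout.

```python
import pandas as pd

df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})

group_keys = ['name', 'address']            # hypothetical: the grouping columns
column_order = ['age', 'height', 'weight']  # hypothetical: the layout the output file needs

# Dict order intentionally differs from column_order.
agg_funcs = {'weight': 'sum', 'age': 'mean', 'height': 'sum'}

# The [column_order] selection after agg fixes the final column order.
df_out = df.groupby(group_keys).agg(agg_funcs)[column_order]
print(df_out.columns.tolist())  # ['age', 'height', 'weight']
```

Keeping the order in a single list means the selection and any later positional lookups can share one definition.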

You can also pass the names of the pandas aggregation functions as strings instead of NumPy functions:

```python
df_out = df.groupby(df.index_col).agg({'age': 'mean', 'height': 'sum', 'weight': 'sum'})
df_out = df_out[['age', 'height', 'weight']]  # select the columns in a fixed order
```
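If your pandas is version 0.25 or newer, named aggregation is another option (a swapped-in technique, not part of the original answer): each keyword argument becomes an output column, in the order the keywords are written, so no separate column selection is needed. A minimal sketch, assuming the same sample data as the example below:

```python
import pandas as pd

df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})

# Named aggregation: output_column=(input_column, aggregation).
# Keyword order determines the output column order.
df_out = df.groupby(['name', 'address']).agg(
    age=('age', 'mean'),
    height=('height', 'sum'),
    weight=('weight', 'sum'),
)
print(list(df_out.columns))  # ['age', 'height', 'weight']
```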

Example:

```python
import pandas as pd

df = pd.DataFrame({'name':['q','q','a','a'],
                   'address':['a','a','s','s'],
                   'age':[7,8,9,10],
                   'height':[1,3,5,7],
                   'weight':[5,3,6,8]})
print (df)
  address  age  height name  weight
0       a    7       1    q       5
1       a    8       3    q       3
2       s    9       5    a       6
3       s   10       7    a       8

df.index_col = ['name', 'address']
df_out = (df.groupby(df.index_col)
            .agg({'age':'mean', 'height':'sum', 'weight':'sum'})[['age','height','weight']])
print (df_out)
              age  height  weight
name address                     
a    s        9.5      12      14
q    a        7.5       4       8
```

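EDIT (by suggestion): add reset_index to turn the name and address index levels back into ordinary columns. Using as_index=False does not help here if you also need those key values, because the column subset would then drop them: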

```python
df_out = (df.groupby(df.index_col)
            .agg({'age':'mean', 'height':'sum', 'weight':'sum'})[['age','height','weight']]
            .reset_index())
print (df_out)
  name address  age  height  weight
0    a       s  9.5      12      14
1    q       a  7.5       4       8
```
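Since the question reads the output file by column position, a quick round-trip check can confirm the layout. This is a sketch under stated assumptions: io.StringIO stands in for the real output_file, and index=False is a choice made here so that name is at position 0 (the question's to_csv call keeps the default index=True, which would write the 0..n-1 row labels as an extra first column and shift every position by one).

```python
import io

import pandas as pd

df = pd.DataFrame({'name': ['q', 'q', 'a', 'a'],
                   'address': ['a', 'a', 's', 's'],
                   'age': [7, 8, 9, 10],
                   'height': [1, 3, 5, 7],
                   'weight': [5, 3, 6, 8]})

df_out = (df.groupby(['name', 'address'])
            .agg({'age': 'mean', 'height': 'sum', 'weight': 'sum'})[['age', 'height', 'weight']]
            .reset_index())

# In-memory buffer in place of output_file (assumption for this sketch).
buffer = io.StringIO()
df_out.to_csv(buffer, sep=',', index=False)

# Read it back and address columns purely by position:
# 0 = name, 1 = address, 2 = age, 3 = height, 4 = weight.
buffer.seek(0)
check = pd.read_csv(buffer)
print(check.iloc[:, 2].tolist())  # [9.5, 7.5]
```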
