How to use groupby in pandas to calculate a percentage / proportion of the total based on a criterion in another column

2022-01-22 00:00:00 python pandas dataframe pivot group-by

Problem description

I'm trying to work out how to use the groupby function in pandas to calculate the proportion of values per year for a given Yes/No criterion.

For example, I have a dataframe called names:

    Name  Number  Year   Sex Criteria
0  name1     789  1998  Male        N
1  name1     688  1999  Male        N
2  name1     639  2000  Male        N
3  name2     551  1998  Male        Y
4  name2     499  1999  Male        Y

I can use

namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()

to get:

                    Number
Sex  Year Criteria
Male 1998 N          14507
          Y           2308
     1999 N          14119
          Y           2331

and so on. I would like the 'Number' column to show the percentage of the total for each sex and year, so instead of N = 14507 and Y = 2308 for 1998 above I'd have N = 86.27% and Y = 13.73%.

Can anyone advise how to do this?

Solution

This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer, this will work:

In [46]: namesgrouped.groupby(level=[0, 1]).apply(lambda g: g / g.sum())
Out[46]:
                      Number
Sex  Year Criteria
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000
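For anyone who wants to reproduce this, here is a minimal sketch built from only the five sample rows shown in the question (not the asker's full data). Two things in it are my assumptions rather than part of the original answer: selecting `[["Number"]]` before `sum()` so that the string column Name is not aggregated, and passing `group_keys=False`, which in newer pandas releases appears to be needed to stop `apply` from prepending the Sex/Year keys to the index a second time.

```python
import pandas as pd

# The five sample rows from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Assumption: keep only the numeric Number column so Name is not summed.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# Regroup by the first two index levels (Sex, Year) and divide each group
# by its own total. group_keys=False (an assumption about the installed
# pandas version) keeps the original three index levels.
proportions = namesgrouped.groupby(level=[0, 1], group_keys=False).apply(
    lambda g: g / g.sum()
)
print(proportions)
```

Within each (Sex, Year) group the resulting proportions sum to 1.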

Edit: a transform operation might be faster than apply:

namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
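Since the question asks for percentages rather than fractions, the transform version can be taken one step further. The following is a sketch under the same assumptions as before (only the five sample rows, and `[["Number"]]` selected so that Name is not summed); the names `totals` and `percentages` are mine, and the levels are referred to by name instead of position purely for readability.

```python
import pandas as pd

# The five sample rows from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Assumption: keep only the numeric Number column so Name is not summed.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# transform("sum") returns a frame with the same index as namesgrouped,
# where every row holds the total of its (Sex, Year) group.
totals = namesgrouped.groupby(level=["Sex", "Year"]).transform("sum")

# Element-wise division gives the share; scale to percent and round.
percentages = (namesgrouped / totals * 100).round(2)
print(percentages)
```

Because `transform` keeps the shape and index of its input, the division lines up row by row without any index juggling, which is also why it tends to be the simpler choice here.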
