Pandas: group by year, rank by sales column, in a DataFrame with duplicate data

2022-01-10 00:00:00 python pandas pandas-groupby duplicates rank

Question

I would like to create a rank within each year (so in 2012, Manager B is 1; in 2011, Manager B is 1 again). I struggled with the pandas rank function for a while and DO NOT want to resort to a for loop.

    s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])

    Out[1]:
       Year Manager  Return
    0  2012       A       3
    1  2012       B       8
    2  2011       A      20
    3  2011       B      30

The issue I'm having is with the additional code (I didn't think this would be relevant before):

    s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])
    b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])
    s = s.append(b)
    s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)

        raise Exception('Reindexing only valid with uniquely valued Index '
    Exception: Reindexing only valid with uniquely valued Index objects

Any ideas? This is the real data structure I am using. I've been having trouble re-indexing.

Answer

It sounds like you want to group by Year, then rank the Return values in descending order within each group.

    import pandas as pd

    s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])
    s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
    print(s)

yields

       Year Manager  Return  Rank
    0  2012       A       3     2
    1  2012       B       8     1
    2  2011       A      20     2
    3  2011       B      30     1
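When Return values are tied within a year, rank's default method='average' gives the tied rows a fractional rank. Here is a minimal sketch of the difference between the default and method='dense', assuming whole-number ranks are what you want. The tied data and the column names RankAvg and RankDense are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical data: managers A and C tie in 2012.
df = pd.DataFrame(
    [['2012', 'A', 8], ['2012', 'B', 3], ['2012', 'C', 8],
     ['2011', 'A', 20], ['2011', 'B', 30]],
    columns=['Year', 'Manager', 'Return'])

# Default method='average': tied rows share the mean of the ranks they span.
df['RankAvg'] = df.groupby('Year')['Return'].rank(ascending=False)

# method='dense': tied rows share one rank and the next rank is not skipped.
df['RankDense'] = (df.groupby('Year')['Return']
                     .rank(method='dense', ascending=False)
                     .astype(int))
print(df)
```

Other values of method ('min', 'max', 'first') break ties differently, so pick whichever matches how you want equal returns to be reported.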

To address the OP's revised question: the error message

ValueError: cannot reindex from a duplicate axis

occurs when you try to groupby/rank on a DataFrame whose index contains duplicate values. Appending b to s keeps each frame's original labels 0 to 3, so every label appears twice. You can avoid the problem by giving s a unique index when appending:

    s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])
    b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                     columns=['Year', 'Manager', 'Return'])
    s = s.append(b, ignore_index=True)

yields

       Year Manager  Return
    0  2012       A       3
    1  2012       B       8
    2  2011       A      20
    3  2011       B      30
    4  2012       A       3
    5  2012       B       8
    6  2011       A      20
    7  2011       B      30
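Note that DataFrame.append was removed in pandas 2.0. A minimal sketch of the same idea using pd.concat, assuming a recent pandas version; the names rows and combined are introduced here only for illustration:

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])

# ignore_index=True discards the old labels and renumbers the rows 0..7,
# so the combined frame has a unique index.
combined = pd.concat([s, b], ignore_index=True)

# With a unique index, the per-year ranks can be assigned back directly.
combined['Rank'] = combined.groupby('Year')['Return'].rank(ascending=False)
print(combined)
```

Because every row is duplicated in this example, each return value is tied with its copy, so the default method='average' produces fractional ranks such as 1.5 and 3.5.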

If you've already appended new rows using

s = s.append(b)

then use reset_index to create a unique index:

s = s.reset_index(drop=True)
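As a rough sketch of how this might look in practice, you could check index.is_unique first and only reset the index when it contains duplicates. The frame below is built with an explicit duplicated index to mimic the result of appending without ignore_index; that construction is an assumption made for the example, not the OP's code.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]

# Mimic a frame that was appended to itself: labels 0..3 appear twice.
s = pd.DataFrame(rows * 2, columns=['Year', 'Manager', 'Return'],
                 index=[0, 1, 2, 3, 0, 1, 2, 3])

if not s.index.is_unique:
    # drop=True discards the old labels instead of keeping them as a column.
    s = s.reset_index(drop=True)

s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False)
print(s)
```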
