Get the values of a column that exceed a given threshold a given number of times

2022-03-02 00:00:00 python pandas dataframe data-science

Problem description

I have a DataFrame df:

  Election Year     Votes   Votes %     Party              Region   
0   2000            42289   29.40   Janata Dal (United)     A
1   2000            27618   19.20   Rashtriya Janata Dal    B
2   2000            20886   14.50   Bahujan Samaj Party     C 
3   2000            17747   12.40   Congress                D
4   2000            14047   9.80    Independent             E
5   2005            8358    5.80    Janvadi Party           A
6   2005            4428    13.10   Independent             B
7   2005            1647    1.20    Independent             C
8   2005            1610    11.10   Independent             D
9   2005            1334    15.06   Nationalist  Party      E
10  2010            1114    0.80    Independent             A
11  2010            1042    10.5    Bharatiya Janta Dal     B
12  2010            835     0.60    Independent             C
13  2010            14305   15.50   Independent             D
14  2010            22211   17.70   Congress                E

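I need to find the regions in which 3 or more parties received more than 10% of the votes in every election year.

I have sorted the election year in ascending order and the vote percentage in descending order: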

df1 = df.sort_values(['Election Year', 'Votes %'], ascending=[True, False])

Then I took the top 3 rows for each election year and region:

top_3 = df1.groupby(['Election Year', 'Region']).head(3).reset_index()
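
The two steps above can be tried on a small frame to see which rows head(3) keeps. This is only a sketch: the data below is hypothetical (it is not the table from the question), and drop=True is an addition here that discards the old index instead of keeping it as an extra "index" column.

```python
import pandas as pd

# Hypothetical data: one year, one region, four parties.
df = pd.DataFrame({
    "Election Year": [2000, 2000, 2000, 2000],
    "Region": ["A", "A", "A", "A"],
    "Party": ["P1", "P2", "P3", "P4"],
    "Votes %": [12.0, 40.0, 8.0, 25.0],
})

# Sort years ascending and vote share descending, as in the question.
df1 = df.sort_values(["Election Year", "Votes %"], ascending=[True, False])

# head(3) keeps the first three rows of each group in the sorted order,
# i.e. the three highest vote shares per (year, region).
top_3 = df1.groupby(["Election Year", "Region"]).head(3).reset_index(drop=True)
```

Here top_3 holds the parties with 40%, 25% and 12%, and the 8% row is dropped.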

Now, how can I check whether the top 3 in each region received 10% or more of the votes in every year?


Solution

Are you looking for something like this?

import pandas as pd

def election(df):
    # Rows of this group with more than 10% of the votes
    mask = df['Votes %'].gt(10)
    # Comma-separated regions above the threshold ('' if there are none)
    regions = ','.join(df.loc[mask, 'Region'])
    return pd.Series({'count': mask.sum(), 'regions': regions})

# One row per (Election Year, Party): how many regions exceed 10%, and which
ndf = df.groupby(['Election Year', 'Party']).apply(election)
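
If the goal is exactly the one stated in the question (regions where 3 or more parties exceed 10% in every election year), one possible sketch is to count the parties above the threshold per year and region, and then keep the regions that reach the minimum in all years. This is an alternative to the function above, not part of the original answer. The sample data and the names THRESHOLD and MIN_PARTIES are hypothetical.

```python
import pandas as pd

# Hypothetical data: region A has three parties above 10% in both years,
# region B only in 2000.
df = pd.DataFrame({
    "Election Year": [2000] * 6 + [2005] * 6,
    "Region": ["A", "A", "A", "B", "B", "B"] * 2,
    "Party": ["P1", "P2", "P3"] * 4,
    "Votes %": [30.0, 25.0, 12.0, 40.0, 20.0, 11.0,
                35.0, 20.0, 15.0, 60.0, 9.0, 5.0],
})

THRESHOLD = 10    # percent of votes a party must exceed (assumed name)
MIN_PARTIES = 3   # parties required per year and region (assumed name)

# Number of distinct parties above the threshold per (year, region).
above = df[df["Votes %"] > THRESHOLD]
counts = above.groupby(["Election Year", "Region"])["Party"].nunique()

# Region x year table; combinations with no qualifying party count as 0.
table = counts.unstack("Election Year", fill_value=0)
table = table.reindex(index=df["Region"].unique(),
                      columns=df["Election Year"].unique(),
                      fill_value=0)

# Keep the regions that reach MIN_PARTIES in every election year.
qualifies = (table >= MIN_PARTIES).all(axis=1)
regions = qualifies[qualifies].index.tolist()
```

The reindex step matters because a region or year with no party above the threshold would otherwise be missing from the table rather than counted as zero. In the table from the question every year and region pair has a single row, so no region would qualify there.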
