MultiIndex Group By in a Pandas DataFrame

2022-01-21 00:00:00 python pandas dataframe dataset

Question

I have a data set that contains countries and statistics on economic indicators by year, organized like so:

    Country  Metric  2011  2012  2013  2014
    USA      GDP        7     4     0     2
    USA      Pop.       2     3     0     3
    GB       GDP        8     7     0     7
    GB       Pop.       2     6     0     0
    FR       GDP        5     0     0     1
    FR       Pop.       1     1     0     5

How can I use a MultiIndex in pandas to create a data frame that shows only GDP by year for each country?

I tried:

df = data.groupby(['Country', 'Metric'])

But it doesn't work as expected.

Answer

In this case, you don't actually need a groupby. You also don't have a MultiIndex yet. You can make one like this:

    import pandas
    from io import StringIO

    datastring = StringIO("""
    Country Metric 2011 2012 2013 2014
    USA GDP 7 4 0 2
    USA Pop. 2 3 0 3
    GB GDP 8 7 0 7
    GB Pop. 2 6 0 0
    FR GDP 5 0 0 1
    FR Pop. 1 1 0 5
    """)

    # Columns are separated by runs of whitespace, so use a regex separator
    data = pandas.read_table(datastring, sep=r'\s+')
    # Promote the two label columns to a two-level row index (a MultiIndex)
    data.set_index(['Country', 'Metric'], inplace=True)

Then data looks like this:

                    2011  2012  2013  2014
    Country Metric
    USA     GDP        7     4     0     2
            Pop.       2     3     0     3
    GB      GDP        8     7     0     7
            Pop.       2     6     0     0
    FR      GDP        5     0     0     1
            Pop.       1     1     0     5
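As a quick check that the index really is a MultiIndex now, the following minimal sketch rebuilds the same frame and inspects its index. It uses `read_csv` with the same whitespace separator, which is an assumption of this sketch rather than part of the original answer; note that the year columns are read as strings, so they are addressed as `"2012"`, not `2012`.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""
Country Metric 2011 2012 2013 2014
USA GDP 7 4 0 2
USA Pop. 2 3 0 3
GB GDP 8 7 0 7
GB Pop. 2 6 0 0
FR GDP 5 0 0 1
FR Pop. 1 1 0 5
""")

# Same construction as above, written without inplace=True
data = pd.read_csv(raw, sep=r"\s+").set_index(["Country", "Metric"])

# The row index now has two named levels
print(type(data.index).__name__)
print(list(data.index.names))

# A single cell is addressed by a (Country, Metric) tuple plus a column label
print(data.loc[("GB", "GDP"), "2012"])
```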

Now, to get the GDPs, you can take a cross-section of the dataframe via the xs method:

    data.xs('GDP', level='Metric')

             2011  2012  2013  2014
    Country
    USA         7     4     0     2
    GB          8     7     0     7
    FR          5     0     0     1
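For comparison, the same GDP table can also be reached without building a MultiIndex at all, by filtering the flat frame on its Metric column. The sketch below is an alternative, not the answer's method; the names `flat`, `gdp_flat`, and `gdp_xs` are illustrative.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""
Country Metric 2011 2012 2013 2014
USA GDP 7 4 0 2
USA Pop. 2 3 0 3
GB GDP 8 7 0 7
GB Pop. 2 6 0 0
FR GDP 5 0 0 1
FR Pop. 1 1 0 5
""")
flat = pd.read_csv(raw, sep=r"\s+")

# Route 1: plain boolean filter on the flat frame, no MultiIndex involved
gdp_flat = (
    flat[flat["Metric"] == "GDP"]
    .drop(columns="Metric")
    .set_index("Country")
)

# Route 2: the cross-section from the answer, on the MultiIndexed frame
gdp_xs = flat.set_index(["Country", "Metric"]).xs("GDP", level="Metric")

print(gdp_flat)
print(gdp_xs)
```

Both routes give one row per country and one column per year. `xs` tends to be more convenient once the frame already carries the MultiIndex; the filter is simpler if it does not.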

This is easy because your data are already pivoted/unstacked. If they weren't, and looked like this:

    data.columns.names = ['Year']
    data = data.stack()
    data

    Country  Metric  Year
    USA      GDP     2011    7
                     2012    4
                     2013    0
                     2014    2
             Pop.    2011    2
                     2012    3
                     2013    0
                     2014    3
    GB       GDP     2011    8
                     2012    7
                     2013    0
                     2014    7
             Pop.    2011    2
                     2012    6
                     2013    0
                     2014    0
    FR       GDP     2011    5
                     2012    0
                     2013    0
                     2014    1
             Pop.    2011    1
                     2012    1
                     2013    0
                     2014    5
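If the data did arrive in this long (stacked) form, one possible way back to a GDP-by-year table is to take the same cross-section and then unstack the Year level. This is a sketch under that assumption; the names `long`, `gdp_long`, and `gdp_wide` are illustrative and not from the original answer.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""
Country Metric 2011 2012 2013 2014
USA GDP 7 4 0 2
USA Pop. 2 3 0 3
GB GDP 8 7 0 7
GB Pop. 2 6 0 0
FR GDP 5 0 0 1
FR Pop. 1 1 0 5
""")
data = pd.read_csv(raw, sep=r"\s+").set_index(["Country", "Metric"])

# Build the long form: a Series indexed by (Country, Metric, Year)
data.columns.names = ["Year"]
long = data.stack()

# Select the GDP rows, leaving a Series indexed by (Country, Year)
gdp_long = long.xs("GDP", level="Metric")

# Move Year back into the columns: one row per country, one column per year
gdp_wide = gdp_long.unstack("Year")
print(gdp_wide)
```

The row order after `unstack` may differ from the original file order, so look values up by label rather than by position.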

You could then use groupby to tell you something about the world as a whole:

    data.groupby(level=['Metric', 'Year']).sum()

    Metric  Year
    GDP     2011    20
            2012    11
            2013     0
            2014    10
    Pop.    2011     5
            2012    10
            2013     0
            2014     8

Or get really fancy:

    data.groupby(level=['Metric', 'Year']).sum().unstack(level='Metric')

    Metric  GDP  Pop.
    Year
    2011     20     5
    2012     11    10
    2013      0     0
    2014     10     8
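The same world totals can also be computed from the original wide layout, without stacking first. A plain groupby on the Metric column followed by a transpose gives the same numbers. This is a sketch of an alternative route, not the answer's method; `flat` and `totals` are illustrative names.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""
Country Metric 2011 2012 2013 2014
USA GDP 7 4 0 2
USA Pop. 2 3 0 3
GB GDP 8 7 0 7
GB Pop. 2 6 0 0
FR GDP 5 0 0 1
FR Pop. 1 1 0 5
""")
flat = pd.read_csv(raw, sep=r"\s+")

# Drop the label column that should not be summed, then total each metric
# across all countries; the year columns are summed independently
totals = flat.drop(columns="Country").groupby("Metric").sum()

# Transpose so that years become rows and metrics become columns
print(totals.T)
```

This also shows why the attempt in the question appeared not to work: `groupby` on its own only returns a lazy grouping object, and nothing is computed until an aggregation such as `sum()` is applied.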
