Plotting a Pandas DataFrame from a pivot

2022-01-22 00:00:00 python pandas pivot matplotlib

Problem description

I am trying to plot a line graph comparing the Murder Rates of particular States through the years 1960-1962 using Pandas in a Jupyter Notebook.

A little context about where I am now, and how I arrived here:

I'm using a crime CSV file, which looks like this:

For now I'm only interested in 3 columns: State, Year, and Murder Rate. Specifically, I'm interested in only 5 states: Alaska, Michigan, Minnesota, Maine, and Wisconsin.

To produce the desired table, I did this (only the top 5 rows are shown):

al_mi_mn_me_wi = crimes[(crimes['State'] == 'Alaska') | (crimes['State'] =='Michigan') | (crimes['State'] =='Minnesota') | (crimes['State'] =='Maine') | (crimes['State'] =='Wisconsin')]
control_df = al_mi_mn_me_wi[['State', 'Year', 'Murder Rate']]
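As an aside, the chained == comparisons can be written more compactly with Series.isin. The question doesn't show how control_1960_to_1962 was derived from control_df, so the year filter below (Series.between) is an assumption, and the small crimes frame is a hypothetical stand-in for the CSV with made-up rows.

```python
import pandas as pd

# Hypothetical stand-in for the crime CSV (made-up rows; the real file has many more)
crimes = pd.DataFrame({
    'State': ['Alaska', 'Alaska', 'Texas', 'Maine', 'Maine', 'Ohio'],
    'Year': [1960, 1963, 1960, 1961, 1962, 1961],
    'Murder Rate': [10.2, 9.9, 7.0, 1.6, 1.4, 3.0],
    'Other': [0.0] * 6,  # hypothetical extra column that gets dropped
})

states = ['Alaska', 'Michigan', 'Minnesota', 'Maine', 'Wisconsin']

# isin replaces the chain of (crimes['State'] == ...) | ... comparisons;
# .loc selects the matching rows and the three columns of interest in one step
control_df = crimes.loc[crimes['State'].isin(states), ['State', 'Year', 'Murder Rate']]

# Assumed step (not shown in the question): keep only 1960-1962, inclusive on both ends
control_1960_to_1962 = control_df[control_df['Year'].between(1960, 1962)]
print(control_1960_to_1962)
```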

From here I used the pivot function

df = control_1960_to_1962.pivot(index='Year', columns='State', values='Murder Rate')

And this is where I get stuck. I received a KeyError ('Year') when running:

df.plot(x='Year', y='Murder Rate', kind='line')

and when attempting just

df.plot()

I get this wonky graph.

How do I get my desired graph?

Solution

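Given a DataFrame in long (tidy) format, pandas.DataFrame.pivot transforms it to wide format, which can be plotted directly with pandas.DataFrame.plot. After the pivot, Year is the index rather than a column, and the Murder Rate values are spread across one column per State, so df.plot(x='Year', y='Murder Rate') raises a KeyError; df.plot() with no x argument already uses the index for the x axis.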

Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3

import numpy as np
import pandas as pd

control_1960_to_1962 = pd.DataFrame({
    'State': np.repeat(['Alaska', 'Maine', 'Michigan', 'Minnesota', 'Wisconsin'], 3),
    'Year': [1960, 1961, 1962]*5,
    'Murder Rate': [10.2, 11.5, 4.5, 1.7, 1.6, 1.4, 4.5, 4.1, 3.4, 1.2, 1.0, .9, 1.3, 1.6, .9]
})

df = control_1960_to_1962.pivot(index='Year', columns='State', values='Murder Rate')

# display(df)
State  Alaska  Maine  Michigan  Minnesota  Wisconsin
Year                                                
1960     10.2    1.7       4.5        1.2        1.3
1961     11.5    1.6       4.1        1.0        1.6
1962      4.5    1.4       3.4        0.9        0.9
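A quick way to confirm the pivoted layout, and to keep the x='Year' style of call if you prefer it, is to inspect the index and then move Year back into a column with reset_index. This sketch reuses the answer's sample data; choosing Alaska and Michigan for y is arbitrary, and matplotlib.use('Agg') is only there so it runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs outside Jupyter; omit in a notebook

# Same sample data as the answer
control_1960_to_1962 = pd.DataFrame({
    'State': np.repeat(['Alaska', 'Maine', 'Michigan', 'Minnesota', 'Wisconsin'], 3),
    'Year': [1960, 1961, 1962] * 5,
    'Murder Rate': [10.2, 11.5, 4.5, 1.7, 1.6, 1.4, 4.5, 4.1, 3.4, 1.2, 1.0, .9, 1.3, 1.6, .9],
})
df = control_1960_to_1962.pivot(index='Year', columns='State', values='Murder Rate')

# 'Year' is now the index, and 'Murder Rate' survives only as the cell values
print(df.index.name)                 # Year
print('Year' in df.columns)          # False
print('Murder Rate' in df.columns)   # False

# reset_index turns 'Year' back into a column, so x='Year' becomes valid;
# y must then name the per-state columns rather than 'Murder Rate'
wide = df.reset_index()
ax = wide.plot(x='Year', y=['Alaska', 'Michigan'], kind='line', xticks=wide['Year'].tolist())
```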

The plots

You can tell Pandas (and through it the matplotlib package that actually does the plotting) what xticks you want explicitly:

ax = df.plot(xticks=df.index, ylabel='Murder Rate')

Output:

ax is a matplotlib.axes.Axes object, and there are many, many customizations you can make to your plot through it.
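For example, a few standard Axes methods can add a title, start the y axis at zero, add a grid, and reposition the legend. This is a sketch on the answer's sample data; the title text and legend position are arbitrary choices, and the Agg backend line is only needed outside Jupyter.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in a notebook

control_1960_to_1962 = pd.DataFrame({
    'State': np.repeat(['Alaska', 'Maine', 'Michigan', 'Minnesota', 'Wisconsin'], 3),
    'Year': [1960, 1961, 1962] * 5,
    'Murder Rate': [10.2, 11.5, 4.5, 1.7, 1.6, 1.4, 4.5, 4.1, 3.4, 1.2, 1.0, .9, 1.3, 1.6, .9],
})
df = control_1960_to_1962.pivot(index='Year', columns='State', values='Murder Rate')

ax = df.plot(xticks=df.index, ylabel='Murder Rate')

# Common customizations through the returned matplotlib Axes
ax.set_title('Murder Rate by State, 1960-1962')
ax.set_xlabel('Year')
ax.set_ylim(bottom=0)                      # start the y axis at zero
ax.grid(True, linestyle='--', alpha=0.5)   # light dashed grid
ax.legend(title='State', loc='upper right')
```

In Jupyter the figure renders on its own; elsewhere, ax.get_figure().savefig(...) writes it to a file.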

Here's how to plot with the States on the x axis:

ax = df.T.plot(kind='bar', ylabel='Murder Rate')

Output:
