Plotly: How to make a line plot from a pandas dataframe of a long or wide format?

2022-01-21 00:00:00 python plotly plotly-python

Problem description

(This is a self-answered post to help others shorten their answers to plotly questions by not having to explain how plotly best handles data of long and wide format)


I'd like to build a plotly figure based on a pandas dataframe in as few lines as possible. I know you can do that using plotly.express, but this fails for what I would call a standard pandas dataframe: an index describing row order, and column names describing the values held in each column.

Sample dataframe:

    a           b           c
0   100.000000  100.000000  100.000000
1   98.493705   99.421400   101.651437
2   96.067026   98.992487   102.917373
3   95.200286   98.313601   102.822664
4   96.691675   97.674699   102.378682

An attempt:

fig=px.line(x=df.index, y = df.columns)

This raises an error:

ValueError: All arguments should have the same length. The length of argument y is 3, whereas the length of previous arguments ['x'] is 100

Solution

Here you've tried to use a pandas dataframe of a wide format as a source for px.line. But plotly.express is designed to be used with dataframes of a long format, often referred to as tidy data (please take a look at that; no one explains it better than Wickham). Many, particularly those injured by years of battling with Excel, find it easier to organize data in a wide format. So what's the difference?

Wide format:

  • data is presented with each different data variable in a separate column
  • each column has only one data type
  • missing values are often represented by np.nan
  • works best with plotly.graph_objects (go)
  • lines are often added to a figure using fig.add_traces()
  • colors are normally assigned to each trace

Example:

            a          b           c
0   -1.085631    0.997345   0.282978
1   -2.591925    0.418745   1.934415
2   -5.018605   -0.010167   3.200351
3   -5.885345   -0.689054   3.105642
4   -4.393955   -1.327956   2.661660
5   -4.828307    0.877975   4.848446
6   -3.824253    1.264161   5.585815
7   -2.333521    0.328327   6.761644
8   -3.587401   -0.309424   7.668749
9   -5.016082   -0.449493   6.806994

Long format:

  • data is presented with one column containing all the values and another column listing the context of the value
  • missing values are simply not included in the dataset.
  • works best with plotly.express (px)
  • colors are set by a default color cycle and are assigned to each unique variable

Example:

    id  variable    value
0   0   a        -1.085631
1   1   a        -2.591925
2   2   a        -5.018605
3   3   a        -5.885345
4   4   a        -4.393955
... ... ... ...
295 95  c        -4.259035
296 96  c        -5.333802
297 97  c        -6.211415
298 98  c        -4.335615
299 99  c        -3.515854
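
To make the point about missing values concrete, here is a minimal sketch using a small made-up frame (the values and the names wide and long_df are illustrative only, not taken from the data above). In wide format a gap has to be stored as np.nan so the columns keep the same length; once melted, the corresponding row can simply be left out.

```python
import numpy as np
import pandas as pd

# hypothetical wide frame with one missing observation in column 'b'
wide = pd.DataFrame({'id': [0, 1, 2],
                     'a': [1.0, 2.0, 3.0],
                     'b': [4.0, np.nan, 6.0]})

# wide -> long: one row per (id, variable) pair
long_df = pd.melt(wide, id_vars='id', value_vars=['a', 'b'])

# in long format the missing observation can simply be dropped
long_df = long_df.dropna(subset=['value']).reset_index(drop=True)
print(long_df)
```

Six (id, variable) pairs go in, and five rows remain once the missing one is dropped.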

How to go from wide to long?

df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])
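
What that one-liner does may be easier to see on a tiny frame. The sketch below uses made-up values and the hypothetical names wide, long_df and wide_again. It assumes, as in the snippets in this post, that 'id' was added last, which is why df.columns[:-1] selects every value column. The pivot back to wide is included only to show that nothing is lost along the way.

```python
import pandas as pd

# hypothetical small wide frame; 'id' is added last, as in this post
wide = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
wide['id'] = wide.index

# every column except the last one ('id') is stacked into 'value',
# and its column name ends up in 'variable'
long_df = pd.melt(wide, id_vars='id', value_vars=wide.columns[:-1])
print(long_df)

# and back again: one column per unique entry in 'variable'
wide_again = long_df.pivot(index='id', columns='variable', values='value')
print(wide_again)
```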

The two snippets below will produce the very same plot:

How to use px to plot long data?

fig = px.line(df, x='id', y='value', color='variable')

How to use go to plot wide data?

colors = px.colors.qualitative.Plotly
fig = go.Figure()
fig.add_traces(go.Scatter(x=df['id'], y = df['a'], mode = 'lines', line=dict(color=colors[0])))
fig.add_traces(go.Scatter(x=df['id'], y = df['b'], mode = 'lines', line=dict(color=colors[1])))
fig.add_traces(go.Scatter(x=df['id'], y = df['c'], mode = 'lines', line=dict(color=colors[2])))
fig.show()

By the looks of it, go is more complicated and offers perhaps more flexibility? Well, yes. And no. You can easily build a figure using px and add any go object you'd like!

Complete go snippet:

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# dataframe of a wide format
np.random.seed(123)
X = np.random.randn(100,3)  
df=pd.DataFrame(X, columns=['a','b','c'])
df=df.cumsum()
df['id']=df.index

# plotly.graph_objects
colors = px.colors.qualitative.Plotly
fig = go.Figure()
fig.add_traces(go.Scatter(x=df['id'], y = df['a'], mode = 'lines', line=dict(color=colors[0])))
fig.add_traces(go.Scatter(x=df['id'], y = df['b'], mode = 'lines', line=dict(color=colors[1])))
fig.add_traces(go.Scatter(x=df['id'], y = df['c'], mode = 'lines', line=dict(color=colors[2])))
fig.show()

Complete px snippet:

import numpy as np
import pandas as pd
import plotly.express as px

# dataframe of a wide format
np.random.seed(123)
X = np.random.randn(100,3)  
df=pd.DataFrame(X, columns=['a','b','c'])
df=df.cumsum()
df['id']=df.index

# dataframe of a long format
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])

# plotly express
fig = px.line(df, x='id', y='value', color='variable')
fig.show()
