Plotly Express: how to fix the color mapping when setting color by column name
Problem description
I am using plotly express for a scatter plot. The color of the markers is defined by a variable of my dataframe, as in the example below.
import pandas as pd
import numpy as np
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df[df.species.isin(['virginica', 'setosa'])], x="sepal_width", y="sepal_length", color="species")
fig.show()
When I add another category of this variable, the color mapping changes (at first 'virginica' is red, then it becomes green).
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",size='petal_length', hover_data=['petal_width'])
fig.show()
How can I keep the mapping of the colors when adding variables?
Solution
Short answer:
1. Assign colors to variables with color_discrete_map:
color_discrete_map = {'virginica': 'blue', 'setosa': 'red', 'versicolor': 'green'}
or:
2. Manage the order of your data to enable the correct color cycle with:
order_df(df_input = df, order_by='species', order=['virginica', 'setosa', 'versicolor'])
... where order_df is a function that handles the ordering of long dataframes. You'll find its complete definition in the code snippet below.
The details:
1. You can map colors to variables directly with:
color_discrete_map = {'virginica': 'blue', 'setosa': 'red', 'versicolor': 'green'}
The downside is that you have to specify every variable name and color by hand, which quickly becomes tedious if you're working with dataframes where the number of variables is not fixed. In that case it is much more convenient to follow the default color sequence or to specify one to your liking. So I would rather consider managing the order of your dataset so that you get the desired color matching.
2. The source of the real challenge: px.scatter() assigns colors to categories in the order in which they first appear in your dataframe. Here you're using two different sources: df and df[df.species.isin(['virginica', 'setosa'])] (let's name the latter df2). Running df2['species'].unique() will give you:
array(['setosa', 'virginica'], dtype=object)
And running df['species'].unique() will give you:
array(['setosa', 'versicolor', 'virginica'], dtype=object)
See that versicolor pops up in the middle? That's why red is no longer assigned to 'virginica', but to 'versicolor' instead.
Suggested solution:
So in order to build a complete solution, you'd have to find a way to specify the order of the variables in the source dataframe. That's very straightforward for a column with unique values. It's a bit more work for a dataframe of a long format such as this one. You could do it as described in the post Changing row order in pandas dataframe without losing or messing up data. But below I've put together a very simple function that takes care of both the subset and the order of the dataframe you'd like to plot with plotly express.
Using the complete code and switching between the lines under # data subsets will give you the following three plots:
Plot 1: order=['virginica']
Plot 2: order=['virginica', 'setosa']
Plot 3: order=['virginica', 'setosa', 'versicolor']
Complete code:
# imports
import pandas as pd
import plotly.express as px

# data
df = px.data.iris()

# function to subset and order a pandas
# dataframe of a long format
def order_df(df_input, order_by, order):
    df_output = pd.DataFrame()
    # append the rows of each category in the requested order
    for var in order:
        df_append = df_input[df_input[order_by] == var].copy()
        df_output = pd.concat([df_output, df_append])
    return df_output

# data subsets (each line overwrites the previous one;
# keep only the one you want to plot active)
df_express = order_df(df_input=df, order_by='species', order=['virginica'])
df_express = order_df(df_input=df, order_by='species', order=['virginica', 'setosa'])
df_express = order_df(df_input=df, order_by='species', order=['virginica', 'setosa', 'versicolor'])

# plotly
fig = px.scatter(df_express, x="sepal_width", y="sepal_length", color="species")
fig.show()