Time-series boxplot in pandas

2022-01-11 00:00:00 python pandas time-series boxplot

Question

How can I create a boxplot for a pandas time series so that there is one box per day?

Sample dataset of hourly data, where each box should consist of 24 values:

import numpy as np
import pandas as pd

n = 480  # 20 days of hourly values
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01",
                                   periods=n,
                                   freq="h"))  # written as "H" in older pandas
ts.plot()

I am aware that I could make an extra column for the day, but I would like to have proper x-axis labeling and x-limit functionality (as in ts.plot()), so being able to work with the datetime index would be great.

There is a similar question for R/ggplot2 here, if it helps to clarify what I want.


Answer

If it's an option for you, I would recommend using Seaborn, which is a wrapper around Matplotlib. You could do it yourself by looping over the groups in your time series, but that is much more work.

import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt

n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Group by day of year; recent seaborn versions require x and y as keyword arguments
fig, ax = plt.subplots(figsize=(12, 5))
seaborn.boxplot(x=ts.index.dayofyear, y=ts.values, ax=ax)

This gives one box per day, labeled by day of year on the x-axis.

Note that I'm passing the day of year as the grouper to seaborn. If your data spans multiple years this won't work, because the same day-of-year value occurs once in every year. In that case you could consider something like:

ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
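
To illustrate why the year has to be part of the grouper, here is a minimal sketch that compares the two labelings on data longer than one year. The 400-day range and the names `day_of_year` and `day_label` are assumptions made for this example, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: hourly values covering 400 days (more than one year).
idx = pd.date_range(start="2013-02-01", periods=24 * 400, freq="h")
ts = pd.Series(np.random.randn(len(idx)), index=idx)

# Day of year repeats once the data wraps into the next year,
# so values from different years would end up in the same box.
day_of_year = ts.index.dayofyear

# A year-month-day string stays unique per calendar day.
day_label = ts.index.strftime("%Y%m%d")

print("distinct day-of-year values:", pd.Index(day_of_year).nunique())
print("distinct calendar days:     ", day_label.nunique())
```

The `day_label` values could then be passed to `seaborn.boxplot` in place of `ts.index.dayofyear`.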

Edit: for 3-hourly boxes you could use the following as a grouper, but it only works if the timestamps have no minutes or finer components:

import datetime

[(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
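
The "do it yourself" route mentioned at the start of this answer can be sketched with plain Matplotlib. This is only one possible approach, not something from the original answer: it assumes that grouping on `ts.index.floor(...)` is acceptable, which keeps the group keys as real timestamps so that the boxes can be placed on a date axis. The names `bin_width`, `starts`, `values` and `positions` are made up for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Assumption: one box per day. Using "3h" here would give 3-hourly boxes instead.
bin_width = "1D"

# Round every timestamp down to the start of its bin and group on that.
groups = ts.groupby(ts.index.floor(bin_width))
starts = [key for key, _ in groups]
values = [grp.to_numpy() for _, grp in groups]

# Matplotlib date numbers are measured in days, so express the bin width in days
# and center each box in the middle of its bin.
width = pd.Timedelta(bin_width) / pd.Timedelta(days=1)
positions = mdates.date2num(pd.DatetimeIndex(starts).to_pydatetime()) + width / 2

fig, ax = plt.subplots(figsize=(12, 5))
# manage_ticks=False stops boxplot from replacing the ticks with one label per box.
ax.boxplot(values, positions=positions, widths=0.8 * width, manage_ticks=False)
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
```

Because the bins come from rounding the timestamps themselves, this grouping does not collide across years and is not affected by minutes or seconds in the index.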
