How to split a pandas DataFrame or Series by day (possibly using an iterator)

2022-01-11 00:00:00 python pandas indexing time-series

Problem description

I have a long time series, e.g.:

import pandas as pd

index = pd.date_range(start='2012-11-05', end='2012-11-10', freq='1S').tz_localize('Europe/Berlin')
df = pd.DataFrame(range(len(index)), index=index, columns=['Number'])

Now I want to extract the sub-DataFrame for each day, to get the following output:

df_2012-11-05: data frame with all data referring to day 2012-11-05
df_2012-11-06: etc.
df_2012-11-07
df_2012-11-08
df_2012-11-09
df_2012-11-10

What is the most efficient way to do this without checking index.date == given_date, which is very slow? Also, the user does not know a priori the range of days in the frame.

Any hints on how to do this with an iterator?

My current solution is this, but it is not very elegant and has two issues, described below:

import numpy as np

time_zone = 'Europe/Berlin'
# find all days
a = np.unique(df.index.date)  # this can take a lot of time
a.sort()
results = []
for i in range(len(a) - 1):
    day_now = pd.Timestamp(a[i]).tz_localize(time_zone)
    day_next = pd.Timestamp(a[i + 1]).tz_localize(time_zone)
    results.append(df[day_now:day_next])  # how to select if I do not want day_next included?
# last day
results.append(df[day_next:])

This approach has the following problems:

a=np.unique(df.index.date) can take a long time
df[day_now:day_next] includes day_next, but I need it excluded from the range

Solution

Perhaps groupby?

DFList = []
for group in df.groupby(df.index.day):
    DFList.append(group[1])

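This should give you a list of data frames where each data frame holds one day of data. Note that df.index.day is the day of the month, so this holds only as long as the data stays within a single month; across months, rows that share a day number end up in the same group.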

Or in one line:

DFList = [group[1] for group in df.groupby(df.index.day)]

Gotta love Python!
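
Since the question asks for an iterator and the date range is not known in advance, here is a minimal sketch of the same groupby idea wrapped in a generator. It is not part of the original answer: the function name iter_days and the small demo frame are made up for illustration, and the grouping key is swapped from df.index.day to df.index.normalize() so that the key carries year, month and day. Because groupby assigns each row to exactly one group, the question of whether day_next is included in a slice should not arise.

```python
import pandas as pd


def iter_days(frame):
    """Yield (day, sub_frame) pairs, one per calendar day present in the index.

    Hypothetical helper, assuming `frame` has a DatetimeIndex.
    """
    # normalize() moves every timestamp to midnight of its own (local) day,
    # so rows from the same calendar day share one grouping key.
    for day, sub_frame in frame.groupby(frame.index.normalize()):
        yield day, sub_frame


# Demo frame (illustrative data): one row every 6 hours across a month boundary.
index = pd.date_range(start='2012-11-29', end='2012-12-02 18:00', freq='6h').tz_localize('Europe/Berlin')
df = pd.DataFrame({'Number': range(len(index))}, index=index)

# Consume the iterator lazily, or collect it into a dict keyed by date.
frames_by_day = {day.date(): sub_frame for day, sub_frame in iter_days(df)}
```

Each key in frames_by_day is a datetime.date and each value is the sub-DataFrame for that day. Whether this is faster than np.unique(df.index.date) on a given data set is worth measuring, but normalize() is a vectorized index operation and does not build an array of Python date objects.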
