Extracting business days from a time series with Python/Pandas
Problem description
I am working with high-frequency time series data and would like to extract all the business days from it. My observations are one second apart, so there are 86,400 observations per day, and my data set spans 31 days (2,678,400 observations in total).
Here is part of my data:
In[1]: ts
Out[1]:
2013-01-01 00:00:00 0.480928
2013-01-01 00:00:01 0.480928
2013-01-01 00:00:02 0.483977
2013-01-01 00:00:03 0.486725
2013-01-01 00:00:04 0.486725
...
2013-01-31 23:59:56 0.451630
2013-01-31 23:59:57 0.451630
2013-01-31 23:59:58 0.451630
2013-01-31 23:59:59 0.454683
Freq: S, Length: 2678400
What I would like to do is create a new time series that contains only the business days of this month, keeping every per-second observation for those days. For example, if 2013-01-02 (Wed) through 2013-01-04 (Fri) are the first business days of the first week of January, then:
2013-01-02 00:00:00 0.507477
2013-01-02 00:00:01 0.501373
...
2013-01-03 00:00:00 0.489778
2013-01-03 00:00:01 0.489778
...
2013-01-04 23:59:58 0.598115
2013-01-04 23:59:59 0.598115
Freq: S, Length: 259200
So it will of course exclude all the data for Saturday 2013-01-05 and Sunday 2013-01-06, since these are weekend days, and so on.
I tried some pandas built-in commands but couldn't find the right one, since they aggregate by day without taking into account that each day contains many per-second entries. That is, there is a value for each second, and these values should not be averaged, just grouped together into a new series.
For example, I tried:
ts.asfreq(BDay())
--> finds the business days but averages over each day
ts.resample()
--> you must define 'how' (mean, max, min...)
ts.groupby(lambda x : x.weekday)
--> not that either!
ts = pd.Series(df, index = pd.bdate_range(start = '2013/01/01 00:00:00', end = '2013/01/31 23:59:59', freq = 'S'))
--> df because the original data is a DataFrame. Using pd.bdate_range didn't help, since df and the index must have the same dimensions.
I searched the pandas documentation and googled, but could not find a clue.
Does anybody have an idea?
Any help would be greatly appreciated!
Thanks!
P.S. I would rather not use loops for this, since my data set is very large (I also have other months to analyse).
Solution
Unfortunately this is a little slow, but it should at least give the answer you are looking for.
import pandas as pd

# Build an index holding only the date portion of each timestamp (this is the slow step)
ts_days = pd.to_datetime(ts.index.date)
# Create the range of business days covering that period
bdays = pd.bdate_range(start=ts.index[0].date(), end=ts.index[-1].date())
# Keep only the observations whose date falls in the business-day range
ts = ts[ts_days.isin(bdays)]
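If the slow step above becomes a bottleneck, one possible alternative (not part of the original answer) is to filter on the index's dayofweek attribute, which avoids building Python date objects. The sketch below assumes that "business day" simply means Monday to Friday, which is also what pd.bdate_range uses by default, so public holidays such as 2013-01-01 are not excluded. The sample series and the names weekday_mask and ts_bdays are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one observation per second over six days
# (Tue 2013-01-01 through Sun 2013-01-06), standing in for the real data.
idx = pd.date_range("2013-01-01", "2013-01-06 23:59:59", freq="s")
ts = pd.Series(np.random.default_rng(0).random(len(idx)), index=idx)

# dayofweek is 0 for Monday ... 6 for Sunday, so "< 5" keeps Monday to Friday.
weekday_mask = ts.index.dayofweek < 5
ts_bdays = ts[weekday_mask]

# Cross-check against the bdate_range approach; normalize() resets each
# timestamp to midnight without creating Python date objects.
bdays = pd.bdate_range(start=idx[0].normalize(), end=idx[-1].normalize())
assert (ts.index.normalize().isin(bdays) == weekday_mask).all()
```

The values are left untouched: every per-second observation on a weekday is kept as is, and nothing is averaged or resampled.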