Python用线性插值正则化不规则时间序列
问题描述
我在 pandas 中有一个时间序列,如下所示:
I have a time series in pandas that looks like this:
Values
1992-08-27 07:46:48 28.0
1992-08-27 08:00:48 28.2
1992-08-27 08:33:48 28.4
1992-08-27 08:43:48 28.8
1992-08-27 08:48:48 29.0
1992-08-27 08:51:48 29.2
1992-08-27 08:53:48 29.6
1992-08-27 08:56:48 29.8
1992-08-27 09:03:48 30.0
我想将其重新采样为具有 15 分钟步长的常规时间序列,其中值是线性插值的.基本上我想得到:
I would like to resample it to a regular time series with 15 min times steps where the values are linearly interpolated. Basically I would like to get:
Values
1992-08-27 08:00:00 28.2
1992-08-27 08:15:00 28.3
1992-08-27 08:30:00 28.4
1992-08-27 08:45:00 28.8
1992-08-27 09:00:00 29.9
但是使用 Pandas 的重采样方法 (df.resample('15Min')) 我得到:
However using the resample method (df.resample('15Min')) from Pandas I get:
Values
1992-08-27 08:00:00 28.20
1992-08-27 08:15:00 NaN
1992-08-27 08:30:00 28.60
1992-08-27 08:45:00 29.40
1992-08-27 09:00:00 30.00
我尝试过使用不同的how"和fill_method"参数的重采样方法,但从未得到我想要的结果.我是不是用错了方法?
I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?
我认为这是一个相当简单的查询,但我已经在网上搜索了一段时间并没有找到答案.
I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.
提前感谢我能得到的任何帮助.
Thanks in advance for any help I can get.
解决方案
这需要一些工作,但试试这个.基本思想是找到最接近每个重采样点的两个时间戳并进行插值.np.searchsorted
用于查找最接近重采样点的日期.
It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted
is used to find dates closest to the resample point.
# empty frame with desired index
rs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)
# array of indexes corresponding with closest timestamp after resample
idx_after = np.searchsorted(df.index.values, rs.index.values)
# values and timestamp before/after resample
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]
#calculate new weighted value
rs['span'] = (rs['after_time'] - rs['before_time'])
rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']
# I got errors here unless I turn the index to a series
rs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']
rs['Values'] = rs.eval('before * before_weight + after * after_weight')
毕竟,希望是正确的答案:
After all that, hopefully the right answer:
In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00 28.011429
1992-08-27 08:15:00 28.313939
1992-08-27 08:30:00 28.223030
1992-08-27 08:45:00 28.952000
1992-08-27 09:00:00 29.908571
Freq: 15T, Name: Values, dtype: float64
相关文章