在 Pandas 数据框列中填充缺失的日期值
问题描述
我正在使用 Pandas 使用 Data Frames 存储股票价格数据.数据集中有 2940 行.数据集快照如下所示:
I'm using Pandas to store stock prices data using Data Frames. There are 2940 rows in the dataset. The Dataset snapshot is displayed below:
时间序列数据不包含周六和周日的值.因此,必须填充缺失值.
这是我写的代码,但没有解决问题:
The time series data does not contain the values for Saturday and Sunday. Hence missing values have to be filled.
Here is the code I've written but it is not solving the problem:
import pandas as pd
import numpy as np
import os
os.chdir('C:/Users/Admin/Analytics/stock-prices')
data = pd.read_csv('stock-data.csv')
# PriceDate Column - Does not contain Saturday and Sunday stock entries
data['PriceDate'] = pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
data = data.sort_index(by=['PriceDate'], ascending=[True])
# Starting date is Aug 25 2004
idx = pd.date_range('08-25-2004',periods=2940,freq='D')
data = data.set_index(idx)
data['newdate']=data.index
newdate=data['newdate'].values # Create a time series column
data = pd.merge(newdate, data, on='PriceDate', how='outer')
如何填补周六周日的缺失值?
解决方案
我觉得你可以使用 resample
与 ffill
或 bfill
,但在 set_index
来自 PriceDate
列:
I think you can use resample
with ffill
or bfill
, but before set_index
from column PriceDate
:
print (data)
ID PriceDate OpenPrice HighPrice
0 1 6/24/2016 1 2
1 2 6/23/2016 3 4
2 2 6/22/2016 5 6
3 2 6/21/2016 7 8
4 2 6/20/2016 9 10
5 2 6/17/2016 11 12
6 2 6/16/2016 13 14
data['PriceDate'] = pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
data = data.sort_values(by=['PriceDate'], ascending=[True])
data.set_index('PriceDate', inplace=True)
print (data)
ID OpenPrice HighPrice
PriceDate
2016-06-16 2 13 14
2016-06-17 2 11 12
2016-06-20 2 9 10
2016-06-21 2 7 8
2016-06-22 2 5 6
2016-06-23 2 3 4
2016-06-24 1 1 2
data = data.resample('D').ffill().reset_index()
print (data)
PriceDate ID OpenPrice HighPrice
0 2016-06-16 2 13 14
1 2016-06-17 2 11 12
2 2016-06-18 2 11 12
3 2016-06-19 2 11 12
4 2016-06-20 2 9 10
5 2016-06-21 2 7 8
6 2016-06-22 2 5 6
7 2016-06-23 2 3 4
8 2016-06-24 1 1 2
<小时>
data = data.resample('D').bfill().reset_index()
print (data)
PriceDate ID OpenPrice HighPrice
0 2016-06-16 2 13 14
1 2016-06-17 2 11 12
2 2016-06-18 2 9 10
3 2016-06-19 2 9 10
4 2016-06-20 2 9 10
5 2016-06-21 2 7 8
6 2016-06-22 2 5 6
7 2016-06-23 2 3 4
8 2016-06-24 1 1 2
相关文章