Keras LSTM:时间序列多步多特征预测 - 结果不佳
问题描述
我有一个时间序列数据集,其中包含一整年的数据(日期是索引).每 15 分钟(全年)测量一次数据,每天测量 96 个时间步长.数据已经标准化.变量是相关的.除 VAR 以外的所有变量都是天气测量值.
I have a time series dataset containing data from a whole year (date is the index). The data was measured every 15 min (during whole year) which results in 96 timesteps a day. The data is already normalized. The variables are correlated. All the variables except the VAR are weather measures.
VAR 在一天和一周内是季节性的(因为它在周末看起来有点不同,但每个周末都不太一样).VAR 值是固定的.我想预测接下来两天(提前 192 步)和接下来 7 天(提前 672 步)的 VAR 值.
VAR is seasonal in a day period and in a week period (as it looks a bit different on weekend, but more less the same every weekend). VAR values are stationary. I would like to predict values of VAR for next two days (192 steps ahead) and for next seven days (672 steps ahead).
这是数据集的示例:
DateIdx VAR dewpt hum press temp
2017-04-17 00:00:00 0.369397 0.155039 0.386792 0.196721 0.238889
2017-04-17 00:15:00 0.363214 0.147287 0.429245 0.196721 0.233333
2017-04-17 00:30:00 0.357032 0.139535 0.471698 0.196721 0.227778
2017-04-17 00:45:00 0.323029 0.127907 0.429245 0.204918 0.219444
2017-04-17 01:00:00 0.347759 0.116279 0.386792 0.213115 0.211111
2017-04-17 01:15:00 0.346213 0.127907 0.476415 0.204918 0.169444
2017-04-17 01:30:00 0.259660 0.139535 0.566038 0.196721 0.127778
2017-04-17 01:45:00 0.205564 0.073643 0.523585 0.172131 0.091667
2017-04-17 02:00:00 0.157650 0.007752 0.481132 0.147541 0.055556
2017-04-17 02:15:00 0.122101 0.003876 0.476415 0.122951 0.091667
输入数据集图
我决定在 Keras 中使用 LSTM.有了全年的数据,我使用过去 329 天的数据作为训练数据,其余的数据在训练期间进行验证.train_X -> 包含包括 329 天的 VAR 在内的全部测量值train_Y -> 仅包含 329 天的 VAR.该值向前移动了一步.其余时间步长为 test_X 和 test_Y.
I have decided to use LSTM in Keras. Having data from the whole year, I have used data from past 329 days as a training data and the rest for a validation during the training. train_X -> contains whole measures including VAR from 329 days train_Y -> contains only VAR from 329 days. The value is shifted one step ahead. The rest timesteps goes to test_X and test_Y.
这是我准备train_X和train_Y的代码:
Here is the code I prepare train_X and train_Y:
#X -> is the whole dataframe
#Y -> is a vector of VAR from whole dataframe, already shifted 1 step ahead
#329 * 96 = 31584
train_X = X[:31584]
train_X = train_X.reshape(train_X.shape[0],1,5)
train_Y = Y[:31584]
train_Y = train_Y.reshape(train_Y.shape[0],1)
为了预测下一个 VAR 值,我想使用过去 672 个时间步长(整周度量).为此,我设置了 batch_size=672
,因此fit"命令如下所示:
To predict next VAR value I would like to use past 672 timesteps (whole week measures). For this reason I have set batch_size=672
, so that the ‘fit’ command look like this:
history = model.fit(train_X, train_Y, epochs=50, batch_size=672, validation_data=(test_X, test_Y), shuffle=False)
这是我的网络架构:
model = models.Sequential()
model.add(layers.LSTM(672, input_shape=(None, 672), return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(336, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(168, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(84, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(21, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mae', optimizer='adam')
model.summary()
从下图中我们可以看到,网络在 50 个 epoch 之后学习了一些东西":
From the plot below we can see that the network has learn ‘something’ after 50 epochs:
学习过程图
出于预测目的,我准备了一组数据,其中包含最后 672 个步骤,其中包含所有值和 96 个没有 VAR 值——应该预测.我还使用了自回归,所以我在每次预测后更新了 VAR 并将其用于下一次预测.
For the prediction purpose I have prepared a set of data containing last 672 steps with all values and 96 without VAR value – which should be predicted. I also used autoregression, so I updated VAR after each prediction and used it for next prediction.
predX 数据集(用于预测)如下所示:
The predX dataset (used for prediction) looks like this:
print(predX['VAR'][668:677])
DateIdx VAR
2017-04-23 23:00:00 0.307573
2017-04-23 23:15:00 0.278207
2017-04-23 23:30:00 0.284390
2017-04-23 23:45:00 0.309118
2017-04-24 00:00:00 NaN
2017-04-24 00:15:00 NaN
2017-04-24 00:30:00 NaN
2017-04-24 00:45:00 NaN
2017-04-24 01:00:00 NaN
Name: VAR, dtype: float64
这是我用来预测接下来 96 个步骤的代码(自回归):
Here is the code (autoregression) I have used to predict next 96 steps:
stepsAhead = 96
historySteps = 672
for i in range(0,stepsAhead):
j = i + historySteps
ypred = model.predict(predX.values[i:j].reshape(1,historySteps,5))
predX['VAR'][j] = ypred
不幸的是,结果很差,与预期相差甚远:
Unfortunately the results are very poor and very far from the expectations:
带有预测数据的绘图
结合前一天的结果:
结合前一天的预测数据
除了我做错了什么"这个问题,我想问几个问题:
Except from the ‘What have I done wrong‘ question, I would like to ask a few questions:
Q1. 在模型查找期间,我刚刚将整个历史记录在 672 大小的批次中.这是正确的吗?我应该如何组织模型拟合的数据集?我有什么选择?我应该使用滑动窗口"方法吗(如此处的链接:https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/)?
Q1. During model fifing, I have just put the whole history in batches of 672 size. Is it correct? How should I organize the dataset for the model fitting? What options do I have? Should I use the "sliding window" approach (like in the link here: https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/ )?
Q2. 50 个 epoch 够吗?这里的常见做法是什么?也许网络拟合不足导致预测不佳?到目前为止,我尝试了 200 个 epoch,结果相同.
Q2. Is the 50 epochs enough? What is the common practice here? Maybe the network is underfitted resulting in poor prediction? So far I tried 200 epoch with the same result.
Q3.我应该尝试不同的架构吗?提议的网络足够大"来处理这样的数据吗?也许有状态"网络是正确的方法?
Q3. Should I try a different architecture? Is the proposed network ‘big enough’ to handle such a data? Maybe a "stateful" network is the right approach here?
Q4.我是否正确实施了自回归?是否有任何其他方法可以预测前面的许多步骤,例如192 还是 672 就像我的情况一样?
Q4. Did I implement the autoregression correctly? Is there any other approach to make a prediction for many steps ahead e.g. 192 or 672 like in my case?
解决方案
看起来对于如何组织数据来训练 RNN 存在混淆.那么让我们来讨论一下这些问题:
It looks like there is a confusion on how to organise the data to train a RNN. So let's cover the questions:
- 一旦你有一个 2D 数据集
(total_samples, 5)
,你就可以使用 TimeseriesGenerator 创建一个滑动窗口,它将为您生成(batch_size, past_timesteps, 5)
.在这种情况下,您将使用.fit_generator
来训练网络. - 如果你得到相同的结果,50 个 epoch 应该没问题.您通常会根据网络的性能进行调整.但是,如果您要比较两种不同的网络架构,则应保持不变.
- 架构非常庞大,因为您的目标是一次预测所有 672 个未来值.您可以设计网络,使其学会一次预测一个测量值.在预测时,您可以预测一个点并再次输入该点以预测下一个点,直到得到 672.
- 这与答案 3 相关,您可以学习一次预测一个,并将预测链接到训练后的
n
个预测数.
- Once you have a 2D dataset
(total_samples, 5)
you can use the TimeseriesGenerator to create a sliding window what will generate(batch_size, past_timesteps, 5)
for you. In this case, you will use.fit_generator
to train the network. - If you get the same result, 50 epochs should be fine. You usually adjust based on the performance of your network. But you should keep it fixed if you are comparing two different network architectures.
- Architecture is really large as you aim to predict all 672 future values at once. You can design the network so it learns to predict one measurement at a time. At prediction time you can predict one point and feed that again to predict the next until you get 672.
- This ties into answer 3, you can learn to predict one at a time and chain the predictions to
n
number of predictions after training.
单点预测模型可能如下所示:
The single point prediction model could look like:
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5))
model.add(LSTM(64))
model.add(Dense(1))
相关文章