Keras Recurrent Neural Networks for Multivariate Time Series
Problem Description
I have been reading about Keras RNN models (LSTMs and GRUs), and authors seem to largely focus on language data or univariate time series that use training instances composed of previous time steps. The data I have is a bit different.
I have 20 variables measured every year for 10 years for 100,000 persons as input data, and the 20 variables measured for year 11 as output data. What I would like to do is predict the value of one of the variables (not the other 19) for the 11th year.
I have my data structured as X.shape = [persons, years, variables] = [100000, 10, 20] and Y.shape = [persons, variable] = [100000, 1]. Below is my Python code for an LSTM model.
## LSTM model.
from keras import models
from keras import layers

# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh',
                 input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.
history_lstm = network_lstm.fit(X, Y, epochs = 25, batch_size = 128)
I have four (related) questions, please:
Have I coded the Keras model correctly for the data structure I have? The performance I get from a fully-connected network (using flattened data) and from LSTM, GRU, and 1D CNN models is nearly identical, and I don't know whether I have made an error in Keras or whether a recurrent model is simply not helpful in this case.
Should I have Y as a series with shape Y.shape = [persons, years] = [100000, 11], rather than including the variable in X, which would then have shape X.shape = [persons, years, variables] = [100000, 10, 19]? If so, how can I get the RNN to output the predicted sequence? When I use return_sequences = True, Keras returns an error.
Is this the best way to predict with the data I have? Are there better options available in the Keras RNN models, or even other models?
How could I simulate data resembling the data structure I have so that an RNN model would outperform a fully-connected network?
Update:
I have tried a simulation, with what I hope is a very simple case where an RNN should be expected to outperform an FNN.
While the LSTM tends to outperform the FNN when both have fewer hidden units (4), the performance becomes identical with more hidden units (8+). Can anyone think of a better simulation where an RNN would be expected to outperform an FNN with a similar data structure?
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
The code below simulates data for 10,000 instances, 10 time steps, and 2 variables. If the second variable has a 0 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 3. If the second variable has a 1 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 9.
My hope was that the RNN would keep the value of the second variable at the very first time step in memory and use it to know which value (3 or 9) to multiply the first variable at the very last time step by.
## Simulate data.
instances = 10000
sequences = 10

# Even columns hold variable 1 (random values); odd columns hold
# variable 2, which is 1 at the first time step for half the instances.
X = np.zeros((instances, sequences * 2))
X[:int(instances / 2), 1] = 1
for i in range(instances):
    for j in range(0, sequences * 2, 2):
        X[i, j] = np.random.random()

# Y is the last value of variable 1, scaled by 3 or 9 depending on
# the first-time-step value of variable 2.
Y = np.zeros((instances, 1))
for i in range(len(Y)):
    if X[i, 1] == 0:
        Y[i] = X[i, -2] * 3
    if X[i, 1] == 1:
        Y[i] = X[i, -2] * 9
Below is code for an FNN:
## Densely connected model.
# Define model.
network_dense = models.Sequential()
network_dense.add(layers.Dense(4, activation = 'relu',
                  input_shape = (X.shape[1],)))
network_dense.add(Dense(1, activation = None))
# Compile model.
network_dense.compile(optimizer = 'rmsprop', loss = 'mean_absolute_error')
# Fit model.
history_dense = network_dense.fit(X, Y, epochs = 100, batch_size = 256, verbose = False)
plt.scatter(Y[X[:, 1] == 0, :], network_dense.predict(X[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y[X[:, 1] == 1, :], network_dense.predict(X[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
Below is code for an LSTM:
## Structure X data for LSTM.
X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)
X_lstm.shape
## LSTM model.
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(4, activation = 'relu',
                 input_shape = (X_lstm.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)
plt.scatter(Y[X[:, 1] == 0, :], network_lstm.predict(X_lstm[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y[X[:, 1] == 1, :], network_lstm.predict(X_lstm[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
Solution
Yes, the code used is correct for what you are trying to do. Ten years is the time window used to predict the following year, so that should be the number of inputs into your model for each of the 20 variables. The sample size of 100,000 observations is not relevant to the input shape of your model.
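As a quick sanity check (a sketch of my own using a small stand-in array, not your actual data), you can feed random input through the model and confirm that one (10, 20) window per person comes out as a single value, matching the shape of Y:

## Shape check (illustrative sketch only).
import numpy as np
from keras import models, layers

X_demo = np.random.random((100, 10, 20))   # stand-in for [persons, years, variables]

network_check = models.Sequential()
network_check.add(layers.LSTM(128, activation = 'tanh', input_shape = (10, 20)))
network_check.add(layers.Dense(1, activation = None))
print(network_check.predict(X_demo).shape)   # (100, 1), one value per person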
The way that you had originally shaped the dependent variable Y is correct. You are predicting a window of 1 year for 1 variable, and you have 100,000 observations. The keyword argument return_sequences = True will cause an error to be thrown because you only have a single LSTM layer. Set this parameter to True if you are implementing multiple LSTM layers and the layer in question is followed by another LSTM layer.
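To make that concrete, here is a minimal sketch (my own illustration, not code from the question; the layer sizes are arbitrary) of where return_sequences = True belongs in a stacked setup:

## Stacked LSTM sketch (illustrative only).
from keras import models, layers

network_stacked = models.Sequential()
# The first LSTM feeds another LSTM, so it must return the hidden
# state at every time step rather than only the final one.
network_stacked.add(layers.LSTM(64, return_sequences = True,
                    input_shape = (10, 20)))
# The last LSTM keeps the default return_sequences = False, so the
# Dense layer receives a single vector per person.
network_stacked.add(layers.LSTM(64))
network_stacked.add(layers.Dense(1, activation = None))
network_stacked.compile(optimizer = 'adam', loss = 'mean_squared_error')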
I wish I could offer some guidance on question 3, but without actually having your dataset I don't know if it's possible to answer it with any sort of certainty.
I will say that LSTMs were designed to address what is known as the long-term dependency problem present in regular RNNs. What this problem boils down to is that as the gap grows between when the relevant information is observed and the point where that information becomes useful, a standard RNN will have a harder time learning the relationship between them. Think of predicting a stock price based on 3 days of activity versus an entire year.
This leads into number 4. If I use the term 'resembling' loosely and stretch your time window further out, to say 50 years as opposed to 10, the advantages gained from using an LSTM would become more apparent. I'm sure that someone more experienced will be able to offer a better answer, and I look forward to seeing it.
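For what it's worth, here is a rough sketch of that idea (my own variation on the question's simulation, not code from this answer): stretching the gap between the 0/1 flag and the prediction target is mostly a matter of increasing sequences:

## Longer-gap variation of the simulation (illustrative sketch only).
import numpy as np

instances = 10000
sequences = 50   # was 10: the flag at the first step must now be remembered across 50 steps

X = np.zeros((instances, sequences * 2))
X[:instances // 2, 1] = 1                                # variable 2 flag at the first time step
X[:, 0::2] = np.random.random((instances, sequences))   # variable 1 at every time step

# Same rule as before: multiply the last value of variable 1 by 9 or 3.
Y = np.where(X[:, 1] == 1, X[:, -2] * 9, X[:, -2] * 3).reshape(-1, 1)
X_lstm = X.reshape(instances, sequences, 2)              # [instances, steps, variables]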
I found this page helpful for understanding LSTMs:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/