Forecasting stock price with LSTM


Notes before writing:

A few days ago I switched to a thin coat for a while, but the Guangzhou weather defeated me (too hot), so I am back to short sleeves and shorts. Guangzhou is simply not cold.

Environment setup

TensorFlow 2.3.1 was used in this experiment.

import numpy as np
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
%matplotlib inline

Data description

The data is the historical price series of ^GSPC (the S&P 500 index) from Yahoo! Finance over the past five years, from November 2015 to November 2020, 1256 records in total. Each record holds one day's price information: date, open, high, low, close, adj close, and volume.

Tips: stock knowledge

  • Date: Date
  • Open: opening price (the starting price of a stock on a certain day)
  • High: the highest price
  • Low: the lowest price
  • Close: closing price (the final price of the stock on a certain day)
  • Adj close: adjusted closing price (the closing price adjusted for corporate actions such as dividends and splits)
  • Volume: total transaction volume

For simplicity, only the closing price is used for forecasting. The chart below shows the closing price of the past five years.

# Load the dataset with pandas; column 4 is the closing price
dataframe = read_csv('data/stock_data.csv', usecols=[4], engine='python', skipfooter=3)
data = dataframe.values

# Convert the values to float32
data = data.astype('float32')


The goal is to forecast the closing price. The last 56 data points are held out as the forecast target.

Build training set and test set

The closing prices of the past five years form a time series of length N, where p_0, p_1, …, p_{N-1} are the prices of each day. The training set and test set are built by using the previous i data points to predict the (i+1)-th, 0 < i < N, i.e.

X_0 = (p_0, p_1, …, p_{i-1})
X_1 = (p_i, p_{i+1}, …, p_{2i-1})
…
X_t = (p_{ti}, p_{ti+1}, …, p_{(t+1)i-1})

To predict

X_{t+1} = (p_{(t+1)i}, p_{(t+1)i+1}, …, p_{(t+2)i-1})

Here i = 6 is chosen. With time_steps = 6 in the LSTM, the training set can be expressed as

Input1 = [p_0, p_1, p_2, p_3, p_4, p_5], Label1 = [p_6]
Input2 = [p_1, p_2, p_3, p_4, p_5, p_6], Label2 = [p_7]
Input3 = [p_2, p_3, p_4, p_5, p_6, p_7], Label3 = [p_8]


# Build the input windows and labels from the original data set
def create_dataset(data, time_steps):
    dataX, dataY = [], []
    for i in range(len(data) - time_steps):
        a = data[i:(i + time_steps), 0]        # window of time_steps consecutive prices
        dataX.append(a)
        dataY.append(data[i + time_steps, 0])  # the price right after the window
    return np.array(dataX), np.array(dataY)
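
To make the windowing concrete, here is a minimal plain-Python sketch of the same idea on a toy series. The helper name `make_windows` and the toy values are illustrative assumptions and are not part of the original code; NumPy is left out so the indices are easy to follow.

```python
def make_windows(series, time_steps):
    """Split a 1-D sequence into (input window, next value) pairs."""
    inputs, labels = [], []
    for i in range(len(series) - time_steps):
        inputs.append(series[i:i + time_steps])  # time_steps consecutive values
        labels.append(series[i + time_steps])    # the value right after the window
    return inputs, labels

# Toy series standing in for p_0 .. p_8 (hypothetical values)
toy = [10, 11, 12, 13, 14, 15, 16, 17, 18]
inputs, labels = make_windows(toy, time_steps=6)
for x, y in zip(inputs, labels):
    print(x, "->", y)
```

A series of 9 values with time_steps = 6 yields 3 pairs, one for each position where a full window still has a following value.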

The data is scaled to the range [0, 1], then the first 95.55% is used as the training set and the rest as the test set.

scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data)

# Split into training set and test set
train_size = int(len(data) * 0.9555)
test_size = len(data) - train_size
train, test = data[0:train_size,:], data[train_size:len(data),:]
time_steps = 6
trainX, trainY = create_dataset(train, time_steps)
testX, testY = create_dataset(test, time_steps)

# Reshape the inputs to the format the model expects: [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
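
As a sanity check on the sizes involved, the arithmetic of the split can be sketched without loading any data. The row count 1256 comes from the data description; the helper name `split_sizes` is hypothetical. Since `skipfooter=3` in the loading code drops three rows, the loaded array may hold 1253 rows rather than 1256, so both cases are computed.

```python
def split_sizes(n_rows, train_fraction=0.9555, time_steps=6):
    """Return (train rows, test rows, train samples, test samples)."""
    train_size = int(n_rows * train_fraction)
    test_size = n_rows - train_size
    # Each split loses time_steps rows, because the first window needs
    # time_steps values before it can produce a label.
    return train_size, test_size, train_size - time_steps, test_size - time_steps

for n_rows in (1256, 1253):
    print(n_rows, split_sizes(n_rows))
```

In both cases the test split holds 56 rows, which matches the 56 held-out points mentioned earlier, and windowing turns them into 50 test samples.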

Establish and train LSTM model

The hidden layer has 128 units, the output layer produces 1 predicted value, and the model is trained for 100 epochs.

Tips: calculating the number of LSTM parameters

(hidden size × (hidden size + x_dim) + hidden size) × 4
x_dim is the feature dimension of the input data, which is 1 here.
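
Plugging the values from this experiment into the formula gives the size of the LSTM layer. The function name `lstm_param_count` is an illustrative assumption; the factor 4 corresponds to the four sets of weights in an LSTM cell (input, forget, and output gates plus the cell candidate).

```python
def lstm_param_count(hidden_size, x_dim):
    """Trainable parameters of one LSTM layer, following the formula above."""
    weights = hidden_size * (hidden_size + x_dim)  # recurrent + input weights per gate
    bias = hidden_size                             # one bias vector per gate
    return (weights + bias) * 4

print(lstm_param_count(hidden_size=128, x_dim=1))  # 66560
```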


model = Sequential()
model.add(LSTM(128, input_shape=(time_steps, 1)))  # hidden layer with 128 units
model.add(Dense(1))                                # output layer: 1 predicted value
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, epochs=100, batch_size=64, verbose=1)
score = model.evaluate(testX, testY, batch_size=64, verbose=1)

The loss on the training set is plotted in the figure below. The loss value converges gradually.

def visualize_loss(history, title):
    loss = history.history["loss"]
    epochs = range(len(loss))
    plt.figure()
    plt.plot(epochs, loss, "b", label="Training loss")
    plt.title(title)
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

visualize_loss(history, "Training Loss")

Forecast results


# Predict on the training set and test set
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# Invert the normalization of the predictions and labels
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# Calculate the RMSE of the training set and test set
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
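
Because the model is trained on values scaled to [0, 1], the RMSE is only interpretable in price units after the scaling is inverted. The sketch below reproduces min-max scaling by hand on made-up numbers (the prices, the predictions, and the helper names are all hypothetical, not taken from the original code) and checks that the RMSE on the original scale equals the normalized RMSE multiplied by (max - min).

```python
import math

def minmax_scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    """Map values from [0, 1] back to [lo, hi]."""
    return [s * (hi - lo) + lo for s in scaled]

def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

prices = [2000.0, 2500.0, 3000.0, 3500.0]  # hypothetical closing prices
lo, hi = min(prices), max(prices)
scaled_true = minmax_scale(prices, lo, hi)
scaled_pred = [0.02, 0.30, 0.70, 0.96]     # hypothetical model outputs in [0, 1]

rmse_scaled = rmse(scaled_true, scaled_pred)
rmse_prices = rmse(prices, minmax_inverse(scaled_pred, lo, hi))
print(rmse_scaled, rmse_prices)
```

This is why the scores printed above are computed after `inverse_transform`: the same error looks small on the normalized scale and is only comparable to actual index values once it is mapped back.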

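# Draw the prediction result graph
# Shift the training predictions so they line up with the original series
trainPredictPlot = np.empty_like(data)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[time_steps:len(trainPredict) + time_steps, :] = trainPredict

# Shift the test predictions: the first one belongs to index train_size + time_steps
testPredictPlot = np.empty_like(data)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict) + time_steps * 2:len(data), :] = testPredict

# Plot the original data, then the training and test predictions
plt.plot(scaler.inverse_transform(data))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()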



In the figure above, the blue line is the original data, and the orange and green lines are the predictions for the training set and the test set, respectively.
