Notes before writing:
A few days ago I tried wearing a thin coat, but I was defeated by the Guangzhou weather (too hot), so it's back to short sleeves and shorts. Guangzhou is not cold.
TensorFlow 2.3.1 was used in this experiment.
```python
import numpy as np
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
%matplotlib inline
```
This experiment uses five years of historical price data for ^GSPC from Yahoo! Finance, from November 2015 to November 2020, 1256 trading days in total. The data contains daily price information: date, open, high, low, close, adj close, and volume.
Tips: stock knowledge
- Date: trading date
- Open: opening price (the price at which the stock started trading that day)
- High: the highest price of the day
- Low: the lowest price of the day
- Close: closing price (the final price of the stock that day)
- Adj Close: adjusted closing price (the close adjusted for dividends and splits)
- Volume: total number of shares traded
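To make the column layout concrete, here is a minimal sketch with two made-up rows in the same shape as the Yahoo! Finance CSV (the values are illustrative, not real quotes):

```python
import pandas as pd

# Toy rows with the same columns as the Yahoo! Finance CSV (values are made up)
df = pd.DataFrame({
    "Date": ["2015-11-02", "2015-11-03"],
    "Open": [2080.76, 2102.63],
    "High": [2106.20, 2111.76],
    "Low": [2080.76, 2097.85],
    "Close": [2104.05, 2109.79],
    "Adj Close": [2104.05, 2109.79],
    "Volume": [4272060000, 4078870000],
})

# Only the closing price is used for forecasting
close = df["Close"].values.astype("float32")
print(close.shape)  # (2,)
```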
For simplicity, only the closing price is used for forecasting. The chart below shows the closing price of the past five years.
```python
# Load the dataset with pandas; column 4 of the CSV is the Close price
# (the usecols value was lost in formatting; [4] follows from the column
# order Date, Open, High, Low, Close)
dataframe = read_csv('data/stock_data.csv', usecols=[4], engine='python', skipfooter=3)
data = dataframe.values
# Convert integers to floats
data = data.astype('float32')
plt.plot(data)
plt.show()
```
The goal is to forecast future closing prices; here the forecast covers the last 56 data points.
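As a quick sanity check on where the 56 comes from (pure-Python arithmetic, using the 1256-row count and the 95.55% split described below):

```python
# 1256 daily closes; 95.55% for training leaves the final 56 for testing
n_total = 1256
train_size = int(n_total * 0.9555)   # 1200
test_size = n_total - train_size     # 56
print(train_size, test_size)  # 1200 56
```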
Build training set and test set
The closing prices of the past five years form a time series of length N: p0, p1, …, pN-1, where pt is the price on day t. To build the training and test sets, the previous i data points are used to predict the (i+1)-th, with 0 < i < N, i.e.
X0 = (p0, p1,…, pi-1)
X1 = (pi, pi+1,…, p2i-1)
Xt = (pti, pti+1,…, p(t+1)i-1)
Xt+1 = (p(t+1)i, p(t+1)i+1,…, p(t+2)i-1)
Here we choose i = 6. In the LSTM this corresponds to time_steps = 6, and the training set can be expressed as
Input1 = [p0, p1, p2, p3, p4, p5], Label1 = [p6]
Input2 = [p1, p2, p3, p4, p5, p6], Label2 = [p7]
Input3 = [p2, p3, p4, p5, p6, p7], Label3 = [p8]
```python
# Construct input/label matrices from the original data set
def create_dataset(data, time_steps):
    dataX, dataY = [], []
    for i in range(len(data) - time_steps):
        a = data[i:(i + time_steps), 0]
        dataX.append(a)
        dataY.append(data[i + time_steps, 0])
    return np.array(dataX), np.array(dataY)
```
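A quick check on a toy series shows the function reproduces the Input1/Label1 pattern above (a minimal sketch; create_dataset is restated so the snippet is self-contained, and the (10, 1) column shape mimics the single-feature data array):

```python
import numpy as np

def create_dataset(data, time_steps):
    dataX, dataY = [], []
    for i in range(len(data) - time_steps):
        dataX.append(data[i:(i + time_steps), 0])
        dataY.append(data[i + time_steps, 0])
    return np.array(dataX), np.array(dataY)

# Toy series p0..p9 as a (10, 1) column
toy = np.arange(10, dtype="float32").reshape(-1, 1)
X, y = create_dataset(toy, time_steps=6)
print(X[0], y[0])      # [0. 1. 2. 3. 4. 5.] 6.0
print(X.shape, y.shape)  # (4, 6) (4,)
```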
95.55% of the data is used as the training set and the rest as the test set.
```python
# Normalize to [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data)
# Split into training set and test set
train_size = int(len(data) * 0.9555)
test_size = len(data) - train_size
train, test = data[0:train_size, :], data[train_size:len(data), :]
time_steps = 6
trainX, trainY = create_dataset(train, time_steps)
testX, testY = create_dataset(test, time_steps)
# Reshape the model input to the format [samples, time_steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
```
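The resulting shapes can be verified on synthetic data (a minimal sketch; min-max scaling is written out by hand so the snippet does not depend on scikit-learn, and the 1256 "prices" are random placeholders):

```python
import numpy as np

def create_dataset(data, time_steps):
    dataX, dataY = [], []
    for i in range(len(data) - time_steps):
        dataX.append(data[i:(i + time_steps), 0])
        dataY.append(data[i + time_steps, 0])
    return np.array(dataX), np.array(dataY)

# 1256 synthetic "closing prices" as a (1256, 1) column
data = np.random.rand(1256, 1).astype("float32")
# Hand-rolled min-max scaling to [0, 1], equivalent to MinMaxScaler here
data = (data - data.min()) / (data.max() - data.min())

train_size = int(len(data) * 0.9555)           # 1200
train, test = data[:train_size], data[train_size:]
time_steps = 6
trainX, trainY = create_dataset(train, time_steps)
testX, testY = create_dataset(test, time_steps)
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1], 1)
testX = testX.reshape(testX.shape[0], testX.shape[1], 1)
print(trainX.shape, testX.shape)  # (1194, 6, 1) (50, 6, 1)
```

Each window of 6 points consumes 6 rows before the first label exists, so the 1200-row training split yields 1194 samples and the 56-row test split yields 50.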
Establish and train LSTM model
The hidden layer has 128 neurons, the output layer produces a single predicted value, and the model is trained for 100 epochs.
Tips: Calculation of LSTM parameters
(hidden_size × (hidden_size + x_dim) + hidden_size) × 4
where x_dim is the feature dimension of the input data; here it is 1.
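Plugging in hidden_size = 128 and x_dim = 1 gives the parameter count that model.summary() reports for the LSTM layer (pure-Python check):

```python
hidden_size, x_dim = 128, 1
# Each of the four gates has an input kernel (x_dim), a recurrent kernel
# (hidden_size) and a bias, hence the factor of 4
params = (hidden_size * (hidden_size + x_dim) + hidden_size) * 4
print(params)  # 66560
```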
```python
model = Sequential()
model.add(LSTM(128, input_shape=(time_steps, 1)))
model.add(Dense(1))
# Accuracy is not meaningful for regression, so only the MSE loss is tracked
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
history = model.fit(trainX, trainY, epochs=100, batch_size=64, verbose=1)
score = model.evaluate(testX, testY, batch_size=64, verbose=1)
```
The training-set loss is visualized in the figure below; the loss value can be seen to converge gradually.
```python
def visualize_loss(history, title):
    loss = history.history["loss"]
    epochs = range(len(loss))
    plt.figure()
    plt.plot(epochs, loss, "b", label="Training loss")
    plt.title(title)
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

visualize_loss(history, "Training Loss")
```
```python
# Predict on the training set and test set
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# Invert the normalization on the predictions and labels
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# Compute the RMSE of the training set and test set
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))
# Plot the predictions
trainPredictPlot = np.empty_like(data)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[time_steps:len(trainPredict) + time_steps, :] = trainPredict
testPredictPlot = np.empty_like(data)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict) + (time_steps * 2) - 1:len(data) - 1, :] = testPredict
plt.plot(scaler.inverse_transform(data))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```
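The inverse transform and RMSE steps can be sketched without Keras (a minimal numpy version; the price range lo/hi is a made-up example, and the inversion x_scaled * (max − min) + min is what scaler.inverse_transform computes for this feature_range):

```python
import math
import numpy as np

# Hypothetical fitted min/max of the original prices
lo, hi = 1800.0, 3600.0

def inverse_transform(x_scaled):
    # Undo MinMaxScaler(feature_range=(0, 1)): x = x_scaled * (max - min) + min
    return x_scaled * (hi - lo) + lo

y_true_scaled = np.array([0.50, 0.60, 0.70])
y_pred_scaled = np.array([0.52, 0.59, 0.69])

y_true = inverse_transform(y_true_scaled)   # [2700. 2880. 3060.]
y_pred = inverse_transform(y_pred_scaled)

# RMSE in the original price units
rmse = math.sqrt(np.mean((y_true - y_pred) ** 2))
print('%.2f RMSE' % rmse)  # 25.46 RMSE
```

Reporting RMSE after the inverse transform is what makes the score readable in dollars rather than in normalized [0, 1] units.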
In the above figure, the blue line is the original data, the orange line and the green line are the prediction results of the training set and the test set, respectively.