# Predicting stock prices in Python with GARCH, EGARCH and GJR-GARCH models and Monte Carlo simulation

Time: 2021-10-21

Stock price prediction has attracted wide attention from investors, governments, enterprises and scholars. However, the nonlinearity and non-stationarity of the data make developing a prediction model a complex and challenging task. In this article, I will explain how to combine GARCH, EGARCH and GJR-GARCH models with Monte Carlo simulation to build an effective prediction model. The kurtosis, volatility and leverage characteristics of financial time series justify the use of GARCH. The nonlinear characteristics of the time series are used to test for Brownian motion and to study how the series evolves over time. Nonlinear prediction and signal analysis methods are becoming increasingly popular in the stock market because of their robustness in feature extraction and classification.

A dynamic system can be described by a set of time-varying (continuous or discrete) variables, which form the basis of nonlinear signal analysis methods. If the current time and state variables exactly determine the state of the system at the next time step, the system is said to be deterministic. If, on the other hand, they only determine the probability distribution of the state variables over time, the dynamic system is regarded as stochastic.

Therefore, before applying GARCH modeling, I will use fractal dimension (FD), rescaled range (R/S) analysis and recurrence quantification analysis (RQA) to summarize the nonlinear dynamic behavior of the data and meet the research goal.

# Method

The Hurst exponent (H) is a characteristic parameter of long-term dependence and is related to the fractal dimension by $FD + H = 2$. R/S analysis is the core tool of this data modeling. Empirical research shows that R/S gives better results than other methods in the same category, such as autocorrelation and spectral decomposition analysis. It measures the variability of a time series and is defined as the mean range over a given duration $T$ divided by the standard deviation over that duration: $R/S = k \cdot T^{H}$, where $k$ is a constant that depends on the time series. H measures the long-term memory of a time series and characterizes it as mean-reverting, trending or a random walk.

H < 0.5 indicates mean reversion,

H > 0.5 indicates a trending series, and

H = 0.5 indicates a random walk.
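The R/S relation above can be turned into a small estimator: compute the rescaled range on windows of several lengths and take the slope of log(R/S) against log(T). The sketch below is only an illustration in plain NumPy. The function names, the window sizes and the white-noise test series are assumptions, not the procedure used for the results in this article. It is applied to increments (returns), not to price levels, and on short windows the estimate for pure noise is known to sit somewhat above 0.5.

```python
import numpy as np

def rescaled_range(x):
    """R/S of one window: range of cumulative mean deviations over the std."""
    x = np.asarray(x, dtype=float)
    dev = np.cumsum(x - x.mean())
    r = dev.max() - dev.min()
    s = x.std()
    return r / s if s > 0 else np.nan

def hurst_rs(x, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(mean R/S) against log(window length T)."""
    x = np.asarray(x, dtype=float)
    log_t, log_rs = [], []
    for t in window_sizes:
        n_windows = len(x) // t
        if n_windows < 1:
            continue
        rs = [rescaled_range(x[i * t:(i + 1) * t]) for i in range(n_windows)]
        log_t.append(np.log(t))
        log_rs.append(np.log(np.nanmean(rs)))
    slope, _ = np.polyfit(log_t, log_rs, 1)
    return slope

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(4096)   # hypothetical stand-in for a return series
h = hurst_rs(white_noise)
print(f"H = {h:.2f}, FD = {2 - h:.2f}")
```

With real data, `white_noise` would be replaced by the log return series. The fractal dimension then follows from FD = 2 - H.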

I will show how to use GARCH models for risk assessment.

A key limitation of the GARCH model is the non-negativity constraint imposed on its parameters to keep the conditional variance positive. Such constraints make GARCH models harder to estimate.

Therefore, the asymmetric GARCH model, commonly known as the GJR-GARCH model, was proposed to address the limitations of the symmetric GARCH model. In addition, the exponential GARCH (EGARCH) model offers a potential improvement over the traditional GARCH model.
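For reference, the first-order versions of the three variance equations are written out below in a common textbook parameterization. The symbols are assumptions introduced here for illustration: $\varepsilon_t$ is the residual, $\sigma_t^2$ the conditional variance, $z_t = \varepsilon_t / \sigma_t$ the standardized residual, $I[\cdot]$ an indicator function and $\gamma$ the asymmetry (leverage) parameter. The models fitted later in this article use higher orders.

```latex
\begin{aligned}
\text{GARCH(1,1):} \quad & \sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2 \\
\text{GJR-GARCH(1,1,1):} \quad & \sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2
    + \gamma\,\varepsilon_{t-1}^2\, I[\varepsilon_{t-1} < 0] + \beta\,\sigma_{t-1}^2 \\
\text{EGARCH(1,1,1):} \quad & \ln \sigma_t^2 = \omega
    + \alpha\left(|z_{t-1}| - \mathrm{E}|z_{t-1}|\right) + \gamma\, z_{t-1} + \beta \ln \sigma_{t-1}^2
\end{aligned}
```

Because EGARCH models the logarithm of the variance, $\sigma_t^2$ stays positive without sign restrictions on the parameters. In GJR-GARCH, the $\gamma$ term lets negative shocks raise the variance more than positive shocks of the same size.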

# Data mining

First, look at the data. Crude oil prices have fluctuated greatly over the past few decades, especially around 2008. Despite many rises and falls, the price has stayed at a relatively low level. The autocorrelation plot shows clear autocorrelation in the raw data. The shapes of the QQ and PP plots show that the process is close to normal but heavy-tailed.

The simple return is $R(t) = \frac{P(t) - P(t-1)}{P(t-1)}$ and the log return is $r(t) = \ln\frac{P(t)}{P(t-1)}$, where $P(t)$ is the daily crude oil price and $R(t)$ is the daily return.
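As a quick illustration of the two return definitions, the sketch below computes both from a short price vector. The prices are hypothetical values chosen for the example, not the crude oil data used in this article.

```python
import numpy as np

prices = np.array([60.0, 61.2, 59.8, 60.5])   # hypothetical daily prices

# Simple return: (P(t) - P(t-1)) / P(t-1)
simple_returns = prices[1:] / prices[:-1] - 1
# Log return: ln(P(t) / P(t-1))
log_returns = np.diff(np.log(prices))

# Log returns add up over time: their sum is the log of the total price ratio
total_log_return = log_returns.sum()
print(simple_returns)
print(log_returns)
print(np.isclose(total_log_return, np.log(prices[-1] / prices[0])))
```

The additivity shown in the last line is one practical reason log returns are convenient for multi-period analysis.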

The log return is used as the daily return in this article. Plots of the original price and of the log return show that the log return, whose mean is almost constant, is the reasonable choice. The return series shows alternating periods of high and low variation. The plot shows a random process concentrated around zero. Large positive and negative returns make risk investment and management more difficult. The mean daily return is essentially zero, and the series shows clear volatility clustering, which indicates heteroscedasticity. The ACF values are small but significantly correlated. The shapes of the QQ and PP plots did not change significantly.

```python
import seaborn as sns

# Density plot (histplot replaces the deprecated distplot)
sns.histplot(df.returns, kde=True, color='blue')

# Summary statistics
print(df.returns.describe())
```

The negative skewness of the returns (-0.119) indicates a longer left tail, meaning that large negative returns are somewhat more likely than large positive ones, and the kurtosis (7.042) reflects the large fluctuations of the oil price.

The skewness and kurtosis of the standard normal distribution are 0 and 3 respectively.

The Jarque-Bera test statistic shows that the traditional normality assumption does not fit the actual distribution of crude oil returns.
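To make the link between skewness, kurtosis and the Jarque-Bera statistic concrete, here is a minimal NumPy sketch of the textbook formula JB = n/6 * (S^2 + (K - 3)^2 / 4). The function name and the two random samples are assumptions for illustration. The values quoted above come from the author's data and are not reproduced here.

```python
import numpy as np

def moments_and_jb(x):
    """Return sample skewness, kurtosis and the Jarque-Bera statistic."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    m2 = np.mean(z**2)
    skew = np.mean(z**3) / m2**1.5
    kurt = np.mean(z**4) / m2**2          # equals 3 for a normal distribution
    jb = n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
    return skew, kurt, jb

rng = np.random.default_rng(1)
normal_sample = rng.standard_normal(5000)
heavy_sample = rng.standard_t(df=3, size=5000)   # heavy-tailed stand-in for returns

skew_n, kurt_n, jb_n = moments_and_jb(normal_sample)
skew_h, kurt_h, jb_h = moments_and_jb(heavy_sample)
print(f"normal:       skew={skew_n:.3f} kurt={kurt_n:.3f} JB={jb_n:.1f}")
print(f"heavy-tailed: skew={skew_h:.3f} kurt={kurt_h:.3f} JB={jb_h:.1f}")
```

Under normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so values far above the 5% critical value of about 5.99 reject the normal assumption.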

```python
from arch.unitroot import ADF, KPSS

adf = ADF(df.returns)
print(adf.summary().as_text())

kpss = KPSS(df.returns)
print(kpss.summary().as_text())
```

A variance ratio (VR) test is used to check whether the log return series is a pure random walk or has some predictability. Here I compare the 1-month and 12-month log returns and reject the null hypothesis that the series is a pure random walk. The negative VR test statistic (-11.07) rejects the null, indicating serial correlation in the time series. The unit root and stationarity tests (ADF, KPSS, DFGLS, PP and ZA statistics) are all significant, indicating that a GARCH-type model is appropriate for fitting the return series.
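The idea behind the variance ratio can be sketched in a few lines of NumPy: under a random walk, the variance of q-period returns is q times the one-period variance, so the ratio is close to 1. The function below is a simplified illustration without bias correction or a test statistic. The function name, the lag `q=12` and the simulated series are assumptions, not the test output reported above.

```python
import numpy as np

def variance_ratio(log_returns, q):
    """Variance of overlapping q-period returns over q times the 1-period variance."""
    r = np.asarray(log_returns, dtype=float)
    r = r - r.mean()
    var_1 = np.mean(r**2)
    c = np.concatenate(([0.0], np.cumsum(r)))
    r_q = c[q:] - c[:-q]                 # overlapping q-period sums
    var_q = np.mean(r_q**2) / q
    return var_q / var_1

rng = np.random.default_rng(2)
iid = rng.standard_normal(20000)         # increments of a pure random walk
vr_iid = variance_ratio(iid, q=12)
print(f"VR(12) for i.i.d. returns: {vr_iid:.3f}")
```

A ratio well below 1 points to negative serial correlation (mean reversion), and a ratio well above 1 points to positive serial correlation (trending).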

# Nonlinear dynamics

The Hurst exponent is used to study stationarity.

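```python
from numpy import sqrt, std, subtract, polyfit, log

# Calculate the Hurst exponent of the most recent prices
lags = range(2, 100)  # assumed lag range; not shown in the source
tau = [sqrt(std(subtract(closes_recent[lag:], closes_recent[:-lag]))) for lag in lags]
m = polyfit(log(lags), log(tau), 1)
hurst = m[0] * 2
```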

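_H_ (0.531) indicates a series close to a random walk with some long-term dependence, which supports the use of a GARCH model in this study.

```python
from pyrqa.computation import RQAComputation

# `settings` is a pyrqa Settings object prepared beforehand (not shown in the source)
computation = RQAComputation.create(settings, verbose=True)
result = computation.run()
result.min_diagonal_line_length = 2
```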

Here, a low recurrence rate (R) indicates weak periodicity and largely random behavior. In addition, a low determinism (DET) value indicates that the process is less deterministic. This supports the use of the GARCH method.
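To show what the two RQA measures count, here is a small plain-NumPy sketch that does not use pyrqa. It builds a recurrence matrix without time-delay embedding, which is a simplification, and computes the recurrence rate and determinism from it. The function names, the radius and the two test signals are assumptions for illustration only.

```python
import numpy as np

def recurrence_matrix(x, radius):
    """1 where two states are within `radius` of each other (no embedding)."""
    x = np.asarray(x, dtype=float)
    return (np.abs(x[:, None] - x[None, :]) <= radius).astype(int)

def recurrence_rate(rec):
    """Share of recurrent points, excluding the main diagonal."""
    n = rec.shape[0]
    return (rec.sum() - np.trace(rec)) / (n * n - n)

def determinism(rec, min_line=2):
    """Share of recurrent points on diagonal lines of length >= min_line."""
    n = rec.shape[0]
    on_lines = 0
    for k in range(1, n):                    # upper diagonals; matrix is symmetric
        run = 0
        for v in list(np.diagonal(rec, offset=k)) + [0]:   # trailing 0 flushes last run
            if v:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
    total = (rec.sum() - np.trace(rec)) / 2  # recurrent points above the diagonal
    return on_lines / total if total else 0.0

t = np.arange(200)
sine = np.sin(2 * np.pi * t / 25)                      # periodic, deterministic
noise = np.random.default_rng(4).standard_normal(200)  # random

for name, series in [("sine", sine), ("noise", noise)]:
    rec = recurrence_matrix(series, radius=0.1)
    print(name, round(recurrence_rate(rec), 3), round(determinism(rec), 3))
```

A periodic signal produces long diagonal lines and therefore a higher DET than noise, which is the contrast the paragraph above relies on.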

# GARCH model

Before estimating GARCH-type models, the returns are multiplied by 100. This rescaling helps the optimizer converge, because it brings the scale of the volatility intercept closer to that of the other model parameters.

``X = 100* df.returns``

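Let's fit an ARCH model and plot the squared residuals to check for autocorrelation.

```python
import numpy as np
import statsmodels.tsa.api as smt

def getbest(TS, pq_rng=range(5), d_rng=range(2)):
    # pq_rng and d_rng defaults are assumed; the source does not show them
    best_aic, best_order, best_mdl = np.inf, None, None
    for i in pq_rng:
        for d in d_rng:
            for j in pq_rng:
                try:
                    tmp_mdl = smt.ARIMA(TS, order=(i, d, j)).fit()
                except Exception:
                    continue  # skip orders that cannot be fitted
                if tmp_mdl.aic < best_aic:
                    best_aic, best_order, best_mdl = tmp_mdl.aic, (i, d, j), tmp_mdl
    return best_aic, best_order, best_mdl

# aic: 22462.01 | order: (2, 0, 2)
```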
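```python
from arch import arch_model

gam = arch_model(Model.resid, p=2, o=0, q=2, dist='StudentsT')
gres = gam.fit(update_freq=5, disp='off')
print(gres.summary())

# tsplot is the plotting helper used in this article (definition not shown)
tsplot(gres.resid**2, lags=30)
```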

The squared residuals show evidence of autocorrelation. Let's fit a GARCH model and see how it performs. I will follow these steps:

• Iterate over combinations of ARIMA(p, d, q) models to find the best fit to the time series.
• Select the GARCH model order according to the ARIMA model with the lowest AIC.
• Fit the GARCH(p, q) model to the time series.
• Check the model residuals and squared residuals for autocorrelation.

The best model found here is ARIMA(2,0,2). Now we plot the residuals to determine whether they show conditional heteroscedasticity.

``am = arch_model(X, p=2, q=2, o=1, power=2.0, vol='Garch', dist='StudentsT')``

The outputs of all three GARCH models are displayed in tabular format. Omega (ω) is the constant term of the variance equation, and alpha and beta are the ARCH and GARCH parameters of the model. In addition, _α + β < 1_ indicates a stable model. EGARCH appears to be the best of the three models.
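The stability condition can be illustrated with a small simulation. For a GARCH(1,1) process with α + β < 1, the long-run variance is ω / (1 - α - β), and the sample variance of a long simulated path should settle near that level. The sketch below uses plain NumPy. The parameter values are hypothetical and the first-order model is a simplification of the GARCH(2,2) models fitted in this article.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate GARCH(1,1) residuals and conditional variances."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    rng = np.random.default_rng(seed)
    long_run_var = omega / (1.0 - alpha - beta)
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = long_run_var                     # start at the long-run level
    eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps, sigma2, long_run_var

# Hypothetical parameters with persistence alpha + beta = 0.98
eps, sigma2, long_run_var = simulate_garch11(omega=0.05, alpha=0.08, beta=0.90, n=100_000)
print("long-run variance:", long_run_var)
print("sample variance:  ", eps.var())
```

The closer α + β is to 1, the more slowly volatility shocks die out, which is what persistence of variance means in this context.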

It is best to split the data into training and test sets and compute MSE, MAE and RMSE to compare how well the models fit. The standardized residuals are calculated by dividing the residuals by the conditional volatility.
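A minimal sketch of such a comparison is shown below. It assumes a chronological split (no shuffling, since this is a time series) and that each model's forecasts are compared against some realized proxy such as squared returns. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def chronological_split(series, test_fraction=0.2):
    """Split without shuffling so the test set lies strictly after the training set."""
    series = np.asarray(series)
    cut = int(len(series) * (1 - test_fraction))
    return series[:cut], series[cut:]

def forecast_errors(actual, predicted):
    """Return MSE, MAE and RMSE of a forecast."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err**2)
    return {"MSE": mse, "MAE": np.mean(np.abs(err)), "RMSE": np.sqrt(mse)}

train, test = chronological_split(np.arange(10))
print(len(train), len(test))

# Toy example: realized values versus one model's forecasts
print(forecast_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The model with the lowest out-of-sample errors on the test set would be preferred.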

```python
std_resid = resid / conditional_volatility
unit_var_resid = resid / resid.std()
```

The plots of the standardized residuals and the conditional volatility show some errors, but their magnitude is small.

```python
import seaborn as sns

sns.kdeplot(squared_resid, fill=True)
```

The standardized and non-standardized residuals are also plotted. The squared residuals are more peaked in the center, indicating that the tails of their distribution are heavier than those of the standardized residuals. Let's check the ACF plot.

```python
from statsmodels.graphics.tsaplots import plot_acf

plot_acf(std_resid)
```

Some spikes fall outside the shaded confidence band. Let's look at the squared residuals. For the squared residuals, the data points lie within the blue shaded 95% confidence band, indicating that the model fits well.

```python
res = am.fit()
# In recent arch releases this argument is named plot_type
fig = res.hedgehog_plot(type='mean')
```

The figure shows the forecasts for the whole of 2019. The orange lines show the forecasts made over different time intervals.

# Simulation-based prediction

Here, a simulation-based approach is used to obtain confidence intervals for the volatility forecast from the EGARCH model. To obtain a volatility forecast, the model is simulated forward from the last observation of the fitted model. This process is repeated many times to obtain a distribution of volatility forecasts. The point forecast is the average over the simulations, and the 2.5% and 97.5% quantiles of the simulated distribution give the 95% confidence interval. The average return (mu) is taken as 0.0292 and the annualized volatility (vol) as (26.48) * sqrt(252) = 37.37%.

```python
import numpy as np
import matplotlib.pyplot as plt

# Define variables
S = df.Price.iloc[-1]  # starting price (the last available actual price)
T = 252       # trading days
mu = 0.0622   # return
vol = 0.3737  # volatility

# Random daily gross returns drawn from a normal distribution
daily_returns = np.random.normal((1 + mu) ** (1 / T), vol / np.sqrt(T), T)
# Price series implied by the daily returns
price_list = S * np.cumprod(daily_returns)

# Generate plots: price series and histogram of daily returns
plt.plot(price_list)
plt.show()
plt.hist(daily_returns - 1, 100)
plt.show()
```

The top plot shows one simulated evolution of the price series over a trading year (252 days), based on random daily returns drawn from a normal distribution. The second plot is a histogram of those random daily returns over the year. A single path says little on its own. Real insight comes from running thousands of simulations, each producing a different potential price evolution from the same characteristics (mean return and volatility).

```python
import numpy as np

# Set up an empty list to hold the final value of each simulated price series
result = []

S = df.Price.iloc[-1]  # starting stock price (i.e. the last available actual stock price)
T = 252       # trading days
mu = 0.0622   # return
vol = 0.3737  # volatility

# Select the number of runs to simulate - I selected 10000
for i in range(10000):
    # Create daily returns using a random normal distribution
    daily_returns = np.random.normal((1 + mu) ** (1 / T), vol / np.sqrt(T), T)

    # Set the starting price and create the price series generated by the random daily returns
    price_list = S * np.cumprod(daily_returns)

    # Add the end value of each simulation run to the list created at the beginning
    result.append(price_list[-1])
```

Since these are stochastic simulations of daily returns, the results will differ slightly from run to run. Averaged over the many simulated paths, the mean return tends toward the value used for mu. The histogram below marks two quantiles of the price distribution to show how likely high or low outcomes are. There is a 5% chance that the crude oil price will end up below 29.72 US dollars and a 5% chance that it will end up above 101.75 US dollars.
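The quantile step can be sketched as follows. This is a vectorized variant of the loop above that draws all paths at once and reads the 5% and 95% quantiles from the simulated final prices. The starting price `s0=60.0`, the seed and the function name are assumptions, so the numbers it prints will not match the figures quoted above.

```python
import numpy as np

def simulate_final_prices(s0, mu, vol, days=252, n_paths=10_000, seed=0):
    """Final prices of paths built from normally distributed daily gross returns."""
    rng = np.random.default_rng(seed)
    daily = rng.normal((1 + mu) ** (1 / days), vol / np.sqrt(days), size=(n_paths, days))
    return s0 * daily.prod(axis=1)

# s0 is a hypothetical starting price; mu and vol follow the code above
final_prices = simulate_final_prices(s0=60.0, mu=0.0622, vol=0.3737)

low, high = np.percentile(final_prices, [5, 95])
print(f"mean final price: {final_prices.mean():.2f}")
print(f"5% quantile:      {low:.2f}")
print(f"95% quantile:     {high:.2f}")
```

Fixing the seed makes a run reproducible, and increasing `n_paths` narrows the sampling error of the quantiles.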

# Summary

Against the background of highly volatile crude oil prices, I studied and proposed a hybrid time-varying long-memory GARCH and simulation-based forecasting model that accounts for stylized volatility facts such as asymmetry and heteroscedasticity, time-varying risk, long memory and heavy-tailed distributions. Empirical evidence shows that crude oil data following Brownian motion often exhibit a degree of predictability in their time dynamics. The study considered data from 2000 to 2019, a period in which the stock market went through several financial crises and post-crisis stages. A model trained on data from this period is expected to have excellent predictive ability.

When applied to the long-run, highly volatile crude oil price series, the GARCH(2,2) model estimates the persistence of variance. A Monte Carlo analysis was carried out to check the robustness of the results. The Monte Carlo simulation output shows that the results remain reliable even after controlling for irrelevant factors. These findings therefore provide a sound hybrid EGARCH and Monte Carlo simulation forecasting model that accounts for volatility characteristics such as volatility clustering and asymmetry, time-varying risk and heavy-tailed distributions when measuring crude oil prices.
