Link to the original text: http://tecdat.cn/?p=21757
Time series models can be divided into deterministic models and stochastic models, according to whether the object under study is random. A stochastic time series model describes a series using only its own past values and a random disturbance term. To specify such a model, three questions must be answered: the functional form of the model, the number of lags of the time series variable, and the structure of the random disturbance term.
A stationary time series {Y_t} can be represented in the linear form

Y_t = μ + Σ_{i=0}^{∞} ψ_i ε_{t−i}

Here μ is the mean of Y_t; the ψ_i are coefficients (also called weights) that determine the linear dynamic structure of the series, with ψ_0 = 1; and {ε_t} is a Gaussian white noise sequence. Since ε_t represents the new information about the series {Y_t} that appears at time t, it is called the innovation (new information) or shock (disturbance) at time t.
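To make this linear representation concrete, here is a small simulation sketch, a minimal example only: the general form truncated after two lags, i.e. an MA(2) process, with μ and the ψ_i chosen arbitrarily for illustration.

```r
# Illustrative sketch (arbitrary coefficients, not from the original post):
# Y_t = mu + psi0*e_t + psi1*e_{t-1} + psi2*e_{t-2},
# the linear representation truncated after two lags.
set.seed(42)
mu  <- 2
psi <- c(1, 0.6, 0.3)                  # psi_0 = 1 by convention
e   <- rnorm(1002)                     # Gaussian white noise (innovations)
Y   <- mu + stats::filter(e, psi, method = "convolution", sides = 1)
Y   <- Y[!is.na(Y)]                    # drop the first values lost to the lags
mean(Y)                                # sample mean, close to mu
```

For a long enough sample, the sample mean of Y is close to μ, as the representation suggests.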
The unit root test is a special kind of stationarity test, and it is the foundation of ARMA and ARIMA modelling, cointegration analysis, and causality testing.
To illustrate how these unit root tests are implemented, consider the following series,
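The definition of the series X is not shown in this excerpt. As an assumption consistent with the 238 residual degrees of freedom reported in the regressions below (i.e. 240 observations), X can be taken to be a Gaussian random walk of length 240:

```r
# Assumed data-generating process (the original definition is missing from
# this excerpt): a Gaussian random walk of length n = 240.
set.seed(1)
n <- 240
X <- cumsum(rnorm(n))   # X[t] = X[t-1] + epsilon[t]
```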
> plot(X,type="l")
Dickey-Fuller test (standard)
Here, for a simple version of the Dickey-Fuller test, we assume that

Y_t = φ Y_{t−1} + ε_t

and we want to test whether φ = 1 (or not). We can write the previous expression as

ΔY_t = Y_t − Y_{t−1} = (φ − 1) Y_{t−1} + ε_t

so we only need to test whether the regression coefficient (φ − 1) in this linear regression is zero, which can be done with a Student t-test. If we consider a model without linear drift, we have to run the following regression,
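The variables z.diff and z.lag.1 used in the regression below are not defined in this excerpt. A sketch of how they can be built, assuming X is the series under study (re-simulated here as a random walk so the sketch is self-contained):

```r
# Assumed construction of the regression variables (the original code is
# not shown in this excerpt): z.diff = delta Y_t, z.lag.1 = Y_{t-1}.
set.seed(1)
X <- cumsum(rnorm(240))            # stand-in for the series (assumption)
z.diff  <- diff(X)                 # delta Y_t, for t = 2..240
z.lag.1 <- X[1:(length(X) - 1)]    # Y_{t-1}
# t-statistic of the coefficient in the no-drift Dickey-Fuller regression
summary(lm(z.diff ~ 0 + z.lag.1))$coefficients[1, 3]
```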
Call:
lm(formula = z.diff ~ 0 + z.lag.1)

Residuals:
     Min       1Q   Median       3Q      Max
-2.84466 -0.55723  0.00494  0.63816  2.54352

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.005609   0.007319  -0.766    0.444

Residual standard error: 0.963 on 238 degrees of freedom
Multiple R-squared: 0.002461, Adjusted R-squared: -0.00173
F-statistic: 0.5873 on 1 and 238 DF, p-value: 0.4442
Our test procedure is based on the value of this Student t-statistic,

> summary(lm(z.diff~0+z.lag.1 ))$coefficients[1,3]
[1] -0.7663308
This is the value computed by

> ur.df(X,type="none",lags=0)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################

The value of the test statistic is: -0.7663
This value can be compared with critical values at the 99%, 95% and 90% levels, for instance the Gaussian quantiles

> qnorm(c(.01,.05,.1)/2)
[1] -2.575829 -1.959964 -1.644854

If the test statistic exceeds these values, we cannot reject the null hypothesis, and we conclude that the series is not stationary: there is a unit root. In fact, the proper critical values for this test are the ones reported in the full output:
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression none

Call:
lm(formula = z.diff ~ z.lag.1 - 1)

Residuals:
     Min       1Q   Median       3Q      Max
-2.84466 -0.55723  0.00494  0.63816  2.54352

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.005609   0.007319  -0.766    0.444

Residual standard error: 0.963 on 238 degrees of freedom
Multiple R-squared: 0.002461, Adjusted R-squared: -0.00173
F-statistic: 0.5873 on 1 and 238 DF, p-value: 0.4442

Value of test-statistic is: -0.7663

Critical values for test statistics:
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62
R has several packages that can be used for unit root testing.
Augmented Dickey-Fuller Test

data: X
Dickey-Fuller = -2.0433, Lag order = 0, p-value = 0.5576
alternative hypothesis: stationary
There is also a test for the null hypothesis that there is a unit root, but its p-value is quite different:
p.value
[1] 0.4423705
testreg$coefficients[4]
[1] 0.4442389
Augmented Dickey-Fuller test
The regression may include several lags. For example, with one lagged difference we can consider

ΔY_t = γ Y_{t−1} + δ ΔY_{t−1} + ε_t

Again, we have to check whether one coefficient (here γ) is zero, which can be done with a Student t-test.
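As before, z.diff, z.lag.1 and now z.diff.lag are not defined in this excerpt. A possible construction, an assumption mirroring the variable names used below (with X again re-simulated as a random walk so the sketch runs standalone):

```r
# Assumed construction with one lag (not shown in the excerpt):
# z.diff = delta Y_t, z.lag.1 = Y_{t-1}, z.diff.lag = delta Y_{t-1}.
set.seed(1)
X <- cumsum(rnorm(240))                  # stand-in series (assumption)
lags <- 1
z <- diff(X)
n <- length(z)
z.diff     <- embed(z, lags + 1)[, 1]    # delta Y_t
z.lag.1    <- X[(lags + 1):n]            # Y_{t-1}
z.diff.lag <- embed(z, lags + 1)[, 2]    # delta Y_{t-1}
```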
> summary(lm(z.diff~0+z.lag.1+z.diff.lag ))

Call:
lm(formula = z.diff ~ 0 + z.lag.1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-2.87492 -0.53977  0.00688  0.64481  2.47556

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
z.lag.1    -0.005394   0.007361  -0.733    0.464
z.diff.lag  0.028972   0.065113   0.445    0.657

Residual standard error: 0.9666 on 236 degrees of freedom
Multiple R-squared: 0.003292, Adjusted R-squared: -0.005155
F-statistic: 0.3898 on 2 and 236 DF, p-value: 0.6777
> summary(lm(z.diff~0+z.lag.1+z.diff.lag ))$coefficients[1,3]
[1] -0.7328138

This value is also the one computed by

> df=ur.df(X,type="none",lags=1)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression none

Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-2.87492 -0.53977  0.00688  0.64481  2.47556

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
z.lag.1    -0.005394   0.007361  -0.733    0.464
z.diff.lag  0.028972   0.065113   0.445    0.657

Residual standard error: 0.9666 on 236 degrees of freedom
Multiple R-squared: 0.003292, Adjusted R-squared: -0.005155
F-statistic: 0.3898 on 2 and 236 DF, p-value: 0.6777

Value of test-statistic is: -0.7328

Critical values for test statistics:
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62
Similarly, you can use other packages:
Augmented Dickey-Fuller Test

data: X
Dickey-Fuller = -1.9828, Lag order = 1, p-value = 0.5831
alternative hypothesis: stationary
The conclusion is the same: we cannot reject the null hypothesis of a unit root, i.e. we cannot conclude that the series is stationary.
Augmented Dickey-Fuller test with drift and trend
So far, no drift has been included in our model. But adding one is simple (this gives the so-called extended version of the previous procedure): we just include a constant in the regression,

ΔY_t = α + γ Y_{t−1} + δ ΔY_{t−1} + ε_t

> summary(lm(z.diff ~ 1 + z.lag.1 + z.diff.lag))
Residuals:
     Min       1Q   Median       3Q      Max
-2.91930 -0.56731  0.00548  0.62932  2.45178

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29175    0.13153   2.218   0.0275 *
z.lag.1     -0.03559    0.01545  -2.304   0.0221 *
z.diff.lag   0.01976    0.06471   0.305   0.7603
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 235 degrees of freedom
Multiple R-squared: 0.02313, Adjusted R-squared: 0.01482
F-statistic: 2.782 on 2 and 235 DF, p-value: 0.06393
Looking at an analysis-of-variance type of output, we obtain the statistics of interest here: the model is compared with the model without the integrated part and without the drift,

> summary(lm(z.diff ~ 1 + z.lag.1 + z.diff.lag))$coefficients[2,3]
[1] -2.303948
> anova(lm(z.diff ~ 1 + z.lag.1 + z.diff.lag), lm(z.diff ~ 0 + z.diff.lag))$F[2]
[1] 2.732912
These two values are also returned by

> ur.df(X,type="drift",lags=1)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression drift

Residuals:
     Min       1Q   Median       3Q      Max
-2.91930 -0.56731  0.00548  0.62932  2.45178

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29175    0.13153   2.218   0.0275 *
z.lag.1     -0.03559    0.01545  -2.304   0.0221 *
z.diff.lag   0.01976    0.06471   0.305   0.7603
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 235 degrees of freedom
Multiple R-squared: 0.02313, Adjusted R-squared: 0.01482
F-statistic: 2.782 on 2 and 235 DF, p-value: 0.06393

Value of test-statistic is: -2.3039 2.7329

Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1  6.52  4.63  3.81
We can also include a linear trend,

ΔY_t = α + β t + γ Y_{t−1} + δ ΔY_{t−1} + ε_t

> temps=(lags+1):n
> summary(lm(z.diff~1+temps+z.lag.1+z.diff.lag ))
Residuals:
     Min       1Q   Median       3Q      Max
-2.87727 -0.58802  0.00175  0.60359  2.47789

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.3227245  0.1502083   2.149   0.0327 *
temps        0.0004194  0.0009767   0.429   0.6680
z.lag.1     -0.0329780  0.0166319  -1.983   0.0486 *
z.diff.lag   0.0230547  0.0652767   0.353   0.7243
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9603 on 234 degrees of freedom
Multiple R-squared: 0.0239, Adjusted R-squared: 0.01139
F-statistic: 1.91 on 3 and 234 DF, p-value: 0.1287
> summary(lm(z.diff~1+temps+z.lag.1+z.diff.lag ))$coefficients[3,3]
[1] -1.98282
> anova(lm(z.diff~1+temps+z.lag.1+z.diff.lag ), lm(z.diff~1+z.diff.lag ))$F[2]
[1] 2.737086
The ur.df function returns the same values,

> ur.df(X,type="trend",lags=1)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression trend

Residuals:
     Min       1Q   Median       3Q      Max
-2.87727 -0.58802  0.00175  0.60359  2.47789

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.3227245  0.1502083   2.149   0.0327 *
z.lag.1     -0.0329780  0.0166319  -1.983   0.0486 *
tt           0.0004194  0.0009767   0.429   0.6680
z.diff.lag   0.0230547  0.0652767   0.353   0.7243
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9603 on 234 degrees of freedom
Multiple R-squared: 0.0239, Adjusted R-squared: 0.01139
F-statistic: 1.91 on 3 and 234 DF, p-value: 0.1287

Value of test-statistic is: -1.9828 1.8771 2.7371

Critical values for test statistics:
      1pct  5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2  6.22  4.75  4.07
phi3  8.43  6.49  5.47
KPSS test
In the KPSS procedure, two models can be considered: a model with a drift, or a model with a linear trend. Here, the null hypothesis is that the series is stationary. The code for the drift model is

> ur.kpss(X,type="mu")
#######################
# KPSS Unit Root Test #
#######################
Test is of type: mu with 4 lags.
Value of test-statistic is: 0.972
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.347 0.463 0.574 0.73
For the model with a linear trend,

> ur.kpss(X,type="tau")
#######################
# KPSS Unit Root Test #
#######################
Test is of type: tau with 4 lags.
Value of test-statistic is: 0.5057
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.119 0.146 0.176 0.216
Again, another package can be used and leads to the same conclusion (but with a different output):

> kpss.test(X,"Level")
KPSS Test for Level Stationarity
data: X
KPSS Level = 1.1997, Truncation lag parameter = 3, p-value = 0.01
> kpss.test(X,"Trend")
KPSS Test for Trend Stationarity
data: X
KPSS Trend = 0.6234, Truncation lag parameter = 3, p-value = 0.01
At least the packages are consistent: in every case we reject the stationarity hypothesis.
Phillips-Perron test
The Phillips-Perron test is based on the ADF procedure. The code is
> PP.test(X)
Phillips-Perron Unit Root Test

data: X
Dickey-Fuller = -2.0116, Truncation lag parameter = 4, p-value = 0.571
Another possible alternative is
> pp.test(X)
Phillips-Perron Unit Root Test

data: X
Dickey-Fuller Z(alpha) = -7.7345, Truncation lag parameter = 4, p-value = 0.6757
alternative hypothesis: stationary
Comparison
I will not spend more time comparing the different implementations of these tests in R. Instead, let us quickly compare the three approaches: we generate some autoregressive processes with more or less autocorrelation, plus some random walks, and see how the tests perform,
> for(i in 1:(length(AR)+1)){
+ for(s in 1:1000){
+ if(i==1) X=cumsum(rnorm(240))
+ if(i!=1) X=arima.sim(n=240,list(ar=AR[i-1]))
+ M2[s,i]=(pp.test(X)$p.value>.05)
+ M1[s,i]=(kpss.test(X)$p.value>.05)
+ M3[s,i]=(adf.test(X)$p.value>.05)
+ }
+ }
Here, we need to compute, for each test, the proportion of simulations in which the p-value exceeds 5%,
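The matrix P used in the plot below is not defined in this excerpt; presumably each of its rows holds these proportions for one of the three tests. A sketch under that assumption (M1, M2 and M3 are stubbed with random draws here so the sketch runs standalone; in the actual experiment they come from the simulation loop above):

```r
# Assumption: rows of P = proportion of simulations with p-value > 5%,
# one row per test, one column per simulated process.
# M1, M2, M3 are stubbed 0/1 matrices (1000 simulations, 30 processes)
# standing in for the indicator matrices filled by the simulation loop.
set.seed(1)
M1 <- matrix(runif(1000 * 30) > .5, 1000, 30)
M2 <- matrix(runif(1000 * 30) > .5, 1000, 30)
M3 <- matrix(runif(1000 * 30) > .5, 1000, 30)
P  <- rbind(colMeans(M1), colMeans(M2), colMeans(M3))
```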
> plot(AR,P[1,],type="l",col="red",ylim=c(0,1))
> lines(AR,P[2,],type="l",col="blue")
> lines(AR,P[3,],type="l",col="green")
We can see here how unreliable the Dickey-Fuller test is: at least 50% of our autoregressive processes are classified as non-stationary.
