Several unit root tests (ADF, KPSS, PP) and a comparative analysis of time series stationarity in R

Time: 2021-5-12

Link to the original text: http://tecdat.cn/?p=21757

Time series models can be divided into deterministic models and stochastic models, depending on whether the object of study is treated as random or not.

A stochastic time series model describes a series using only its own past values and random disturbance terms. To build a specific model, three questions must be settled: the specific form of the model, the number of lags of the time series variable, and the structure of the random disturbance term. A general representation of such a series is

Y_t = μ + Σ_{i≥0} ψ_i ε_{t-i}

Here μ is the mean of Y_t; the ψ_i are the coefficients that determine the linear dynamic structure of the series (they are also called weights), with ψ_0 = 1; and {ε_t} is a Gaussian white noise sequence. Since ε_t represents the new information that appears in the series {Y_t} at time t, it is called the innovation (or shock) of the series at time t.

The unit root test is a particular way of testing stationarity; it is the basis of ARMA and ARIMA modelling, cointegration analysis and causality testing.

To illustrate how these unit root tests are implemented, consider the following series.
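The series X itself is not shown in the original post; a minimal sketch of how it might have been generated, assuming a simulated random walk of length 240 (a length consistent with the degrees of freedom reported in the regressions below):

 > set.seed(1)           # hypothetical seed, for reproducibility
 > X=cumsum(rnorm(240))  # assumed: random walk X_t = X_{t-1} + eps_t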

 > plot(X,type="l")

[Figure: line plot of the series X]

  • Dickey-Fuller test (standard)

Here, for a simple version of the Dickey-Fuller test, we assume that

X_t = φ X_{t-1} + ε_t

We want to test whether φ = 1 (or not). We can rewrite the previous expression as

ΔX_t = X_t - X_{t-1} = (φ - 1) X_{t-1} + ε_t

So we only need to test whether the slope coefficient in this linear regression is zero, which can be done with a Student t-test. If we assume that the model has no linear drift, the regression to run is the no-intercept regression of ΔX_t on X_{t-1}.
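The variables z.diff (the differenced series) and z.lag.1 (the lagged level) are not constructed in the original post; a plausible reconstruction, under the assumption above on X, is

 > z=diff(X)       # Delta X_t
 > n=length(z)
 > z.diff=z
 > z.lag.1=X[1:n]  # X_{t-1}
 > summary(lm(z.diff~0+z.lag.1))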

 Call:
lm(formula = z.diff ~ 0 + z.lag.1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.84466 -0.55723 -0.00494  0.63816  2.54352 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.005609   0.007319  -0.766    0.444

Residual standard error: 0.963 on 238 degrees of freedom
Multiple R-squared:  0.002461,    Adjusted R-squared:  -0.00173 
F-statistic: 0.5873 on 1 and 238 DF,  p-value: 0.4442

Our testing procedure will be based on the value of the Student t statistic,

> summary(lm(z.diff~0+z.lag.1 ))$coefficients[1,3]
[1] -0.7663308

This is the statistic computed by the ur.df function in the urca package:

> ur.df(X,type="none",lags=0)


############################################################### 
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test # 
############################################################### 

The value of the test statistic is: -0.7663

Critical values (at the 1%, 5% and 10% levels) are needed to interpret this value; a naive approach would be to use normal quantiles,

> qnorm(c(.01,.05,.1)/2)
[1] -2.575829 -1.959964 -1.644854

If the test statistic is above these values (i.e. not negative enough), we cannot reject the null hypothesis of a unit root, and we conclude that the series is non-stationary. In fact the relevant thresholds are not normal quantiles but Dickey-Fuller critical values, which are reported by the summary of the ur.df object:
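> summary(ur.df(X,type="none",lags=0))   # presumably the call behind the output below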

 ############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 

Call:
lm(formula = z.diff ~ z.lag.1 - 1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.84466 -0.55723 -0.00494  0.63816  2.54352 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.005609   0.007319  -0.766    0.444

Residual standard error: 0.963 on 238 degrees of freedom
Multiple R-squared:  0.002461,    Adjusted R-squared:  -0.00173 
F-statistic: 0.5873 on 1 and 238 DF,  p-value: 0.4442

Value of test-statistic is: -0.7663 

Critical values for test statistics: 
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62

R has several packages that can be used for unit root testing.
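For instance, the adf.test function in the tseries package can be used; given the reported lag order of 0, the call was presumably

 > adf.test(X,k=0)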

 Augmented Dickey-Fuller Test

data:  X
Dickey-Fuller = -2.0433, Lag order = 0, p-value = 0.5576
alternative hypothesis: stationary

This test also takes the presence of a unit root as the null hypothesis, but the p-value it reports is quite different from the ones obtained previously:

> p.value
[1] 0.4423705
> testreg$coefficients[4]
[1] 0.4442389

  • Augmented Dickey-Fuller test

Lags of the differenced series can be included in the regression. For example, we can consider

ΔX_t = (φ - 1) X_{t-1} + δ ΔX_{t-1} + ε_t

Again, we need to test whether the coefficient on X_{t-1} is zero, which can be done with a Student t-test.
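With one lag, the variables can be rebuilt in the same spirit (again a reconstruction, since the original construction is not shown):

 > lags=1
 > z=diff(X)
 > n=length(z)
 > z.diff=embed(z,lags+1)[,1]       # Delta X_t
 > z.lag.1=X[(lags+1):n]            # X_{t-1}
 > z.diff.lag=embed(z,lags+1)[,2]   # Delta X_{t-1}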

 > summary(lm(z.diff~0+z.lag.1+z.diff.lag ))

Call:
lm(formula = z.diff ~ 0 + z.lag.1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.87492 -0.53977 -0.00688  0.64481  2.47556 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
z.lag.1    -0.005394   0.007361  -0.733    0.464
z.diff.lag -0.028972   0.065113  -0.445    0.657

Residual standard error: 0.9666 on 236 degrees of freedom
Multiple R-squared:  0.003292,    Adjusted R-squared:  -0.005155 
F-statistic: 0.3898 on 2 and 236 DF,  p-value: 0.6777

> summary(lm(z.diff~0+z.lag.1+z.diff.lag))$coefficients[1,3]
[1] -0.7328138

This is the value returned by

> df=ur.df(X,type="none",lags=1)
> summary(df)

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 

Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.87492 -0.53977 -0.00688  0.64481  2.47556 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
z.lag.1    -0.005394   0.007361  -0.733    0.464
z.diff.lag -0.028972   0.065113  -0.445    0.657

Residual standard error: 0.9666 on 236 degrees of freedom
Multiple R-squared:  0.003292,    Adjusted R-squared:  -0.005155 
F-statistic: 0.3898 on 2 and 236 DF,  p-value: 0.6777

Value of test-statistic is: -0.7328 

Critical values for test statistics: 
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62

Similarly, you can use other packages:
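Given the reported lag order of 1, the corresponding call (again with tseries, presumably) is

 > adf.test(X,k=1)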

 Augmented Dickey-Fuller Test

data:  X
Dickey-Fuller = -1.9828, Lag order = 1, p-value = 0.5831
alternative hypothesis: stationary

The conclusion is the same: we cannot reject the hypothesis of a unit root, so the series is not considered stationary.

  • Augmented Dickey-Fuller test with drift and trend

So far, no drift has been included in our model. Adding one is straightforward (this is an extended version of the previous procedure): we just need to include a constant in the regression,

> summary(lm(z.diff~1+z.lag.1+z.diff.lag ))


Residuals:
     Min       1Q   Median       3Q      Max 
-2.91930 -0.56731 -0.00548  0.62932  2.45178 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29175    0.13153   2.218   0.0275 *
z.lag.1     -0.03559    0.01545  -2.304   0.0221 *
z.diff.lag  -0.01976    0.06471  -0.305   0.7603
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 235 degrees of freedom
Multiple R-squared:  0.02313,    Adjusted R-squared:  0.01482 
F-statistic: 2.782 on 2 and 235 DF,  p-value: 0.06393

The two statistics of interest are the Student t statistic on the lagged level and an F statistic from an analysis of variance, in which the model is compared with the model that contains neither the unit-root term nor the drift:

> summary(lm(z.diff~1+z.lag.1+z.diff.lag))$coefficients[2,3]
[1] -2.303948
> anova(lm(z.diff~0+z.diff.lag),lm(z.diff~1+z.lag.1+z.diff.lag))$F[2]
[1] 2.732912

These two values are also returned by

> summary(ur.df(X,type="drift",lags=1))

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression drift 

Residuals:
     Min       1Q   Median       3Q      Max 
-2.91930 -0.56731 -0.00548  0.62932  2.45178 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.29175    0.13153   2.218   0.0275 *
z.lag.1     -0.03559    0.01545  -2.304   0.0221 *
z.diff.lag  -0.01976    0.06471  -0.305   0.7603
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 235 degrees of freedom
Multiple R-squared:  0.02313,    Adjusted R-squared:  0.01482 
F-statistic: 2.782 on 2 and 235 DF,  p-value: 0.06393

Value of test-statistic is: -2.3039 2.7329 

Critical values for test statistics: 
      1pct  5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1  6.52  4.63  3.81

We can also include a linear trend,

> temps=(lags+1):n
> summary(lm(z.diff~1+temps+z.lag.1+z.diff.lag ))

Residuals:
     Min       1Q   Median       3Q      Max 
-2.87727 -0.58802 -0.00175  0.60359  2.47789 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.3227245  0.1502083   2.149   0.0327 *
temps       -0.0004194  0.0009767  -0.429   0.6680
z.lag.1     -0.0329780  0.0166319  -1.983   0.0486 *
z.diff.lag  -0.0230547  0.0652767  -0.353   0.7243
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9603 on 234 degrees of freedom
Multiple R-squared:  0.0239,    Adjusted R-squared:  0.01139 
F-statistic:  1.91 on 3 and 234 DF,  p-value: 0.1287

> summary(lm(z.diff~1+temps+z.lag.1+z.diff.lag))$coefficients[3,3]
[1] -1.98282
> anova(lm(z.diff~1+z.diff.lag),lm(z.diff~1+temps+z.lag.1+z.diff.lag))$F[2]
[1] 2.737086

The ur.df function returns the same values:

> summary(ur.df(X,type="trend",lags=1))


############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression trend 



Residuals:
     Min       1Q   Median       3Q      Max 
-2.87727 -0.58802 -0.00175  0.60359  2.47789 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.3227245  0.1502083   2.149   0.0327 *
z.lag.1     -0.0329780  0.0166319  -1.983   0.0486 *
tt          -0.0004194  0.0009767  -0.429   0.6680
z.diff.lag  -0.0230547  0.0652767  -0.353   0.7243
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9603 on 234 degrees of freedom
Multiple R-squared:  0.0239,    Adjusted R-squared:  0.01139 
F-statistic:  1.91 on 3 and 234 DF,  p-value: 0.1287

Value of test-statistic is: -1.9828 1.8771 2.7371 

Critical values for test statistics: 
      1pct  5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2  6.22  4.75  4.07
phi3  8.43  6.49  5.47

  • KPSS test

For the KPSS test, two models can be considered: a model with drift or a model with a linear trend. Here the null hypothesis is that the series is stationary.
The code is

> summary(ur.kpss(X,type="mu"))

####################### 
# KPSS Unit Root Test # 
####################### 

Test is of type: mu with 4 lags. 

Value of test-statistic is: 0.972 

Critical value for a significance level of: 
                10pct  5pct 2.5pct  1pct
critical values 0.347 0.463  0.574 0.73

And for the model with a linear trend,

> summary(ur.kpss(X,type="tau"))

####################### 
# KPSS Unit Root Test # 
####################### 

Test is of type: tau with 4 lags. 

Value of test-statistic is: 0.5057 

Critical value for a significance level of: 
                10pct  5pct 2.5pct  1pct
critical values 0.119 0.146  0.176 0.216

Again, another package can be used to run the same test (with, once more, a different output); presumably
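 > kpss.test(X,"Level")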

 KPSS Test for Level Stationarity

data:  X
KPSS Level = 1.1997, Truncation lag parameter = 3, p-value = 0.01

> kpss.test(X,"Trend")

    KPSS Test for Trend Stationarity

data:  X
KPSS Trend = 0.6234, Truncation lag parameter = 3, p-value = 0.01

At least the results are consistent: the null hypothesis of stationarity is rejected in every case.

  • Phillips-Perron test

The Phillips-Perron test builds on the ADF procedure. In base R the code is

> PP.test(X)

    Phillips-Perron Unit Root Test

data:  X
Dickey-Fuller = -2.0116, Truncation lag parameter = 4, p-value = 0.571

Another possibility is the pp.test function in the tseries package:

> pp.test(X)

    Phillips-Perron Unit Root Test

data:  X
Dickey-Fuller Z(alpha) = -7.7345, Truncation lag parameter = 4, p-value
= 0.6757
alternative hypothesis: stationary

  • Comparison

Rather than spending more time on the code for these tests in R, let us quickly compare the three approaches. We generate autoregressive processes with more or less autocorrelation, as well as random walks, and see how the tests behave (a sketch of the simulation setup is given below).
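The setup of this small simulation study is not fully reproduced in the original post; a plausible sketch (the grid of AR coefficients and the number of replications are assumptions):

 > library(tseries)                  # pp.test, kpss.test, adf.test
 > AR=seq(.7,.99,by=.01)             # assumed grid of AR(1) coefficients
 > M1=matrix(NA,1000,length(AR)+1)   # kpss.test p-values
 > M2=matrix(NA,1000,length(AR)+1)   # pp.test p-values
 > M3=matrix(NA,1000,length(AR)+1)   # adf.test p-values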

 > for(i in 1:(length(AR)+1)){
+ for(s in 1:1000){
+ if(i==1) X=cumsum(rnorm(240))                  # first column: random walk
+ if(i!=1) X=arima.sim(n=240,list(ar=AR[i-1]))   # stationary AR(1)
+ M2[s,i]=pp.test(X)$p.value
+ M1[s,i]=kpss.test(X)$p.value
+ M3[s,i]=adf.test(X)$p.value
+ }}

We then compute, for each value of the autoregressive coefficient, the proportion of simulations in which the p-value of each test exceeds 5%,
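The matrix P plotted below is not defined in the original post; a plausible construction (dropping the first column, which corresponds to the random walks, so that the dimensions match AR):

 > P=rbind(apply(M1[,-1]>.05,2,mean),   # kpss.test
 +         apply(M2[,-1]>.05,2,mean),   # pp.test
 +         apply(M3[,-1]>.05,2,mean))   # adf.test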

 > plot(AR,P[1,],type="l",col="red",ylim=c(0,1))
> lines(AR,P[2,],type="l",col="blue")
> lines(AR,P[3,],type="l",col="green")

[Figure: proportion of p-values above 5%, as a function of the autoregressive coefficient, for the three tests]

We can see here how poorly the Dickey-Fuller test behaves: at least 50% of our (stationary) autoregressive processes are classified as non-stationary.

