Time-varying vector autoregressive (TV-VAR) models in R: time series analysis and visualization

Time: 2021-11-17

In psychological research, models of individual subjects are becoming more and more popular. One reason is that it is difficult to infer within-person processes from between-person data. Another is that, thanks to the ubiquity of mobile devices, more and more time series are being collected from individuals. The main goal of this person-specific modeling is to explore the dynamics underlying within-person psychological phenomena. With this goal in mind, many researchers have begun to analyze multivariate dependencies in individual time series. The simplest and most popular model for such dependencies is the first-order vector autoregressive (VAR) model, in which each variable at the current time point is predicted by a linear function of all variables (including itself) at the previous time point.
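To make the VAR(1) idea concrete, here is a minimal base R sketch that simulates two variables from a VAR(1) process and recovers the parameters by regressing each variable on all variables at the previous time point. The parameter matrix `B`, the sample size and the noise level are illustrative assumptions, not values from the data analyzed in this post.

```r
# Simulate a bivariate VAR(1) process and recover its parameters with OLS.
# B, n and the noise SD are illustrative assumptions.
set.seed(1)
n <- 500
B <- matrix(c(0.5, 0.2,
              0.0, 0.3), nrow = 2, byrow = TRUE)  # B[i, j]: effect of variable j at t-1 on variable i at t
X <- matrix(0, nrow = n, ncol = 2)
for (i in 2:n) {
  X[i, ] <- B %*% X[i - 1, ] + rnorm(2, sd = 1)
}

# Regress each variable at time t on all variables at time t-1
Y    <- X[2:n, ]
Xlag <- X[1:(n - 1), ]
fit  <- lm(Y ~ Xlag)                 # one regression per column of Y
B_hat <- t(coef(fit)[-1, ])          # drop intercepts; rows = outcomes, columns = predictors
round(B_hat, 2)
```

The estimated matrix should be close to `B`; a standard VAR model assumes that this one matrix holds for the whole time series.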

A key assumption of the standard VAR model is that its parameters do not change over time. Often, however, this change over time is exactly what we are interested in. For example, we may want to relate changes in the parameters to other variables, such as changes in a person’s environment: a new job, the season, or the impact of a global pandemic. In an exploratory design, one can also study the impact of certain interventions (such as medication or therapy) on the interactions between symptoms.

In this blog post, I give a brief overview of how to estimate a time-varying VAR model with the kernel smoothing method. This method rests on the assumption that parameters change smoothly over time, which means that they cannot “jump” from one value to another. I then focus on how to estimate and analyze this type of time-varying VAR model.

Estimating time-varying models via kernel smoothing

The core idea of the kernel smoothing method is as follows. We choose equally spaced time points across the duration of the whole time series and estimate a “local” model at each of them. All local models taken together constitute the time-varying model. By “local” we mean that each model is based mainly on observations close to the time point in question, which is achieved by weighting the observations accordingly during parameter estimation. The figure below illustrates this idea for a small data set. Here we only show the estimation of the local model at t = 3. The left panel shows the 10 time points of this time series. The red column w_{t_e = 3} shows one possible set of weights for estimating the local model at t = 3: observations at time points close to t = 3 get the highest weight, and time points further away get smaller and smaller weights. The function that defines these weights is shown in the right panel.

The blue column on the left and the corresponding blue function on the right represent an alternative weighting. With this weighting, we combine fewer observations that are close in time. This allows us to detect more “time variability” in the parameters, because we smooth over fewer time points. On the other hand, we use less data, which makes our estimates less reliable. It is therefore important to choose a weighting function that strikes a good balance between sensitivity to “time variability” and stable estimates. In the method presented here we use a Gaussian weighting function (also called a kernel), which is defined by its standard deviation (the bandwidth). We discuss how to select a good bandwidth parameter below.
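The weighting described above can be sketched in a few lines of base R. The helper `kernel_weights()` and the two bandwidth values are hypothetical and chosen only for illustration; rescaling the weights so that the largest equals 1 is also an assumption of this sketch.

```r
# Gaussian kernel weights for a local model at estimation point `te`.
# Time points are normalized to [0, 1]; `bw` is the kernel's standard deviation.
kernel_weights <- function(timepoints, te, bw) {
  w <- dnorm(timepoints, mean = te, sd = bw)
  w / max(w)  # rescale so that the largest weight is 1
}

timepoints <- seq(0, 1, length.out = 10)  # 10 equally spaced time points
te <- timepoints[3]                       # local model at the third time point

w_wide   <- kernel_weights(timepoints, te, bw = 0.25)  # smooths over many time points
w_narrow <- kernel_weights(timepoints, te, bw = 0.05)  # uses only close observations

round(cbind(time = timepoints, w_wide, w_narrow), 3)

# The sum of the weights gives a rough idea of how much data each local model uses
c(wide = sum(w_wide), narrow = sum(w_narrow))
```

The narrow kernel gives nearly all weight to the observations next to the estimation point, which makes it more sensitive to change but leaves it with much less data.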

To illustrate how to estimate a time-varying VAR model, I use an ESM time series of 12 mood-related variables, which were measured up to 10 times a day for 238 consecutive days. The items are “I feel relaxed”, “I feel depressed”, “I feel irritable”, “I feel satisfied”, “I feel lonely”, “I feel anxious”, “I feel enthusiastic”, “I doubt”, “I feel happy”, “I feel guilty”, “I feel hesitant” and “I feel strong”. Each item was answered on a 7-point Likert scale, ranging from “not” to “very”.

The data set contains 1476 observations of the 12 variables:

## [1] 1476   12

The object time_data contains the time information for each measurement. We will use the day on which the measurement occurred, the measurement prompt, and the time stamp.

Select the best bandwidth

One way to select a good bandwidth parameter is to fit time-varying models with different candidate bandwidths on a training data set and to evaluate their prediction errors on a test data set. However, such data-driven bandwidth selection can take quite a long time to run. In this post we therefore simply fix the bandwidth to the value that such a procedure has already selected.

bandwidth <- .26
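The train/test idea behind bandwidth selection can be sketched on a toy example. This is not the procedure that produced the value above: it uses a simulated univariate time-varying AR(1) series, a simple hold-out scheme (every 5th time point) and kernel-weighted least squares via `lm()`, all of which are assumptions made for illustration.

```r
# Toy bandwidth selection: time-varying AR(1), hold-out prediction error
set.seed(2)
n   <- 400
tt  <- seq(0, 1, length.out = n)   # normalized time
phi <- 0.8 * tt                    # AR coefficient grows smoothly from 0 to 0.8
y   <- numeric(n)
for (i in 2:n) y[i] <- phi[i] * y[i - 1] + rnorm(1)

d <- data.frame(y = y[-1], ylag = y[-n], time = tt[-1])
test_id <- seq(5, nrow(d), by = 5)  # every 5th row is held out
train <- d[-test_id, ]
test  <- d[test_id, ]

# Out-of-sample RMSE for one bandwidth: for each test point, fit a
# kernel-weighted regression on the training data centered at its time
cv_rmse <- function(bw) {
  pred <- sapply(seq_len(nrow(test)), function(j) {
    w   <- dnorm(train$time, mean = test$time[j], sd = bw)
    fit <- lm(y ~ ylag, data = train, weights = w)
    predict(fit, newdata = test[j, ])
  })
  sqrt(mean((test$y - pred)^2))
}

bw_candidates <- c(0.01, 0.05, 0.1, 0.2, 0.5)
rmse <- sapply(bw_candidates, cv_rmse)
names(rmse) <- bw_candidates
rmse
best_bw <- bw_candidates[which.min(rmse)]
best_bw
```

Very small bandwidths tend to overfit the few observations near each time point, while very large ones smooth the time variation away; the candidate with the lowest hold-out error balances the two.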

Estimating the time-varying VAR model

We can now specify the estimation of the time-varying VAR model. We provide the data as input and specify the types of the variables and their numbers of categories via the type and level arguments. In our example all variables are continuous, so we set type = rep("g", 12) for continuous-Gaussian. We select the regularization parameters by cross-validation with lambdaSel = "CV", and we specify that the VAR model should include a single lag with lags = 1. The arguments beepvar and dayvar provide, for each measurement, the number of the notification within a day and the day on which it occurred. In addition, we provide the time stamps of all measurements via the timepoints argument to account for missing measurements. Note, however, that we still assume a lag size of 1. The time stamps are only used to ensure that the weighting indeed gives the highest weight to the time points closest to the current estimation point.

For the time-varying model, we need to specify two additional arguments. First, with estpoints = seq(0, 1, length = 20) we specify that we want to estimate 20 local models, equally spaced across the duration of the entire time series (which is normalized to [0, 1]). The number of estimation points can be chosen freely, but at some point adding more of them only adds unnecessary computational cost, because neighboring local models become essentially identical. Second, we specify the bandwidth with the bandwidth argument.

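#   Estimate the model on the full data set
#   (the function is tvmvar() from the mgm package; the column names in time_data are assumed)
library(mgm)
obj <- tvmvar(data = data,
              type = rep("g", 12),
              level = rep(1, 12),
              lambdaSel = "CV",
              lags = 1,
              estpoints = seq(0, 1, length = 20),
              bandwidth = bandwidth,
              timepoints = time_data$time_norm,
              beepvar = time_data$beepno,
              dayvar = time_data$dayno)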

Printing the object gives a summary of the model:

#   Check how much data is used
obj

The summary also shows the number of rows in the VAR design matrix (876) next to the number of time points in the data set (1476). The former is smaller because a VAR(1) model can only be estimated at a given time point if the measurement one lag earlier is also available.

Calculate time-varying prediction error

As with the standard VAR model, we can compute prediction errors. This is done with predict(), to which we pass the model object and the data; the data can also be new samples.

The argument errorCon = c("R2", "RMSE") specifies the proportion of explained variance (R^2) and the root mean squared error (RMSE) as prediction errors. The last argument, tvMethod, specifies how time-varying prediction errors are computed. The option tvMethod = "closestModel" predicts each time point with the closest local model. The option tvMethod = "weighted", chosen here, instead uses a weighted average of the predictions of all local models, with weights given by the weighting function centered on the current time point. The two methods usually give very similar results.
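To make the difference between the two options concrete, here is a small base R sketch with toy “local models” that are simply kernel-weighted means of a simulated series. Everything in it (the data, the bandwidth, the helper functions `rmse()` and `r2()`, and the particular definition of R^2) is an assumption for illustration and not the package's internal code.

```r
# Toy comparison of "closest model" and "weighted" predictions
set.seed(4)
n <- 100; K <- 5; bw <- 0.15
time      <- seq(0, 1, length.out = n)
estpoints <- seq(0, 1, length.out = K)
obs <- sin(2 * pi * time) + rnorm(n, sd = 0.3)

# Toy local models: kernel-weighted means of the observations (local constants)
local_fit <- sapply(estpoints, function(te) {
  w <- dnorm(time, mean = te, sd = bw)
  sum(w * obs) / sum(w)
})

# Option 1: predict each time point with the closest local model
closest      <- sapply(time, function(s) which.min(abs(estpoints - s)))
pred_closest <- local_fit[closest]

# Option 2: weighted average of all local models, weights centered on the time point
pred_weighted <- sapply(time, function(s) {
  w <- dnorm(estpoints, mean = s, sd = bw)
  sum(w * local_fit) / sum(w)
})

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
r2   <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

err <- rbind(closest  = c(RMSE = rmse(obs, pred_closest),  R2 = r2(obs, pred_closest)),
             weighted = c(RMSE = rmse(obs, pred_weighted), R2 = r2(obs, pred_weighted)))
err
```

The closest-model prediction changes in steps between estimation points, whereas the weighted prediction blends neighboring local models smoothly.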

pred_obj <- predict(object = obj,
                    data = data,
                    errorCon = c("R2", "RMSE"),
                    tvMethod = "weighted")

The main output consists of the following two objects. tverrors is a list containing the prediction errors of the local model at each estimation point; errors contains the errors averaged across all estimation points.

Visualize parts of the model

Here, we choose two different visualizations. First, we inspect the VAR interaction parameters at estimation points 1, 10 and 20.

library(qgraph)
for (tp in c(1, 10, 20)) qgraph(t(obj$wadj[, , 1, tp]),   # transposed so that edges run from t-1 to t
                                layout = "circle",
                                title = paste0("Estimation point = ", tp))

We can see that some parameters of the VAR model change considerably over time. For example, the autocorrelation effect of “Relaxed” seems to decrease over time, the positive effect of “Strong” on “Satisfied” only appears at estimation point 20, and the negative effect of “Satisfied” on “Guilty” also only appears at estimation point 20.

We can zoom in on these individual parameters by plotting them as a function of time.

#   Plotting
#   Assumed to exist: ests (signed copy of obj$wadj) and display (one row c(outcome, predictor) per parameter)
plot.new()
plot.window(xlim = c(1, 20), ylim = range(ests))
axis(1); axis(2, las = 2)
title(xlab = "Estimation point", cex.lab = 1.2)
title(ylab = "Parameter estimate", cex.lab = 1.2)

for (i in 1:nrow(display)) {
  par_row <- display[i, ]
  lines(1:20, ests[par_row[1], par_row[2], 1, ], lty = i)
}

legend_labels <- c(expression("Relaxed"["t-1"] %->% "Relaxed"["t"]),
                   expression("Strong"["t-1"] %->% "Satisfied"["t"]),
                   expression("Satisfied"["t-1"] %->% "Guilty"["t"]))
legend("topright", legend = legend_labels, lty = 1:3)

We can see that “Relaxed” has a strong effect on itself at the beginning of the time series, but the effect then decreases toward zero and stays at zero from around estimation point 13. The cross-lagged effect of “Strong” on “Satisfied” at the next time point is zero up to estimation point 9 and then seems to increase monotonically. Finally, the cross-lagged effect of “Satisfied” on “Guilty” is also zero until around estimation point 13 and then decreases monotonically.

Stability of estimation

As with the standard VAR model, bootstrapped sampling distributions can be used to assess the stability of the time-varying parameters.

Is there time variation?

In some cases it may be necessary to decide whether the parameters of a VAR model vary reliably over time. To make this decision we can use a hypothesis test whose null hypothesis is that the model has no time variability. One way to test this hypothesis is as follows. First, we fit a standard VAR model to the data and then repeatedly simulate data from the estimated model. For each simulated time series we compute the pooled prediction error of the time-varying model. The distribution of these prediction errors serves as the sampling distribution of the prediction error under the null hypothesis. We then compute the pooled prediction error of the time-varying VAR model on the empirical data and use it as the test statistic.
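The steps of this test can be sketched in base R for a single variable. The sketch below is a simplification under stated assumptions: it uses a univariate AR(1) model without intercept instead of a full VAR model, a hypothetical helper `tv_rmse()`, a fixed bandwidth of 0.2, simulated “empirical” data, and 200 simulated data sets.

```r
set.seed(3)

# Pooled RMSE of a kernel-smoothed time-varying AR(1) model (no intercept),
# using 20 local models and predicting each point with the closest one
tv_rmse <- function(y, bw = 0.2, n_est = 20) {
  n    <- length(y)
  ycur <- y[-1]; ylag <- y[-n]
  time <- seq(0, 1, length.out = n - 1)
  estpoints <- seq(0, 1, length.out = n_est)
  phi_local <- sapply(estpoints, function(te) {
    w <- dnorm(time, mean = te, sd = bw)
    sum(w * ylag * ycur) / sum(w * ylag^2)   # weighted least squares slope
  })
  closest <- sapply(time, function(s) which.min(abs(estpoints - s)))
  sqrt(mean((ycur - phi_local[closest] * ylag)^2))
}

# "Empirical" data: simulated here with an AR coefficient rising from 0 to 0.8
n <- 300
phi_true <- seq(0, 0.8, length.out = n)
y <- numeric(n)
for (i in 2:n) y[i] <- phi_true[i] * y[i - 1] + rnorm(1)

# Step 1: fit the standard (time-invariant) AR(1) model
phi0 <- sum(y[-1] * y[-n]) / sum(y[-n]^2)
sd0  <- sd(y[-1] - phi0 * y[-n])

# Step 2: simulate from the fitted model and collect the pooled error each time
null_stat <- replicate(200, {
  ysim <- numeric(n)
  for (i in 2:n) ysim[i] <- phi0 * ysim[i - 1] + rnorm(1, sd = sd0)
  tv_rmse(ysim)
})

# Step 3: compare the empirical test statistic with the null distribution
obs_stat <- tv_rmse(y)
p_value  <- mean(null_stat <= obs_stat)
c(observed = obs_stat, p = p_value)
```

In this sketch, an observed pooled error that is small relative to the simulated distribution speaks against the null hypothesis of no time variability.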

Summary

In this post, I showed how to estimate a time-varying VAR model with the kernel smoothing method, which is based on the assumption that all parameters are smooth functions of time. Besides estimating the model, we discussed how to select an appropriate bandwidth parameter, how to compute (time-varying) prediction errors, and how to visualize different aspects of the model. Finally, I outlined how to assess the stability of the estimates with the bootstrap and how to carry out a hypothesis test that can be used to choose between a standard and a time-varying VAR model.
