Source: vitu.ai

## （1） What is pairs trading?

Pairs trading is a statistical arbitrage strategy grounded in mathematical analysis. Its profit comes from the spread between two underlying assets: although their prices may drift apart along the way, they eventually tend to converge again.

Pairs trading exploits this temporary price deviation. Two assets with this relationship are said to be cointegrated, meaning that the difference between them oscillates around a certain mean value. This mean reversion is the basis of the strategy's profitability.

Generally speaking, if two stocks or variables are strongly cointegrated, they end up in the same place no matter what path each one takes.

## （2） What is cointegration?

Classical regression models assume stationary variables. Applying them to non-stationary variables causes problems such as spurious regression, yet most time series encountered in practice are non-stationary.

The cointegration theory and method proposed by Engle and Granger in 1987 provide another way to model non-stationary series.

Although some economic variables are non-stationary, a linear combination of them may be stationary. This stationary linear combination is called a cointegration equation, and can be interpreted as a long-term stable equilibrium relationship between the variables.
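For the two-variable case, the idea can be written down as a short sketch. The symbols $x_t$, $y_t$, $\alpha$, $\beta$ and $\varepsilon_t$ are notation introduced here for illustration and do not come from the source:

$$
y_t = \alpha + \beta x_t + \varepsilon_t, \qquad x_t,\ y_t \ \text{non-stationary}, \qquad \varepsilon_t \ \text{stationary}
$$

Under this reading, $\varepsilon_t = y_t - \alpha - \beta x_t$ is the linear combination that stays near its mean, and it plays the role of the spread in a pairs trade.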

Note that although cointegration and correlation look similar, they are two different things.

For example, the two lines y = x and y = 2x have a correlation of 1, but their difference keeps growing instead of returning to a mean, so their cointegration as a trading pair is poor.

Conversely, the correlation between a square wave signal and a white noise signal is very weak, but they are strongly cointegrated. As shown in the figure below, their means are almost the same.
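The two-lines example can be checked numerically. The following is a minimal sketch in plain Python; the helper `pearson` and the 100-point grid are illustrative choices, not part of the source:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

x = range(1, 101)
line1 = [float(v) for v in x]   # y = x
line2 = [2.0 * v for v in x]    # y = 2x

# The simple spread between the lines equals x itself, so it never reverts.
spread = [b - a for a, b in zip(line1, line2)]

corr = pearson(line1, line2)
print(round(corr, 6))           # correlation of the two lines
print(spread[0], spread[-1])    # first and last value of the spread
```

The correlation is perfect, yet the gap between the two lines only widens, which is why correlation alone says little about whether a spread can be traded.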

## （3） What is stationarity?

Here we also introduce a basic concept of time series analysis: stationarity.

There are two types of stationarity:

- Strong (strict) stationarity: a stochastic process $\{X(t), t \in T\}$ with finite-dimensional distributions $F(x_1, x_2, \dots, x_n; t_1, t_2, \dots, t_n)$, $t_1, t_2, \dots, t_n \in T$, is strictly stationary if, for any $t_1, t_2, \dots, t_n \in T$ and any $h$ such that $t_1 + h, t_2 + h, \dots, t_n + h \in T$, we always have $F(x_1, x_2, \dots, x_n; t_1, t_2, \dots, t_n) = F(x_1, x_2, \dots, x_n; t_1 + h, t_2 + h, \dots, t_n + h)$;
- Wide-sense (weak) stationarity: a second-order moment process (one whose second moments exist) $\{X(t), t \in T\}$ is weakly stationary if its mean function $\mu(t)$ is constant and its correlation function satisfies $R(t_1, t_2) = f(t_2 - t_1)$, that is, the correlation function depends only on the time interval. (Weak stationarity is the one usually used.)

Order of integration: when the original series is non-stationary, it must be differenced (each term minus the previous term) until it becomes stationary. The number of differences required is the order of integration.
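Differencing can be illustrated with a toy series. This is a sketch using only the standard library; the function name `difference` and the synthetic random walk are assumptions made for the example:

```python
import random

def difference(series):
    """First difference: each term minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]

random.seed(0)
steps = [random.choice([-1, 1]) for _ in range(1000)]  # stationary shocks

# A random walk is the running sum of its shocks, so one difference
# is enough to make it stationary again (order of integration 1).
walk, level = [], 0
for s in steps:
    level += s
    walk.append(level)

diffed = difference(walk)
print(diffed == steps[1:])  # differencing once recovers the shocks
```

A series that needs two rounds of `difference` before it looks stationary would be integrated of order two, and so on.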

A cointegration relationship can exist only when the time series {x} and {y} of the two variables are integrated of the same order, i.e. both are I(d).

Therefore, before testing the variables Y and X for cointegration, the stationarity of the {x} and {y} time series is checked first. The common methods for checking stationarity are the graphical method and the unit root test; the ADF unit root test is used here.

## （4） Is there cointegration between BTC and ETH?

As an example, to check whether two symbols are cointegrated, one year of time series data is taken for BTC/USDT and ETH/USDT.

[Figure: daily closing price of BTC. Source: vitu.ai]

[Figure: daily closing price of ETH. Source: vitu.ai]

First determine the order of integration of each series: check its stationarity and difference it until the series becomes stationary.

The ADF unit root test is used here to test stationarity. Its null hypothesis is that the series has a unit root, that is, that it is non-stationary. For a series to be considered stationary, the test must be significant at the chosen significance level, so that the null hypothesis is rejected.


After first-order differencing, both series are stationary. Each is integrated of order one, so the two series are integrated of the same order and the cointegration test can be run. The null hypothesis here is that there is no cointegration relationship between them.

The p-value is below the significance level, so the null hypothesis is rejected: there is a cointegration relationship between the two.


## （5） Can BTC and ETH be pairs traded?

Next, we can build a pairs trade on the cointegration relationship between the two. First plot the price spread series of the two:

[Figure: price spread series of BTC and ETH. Source: vitu.ai]

From this, the simplest pairs trading strategy is as follows (assuming futures can be traded):

When the spread price is greater than 9920.7027, short the spread, i.e. short BTC/USDT and buy ETH/USDT.

When the spread price is less than 4481.4179, buy the spread, i.e. buy BTC/USDT and short ETH/USDT.

Spread price close to zero
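The two threshold rules can be expressed as a small function. This is only a sketch: the thresholds are the ones quoted in the text, while the function name `pair_signal`, the signal labels, and the choice to emit no signal between the thresholds are assumptions. The near-zero case is left out because the text does not say what action it triggers.

```python
UPPER = 9920.7027  # from the text: short the spread above this level
LOWER = 4481.4179  # from the text: buy the spread below this level

def pair_signal(spread):
    """Map one spread value to a trading signal for the BTC/ETH pair."""
    if spread > UPPER:
        return "short_spread"  # short BTC/USDT, buy ETH/USDT
    if spread < LOWER:
        return "long_spread"   # buy BTC/USDT, short ETH/USDT
    return "no_signal"         # assumption: do nothing in between

for value in (10500.0, 7000.0, 4000.0):
    print(value, pair_signal(value))
```

In a backtest, a function like this would be evaluated on each new spread value and its output translated into orders on the two legs.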

The process above focuses on explaining the theory for readers to study; the results themselves are not too important. (The strategy below comes with its source code.) Futures backtesting is about to start, and perhaps this kind of statistical arbitrage can also be used in the futures market.

The road of quantitative trading is long, and I will keep searching high and low.

## （6） Backtest results

[Figure: backtest results. Source: vitu.ai]

Original article: Pairs trading based on cointegration of time series