Pairs trading in Python: a statistical arbitrage strategy for quantitative stock market analysis

Time:2022-5-24

Original link:http://tecdat.cn/?p=24814

There are countless ways to make money in the stock market. It seems that wherever you go in the financial world, people tell you that you should learn Python. After all, Python is a popular programming language used in many fields, including data science. A large number of packages are available to help you reach your goals, and many companies use Python to develop data-centric applications and scientific computing tools for the financial industry.

Most importantly, Python lets us work with many trading strategies that would be difficult to analyze by hand or in a spreadsheet. One of the trading strategies we will discuss is called pairs trading.

Pairs trading

Pairs trading is a form of mean reversion that has the distinct advantage of always being hedged against market movements. The strategy is based on mathematical analysis.

The principle is as follows. Suppose you have a pair of securities X and Y with some underlying economic link. An example might be two companies that produce the same product, or two companies in the same supply chain. If we can model this economic link mathematically, we can trade on it.

To understand pairs trading, we need to understand three mathematical concepts: stationarity, differencing, and cointegration.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, coint

Stationary / nonstationary

Stationarity is the most commonly untested assumption in time series analysis. We generally assume that data is stationary when the parameters of the data-generating process do not change over time. Consider two series, A and B: series A is generated with fixed parameters, while the parameters of series B change over time.

We will create a function that draws a data point from a Gaussian distribution. The probability density of the Gaussian distribution is:

$$p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$$

Here $\mu$ is the mean and $\sigma$ is the standard deviation. The square of the standard deviation, $\sigma^2$, is the variance. A rule of thumb states that about 68% of the data should lie between $\mu - \sigma$ and $\mu + \sigma$, which means that the function normal is more likely to return samples close to the mean than samples far from it.

def generate_datapoint(params):
    mu = params[0]
    sigma = params[1]
    return np.random.normal(mu, sigma)
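As a quick check of the density formula and the one-standard-deviation rule of thumb, here is a minimal sketch. The helper name gaussian_pdf, the seed, and the parameter values are assumptions made for illustration; they do not come from the original code.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Probability density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Illustrative parameters (assumptions, not taken from the article)
mu, sigma = 0.0, 1.0
rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, 100_000)

# Fraction of samples that fall within one standard deviation of the mean
within_one_sigma = np.mean(np.abs(samples - mu) <= sigma)

print("density at the mean:", gaussian_pdf(mu, mu, sigma))
print("fraction within one sigma:", within_one_sigma)
```

With a large sample the printed fraction should land close to 0.68, which is where the rule of thumb comes from.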

From there, we can create two plots showing a stationary and a non-stationary time series.

# Set the parameters and the number of data points
params = (0, 1)
T = 100

A = pd.Series(index=range(T), dtype=float)
for t in range(T):
    A[t] = generate_datapoint(params)

B = pd.Series(index=range(T), dtype=float)
for t in range(T):
    # Now the parameters depend on time
    # Specifically, the mean of the series changes over time (illustrative slope)
    params = (t * 0.1, 1)
    B[t] = generate_datapoint(params)

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
ax1.plot(A)
ax2.plot(B)

Why is stationarity important

Many statistical tests require the data being tested to be stationary. Applying certain statistics to a non-stationary data set can produce garbage results. As an example, let's take the mean of our non-stationary series B.

m = np.mean(B)

plt.figure(figsize=(12, 6))
plt.plot(B)
plt.hlines(m, 0, len(B), linestyles='dashed', colors='r')

The computed mean is the mean of all data points, but it is useless for predicting any future state. It is meaningless when compared with any specific time, because it lumps together different states at different times. This is just a simple and clear example of why non-stationarity distorts analysis; far more subtle problems arise in practice.
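To make the point concrete, the following sketch compares the mean of the first and second half of a stationary and a non-stationary series. The series are regenerated here with NumPy only; the seed and the drift of 0.1 per step are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100

# Stationary series: fixed mean 0 and standard deviation 1
A = rng.normal(0.0, 1.0, T)
# Non-stationary series: the mean drifts upward as 0.1 * t (illustrative choice)
B = rng.normal(0.1 * np.arange(T), 1.0)

def half_means(series):
    """Return the mean of the first half and of the second half of a series."""
    half = len(series) // 2
    return series[:half].mean(), series[half:].mean()

a_first, a_second = half_means(A)
b_first, b_second = half_means(B)

print("A:", a_first, a_second, "overall:", A.mean())
print("B:", b_first, b_second, "overall:", B.mean())
```

For A the two halves agree with the overall mean, so the mean describes the series at any time. For B the halves differ widely, and the overall mean describes neither of them.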

Testing for stationarity: the augmented Dickey-Fuller (ADF) test

To test for stationarity, we need to test for something called a unit root. The autoregressive unit root test considers the model $y_t = \phi y_{t-1} + \varepsilon_t$ and is based on the following hypothesis test:

$$H_0: \phi = 1 \;\Rightarrow\; y_t \sim I(1) \qquad H_1: |\phi| < 1 \;\Rightarrow\; y_t \sim I(0)$$

It is called a unit root test because under the null hypothesis the autoregressive polynomial $\phi(z) = 1 - \phi z$ has a root equal to 1. Under the null hypothesis $y_t$ is non-stationary, but if we take the first difference it becomes:

$$\Delta y_t = y_t - y_{t-1} = \varepsilon_t$$

The test statistic is

$$t_{\phi = 1} = \frac{\hat{\phi} - 1}{SE(\hat{\phi})}$$

where $\hat{\phi}$ is the least squares estimate and $SE(\hat{\phi})$ is the usual standard error estimate. The test is a one-sided, left-tailed test. If $\{y_t\}$ is stationary, it can be shown that $\sqrt{T}(\hat{\phi} - \phi) \xrightarrow{d} N(0, 1 - \phi^2)$, or equivalently $\hat{\phi} \overset{A}{\sim} N\left(\phi, \frac{1}{T}(1 - \phi^2)\right)$, and the t-statistic for the true value of $\phi$ is asymptotically $N(0, 1)$. However, under the null hypothesis of non-stationarity, the above result gives $\hat{\phi} \overset{A}{\sim} N(1, 0)$, which is degenerate, so the usual normal approximation does not apply. The following function allows us to check for stationarity using the augmented Dickey-Fuller (ADF) test.

def stationarity_test(X, cutoff=0.01):
    # H_0 in adfuller (statsmodels.tsa.stattools) is that a unit root exists (non-stationary)
    # We must observe a significant p-value to conclude that the series is stationary
    pvalue = adfuller(X)[1]
    verdict = 'stationary' if pvalue < cutoff else 'non-stationary'
    print('p-value = ' + str(pvalue) + '. The series is likely ' + verdict + '.')

stationarity_test(A)
stationarity_test(B)

As we can see, based on the test statistic for series A (and its corresponding p-value), we can reject the null hypothesis of a unit root, so series A is likely stationary. For series B, on the other hand, we fail to reject the null hypothesis, so this time series is likely non-stationary.
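The test statistic can also be computed by hand for the simplest case. The sketch below is not the full ADF test: it includes a constant but no lagged differences, and the function name dickey_fuller_t is an assumption. It regresses the first difference on the lagged level, where the slope $\lambda = \phi - 1$, so testing $\lambda = 0$ is the same as testing $\phi = 1$.

```python
import numpy as np

def dickey_fuller_t(y):
    """t-statistic of lam in: diff(y)_t = c + lam * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(2)
white_noise = rng.normal(0, 1, 500)             # stationary
random_walk = np.cumsum(rng.normal(0, 1, 500))  # has a unit root

t_noise = dickey_fuller_t(white_noise)
t_walk = dickey_fuller_t(random_walk)
print("white noise:", t_noise)
print("random walk:", t_walk)
```

Because the test is left-tailed, a strongly negative statistic is evidence against a unit root. The statistic must be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level when a constant is included) rather than with normal quantiles; adfuller handles this and also adds lagged differences.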

Cointegration

The correlations between financial quantities are notoriously unstable. Nevertheless, correlations are regularly used in almost all multivariate financial problems. An alternative statistical measure to correlation is cointegration. It is probably a more robust measure of the link between two financial quantities, but so far there is little derivatives theory based on this concept.

Two stocks may be perfectly correlated over short time scales yet diverge in the long run, with one growing and the other falling. Conversely, two stocks may follow each other, never drifting more than a certain distance apart, while their correlation is positive, negative, or varying. If we trade over short horizons, correlation may matter, but it matters much less if we hold the stocks in a portfolio for a long time.

We now construct an example of two cointegrated series and then plot the difference between them.

# Generate daily returns (illustrative parameters)
Xreturns = np.random.normal(0, 1, 100)

# Sum them up and shift all prices up
X = pd.Series(np.cumsum(Xreturns), name='X') + 50
X.plot(figsize=(15, 7))

noise = np.random.normal(0, 1, 100)
Y = X + 6 + noise
Y.name = 'Y'

pd.concat([X, Y], axis=1).plot(figsize=(15, 7))
plt.show()

(Y - X).plot(figsize=(15, 7))  # Plot the spread
plt.axhline((Y - X).mean(), color='red', linestyle='--')  # Add the mean
plt.xlabel('Time')
plt.xlim(0, 99)

Cointegration test

Steps of the cointegration test procedure:

  1. Test each component series for a unit root individually, using univariate unit root tests such as the ADF and PP tests.
  2. If the unit root cannot be rejected, the next step is to test for cointegration among the components, that is, to test whether a linear combination of the series is I(0).

If we find that the time series have a unit root, we move on to the cointegration test. There are three main methods for testing cointegration: Johansen, Engle-Granger, and Phillips-Ouliaris. We will mainly use the Engle-Granger test.

Let's consider the regression model for $y_{1t}$:

$$y_{1t} = \delta D_t + \phi_1 y_{2t} + \dots + \phi_{m-1} y_{mt} + \varepsilon_t$$

where $D_t$ is a deterministic term. The hypothesis test is as follows:

$$H_0: \varepsilon_t \sim I(1) \;(\text{no cointegration}) \qquad H_1: \varepsilon_t \sim I(0) \;(\text{cointegration})$$

Under the alternative, $y_t$ is cointegrated with the normalized cointegration vector

$$\alpha = (1, -\phi_1, \dots, -\phi_{m-1})$$

We then use the residuals $\varepsilon_t$ for a unit root test:

$$H_0: \lambda = 0 \;(\text{unit root}) \qquad H_1: \lambda < 0 \;(\text{stationary})$$

This hypothesis test applies to the model:

$$\Delta \varepsilon_t = \lambda \varepsilon_{t-1} + \sum_{j=1}^{p-1} \psi_j \Delta \varepsilon_{t-j} + a_t$$

The test statistic for this equation is:

$$t_\lambda = \frac{\hat{\lambda}}{s_{\hat{\lambda}}}$$

Now that you understand what it means for two time series to be cointegrated, we can test for it and measure it using Python:

score, pvalue, _ = coint(X, Y)
print(pvalue)

# A low p-value means strong evidence of cointegration!
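The two steps of the Engle-Granger procedure can be sketched by hand with NumPy. This is a simplified illustration, not what coint does internally: there are no lagged differences, the helper names are assumptions, and no p-value is computed. It regresses one series on the other, then computes the unit root t-statistic on the residuals.

```python
import numpy as np

def ols_residuals(y, x):
    """Residuals of the regression y = a + b * x + e."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def df_t_stat(e):
    """t-statistic of lam in: diff(e)_t = lam * e_{t-1} + a_t."""
    de, lag = np.diff(e), e[:-1]
    lam = (lag @ de) / (lag @ lag)
    resid = de - lam * lag
    se = np.sqrt((resid @ resid) / (len(de) - 1) / (lag @ lag))
    return lam / se

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(0, 1, 500)) + 50
y_coint = x + 6 + rng.normal(0, 1, 500)     # shares the random walk in x
y_indep = np.cumsum(rng.normal(0, 1, 500))  # unrelated random walk

t_coint = df_t_stat(ols_residuals(y_coint, x))
t_indep = df_t_stat(ols_residuals(y_indep, x))
print("cointegrated pair:", t_coint)
print("independent walks:", t_indep)
```

The cointegrated pair produces a far more negative statistic than the unrelated pair. Because the residuals come from an estimated regression, the statistic needs its own critical values, which is why a library routine such as coint is preferable for an actual p-value.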


Correlation vs. cointegration

Correlation and cointegration are similar in theory, but they are not the same thing. To demonstrate this, we can look at examples of time series that are correlated but not cointegrated.

A simple example is two series that diverge.

# Two random walks with different drifts (illustrative parameters), so they diverge
X_returns = np.random.normal(1, 1, 100)
Y_returns = np.random.normal(2, 1, 100)

X_diverging = pd.Series(np.cumsum(X_returns), name='X')
Y_diverging = pd.Series(np.cumsum(Y_returns), name='Y')

pd.concat([X_diverging, Y_diverging], axis=1).plot(figsize=(12, 6))
plt.xlim(0, 99)

Next, we can print the correlation coefficient and the result of the cointegration test.

As we can see, there is a very strong correlation between series X and Y. However, our cointegration test yields a p-value of 0.7092, which means that we find no evidence of cointegration between X and Y.
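The same effect can be reproduced with NumPy alone. In this sketch the drifts, the seed, and the variable names are assumptions. It measures the correlation of two diverging random walks and compares the average gap between them early and late in the sample.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two random walks with different positive drifts (illustrative parameters)
x_div = np.cumsum(rng.normal(1, 1, 100))
y_div = np.cumsum(rng.normal(2, 1, 100))

correlation = np.corrcoef(x_div, y_div)[0, 1]

# If the series were cointegrated, the gap would hover around a fixed level
spread = y_div - x_div
early_gap = spread[:25].mean()
late_gap = spread[-25:].mean()

print("correlation:", correlation)
print("average gap, first 25 points:", early_gap)
print("average gap, last 25 points:", late_gap)
```

Both series trend upward, so the correlation is close to 1, yet the gap keeps widening. A spread that never settles around a mean is exactly what a pairs trade cannot rely on.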

Another example is a normally distributed series and a square wave.

Y2 = pd.Series(np.random.normal(0, 1, 1000), name='Y2') + 20
Y3 = Y2.copy()

# Overwrite Y3 with a square wave that alternates every 100 points (illustrative levels)
for start in range(0, 1000, 200):
    Y3.iloc[start:start + 100] = 30
    Y3.iloc[start + 100:start + 200] = 10

plt.figure(figsize=(12, 6))
Y2.plot()
Y3.plot()
plt.ylim([0, 40])

# The correlation is almost zero
print('Correlation: ' + str(Y2.corr(Y3)))
score, pvalue, _ = coint(Y2, Y3)
print('Cointegration test p-value: ' + str(pvalue))

Although the correlation is very low, the p value indicates that these time series are cointegrated.

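from pandas_datareader import data as pdr
import fix_yahoo_finance as yf  # this package has since been renamed yfinance
yf.pdr_override()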

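Data science in trading

Before we begin, I will first define a function that makes it easy to find cointegrated pairs using the concepts we have covered.

def find_cointegrated_pairs(data):
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Test each unordered pair of columns once
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < 0.05:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs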

We look at a group of technology companies to see whether any of them are cointegrated. We first define the list of securities we want to examine, and then fetch the pricing data for each security from 2013 to 2018.

We want to test whether a relationship exists among securities within one sector. This incurs much less multiple-comparisons bias than searching through hundreds of securities, and slightly more than forming a hypothesis for a single test.
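A rough calculation shows why the size of the search matters. The sketch below (the function name and the universe sizes are assumptions) counts the pairs in a universe of n securities and the number of pairs expected to pass a 5% test purely by chance if no true relationship exists.

```python
from math import comb

def expected_false_positives(n_securities, alpha=0.05):
    """Return the number of pairs and the expected number of spurious hits at level alpha."""
    n_pairs = comb(n_securities, 2)
    return n_pairs, n_pairs * alpha

for n in (10, 100, 500):
    n_pairs, false_hits = expected_false_positives(n)
    print(n, "securities:", n_pairs, "pairs,", false_hits, "expected false positives")
```

Ten securities give 45 pairs and about 2 spurious hits, while 100 securities give 4950 pairs and about 250. Restricting the search to one sector keeps that number manageable.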

import datetime

# The article gives only the years (2013 to 2018); the exact dates below are an assumption
start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2018, 1, 1)

# Only the tickers named in the pairs discussed in this article; the full list is not shown
tickers = ['AAPL', 'ADBE', 'EBAY', 'MSFT']

df = pdr.get_data_yahoo(tickers, start, end)['Close']
df.tail()

# The heat map shows the p-value of the cointegration test between each pair of stocks.
# Only the values above the diagonal are displayed; cells with p-values close to 1 are masked.
import seaborn
scores, pvalues, pairs = find_cointegrated_pairs(df)
seaborn.heatmap(pvalues, xticklabels=df.columns, yticklabels=df.columns, cmap='RdYlGn_r', mask=(pvalues >= 0.98))
print(pairs)

Our algorithm lists two cointegrated pairs: AAPL/EBAY and ADBE/MSFT. We can analyze their patterns.

S1 = df['ADBE']
S2 = df['MSFT']
score, pvalue, _ = coint(S1, S2)
pvalue

As we can see, the p-value is less than 0.05, which means that ADBE and MSFT are indeed a cointegrated pair.

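Calculating the spread

Now we can plot the spread between the two time series. To actually calculate the spread, we use a linear regression to obtain the coefficient of the linear combination between our two securities, as mentioned earlier in the Engle-Granger method.

import statsmodels.api as sm

S1 = sm.add_constant(S1)
results = sm.OLS(S2, S1).fit()
S1 = S1['ADBE']
b = results.params['ADBE']

spread = S2 - b * S1
spread.plot(figsize=(12, 6))
plt.axhline(spread.mean(), color='black')
plt.xlim('2013-01-01', '2018-01-01')
plt.legend(['Spread'])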
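The hedge ratio step can be checked on synthetic data where the true coefficient is known. This is a minimal sketch using np.polyfit instead of statsmodels; the price series, the true slope of 2, and the seed are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic prices (assumption): s2 is roughly 2 * s1 + 5 plus noise
s1 = 100 + np.cumsum(rng.normal(0, 1, 500))
s2 = 2.0 * s1 + 5.0 + rng.normal(0, 1, 500)

# np.polyfit with degree 1 returns [slope, intercept] of the least squares line
b, intercept = np.polyfit(s1, s2, 1)

# The spread removes the part of s2 that is explained by s1
spread = s2 - b * s1

print("estimated hedge ratio:", b)
print("spread mean:", spread.mean(), "spread std:", spread.std())
```

The estimated slope recovers the true ratio closely, and the resulting spread fluctuates around a constant level (the regression intercept) with roughly the standard deviation of the noise.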

Alternatively, we can examine the ratio between the two time series:

ratio = S1 / S2
ratio.plot(figsize=(12, 6))
plt.axhline(ratio.mean(), color='black')
plt.xlim('2013-01-01', '2018-01-01')
plt.legend(['Price Ratio'])

Whether we use the spread or the ratio, we can see that our first plot, for the ADBE/MSFT pair, tends to move around the mean. We now need to standardize this ratio, because the absolute ratio may not be the best way to analyze the trend. To do this, we use z-scores.

The z-score is the number of standard deviations a data point lies from the mean. More precisely, it is the number of standard deviations by which the raw score is above or below the population mean. The z-score is calculated as follows:

$$z_i = \frac{x_i - \bar{x}}{s}$$

def zscore(series):
    return (series - series.mean()) / np.std(series)

zscore(ratio).plot(figsize=(12, 6))
plt.axhline(zscore(ratio).mean(), color='black')
plt.axhline(1.0, color='red')
plt.axhline(-1.0, color='green')
plt.xlim('2013-01-01', '2018-01-01')
plt.show()

By adding two more lines at z-scores of 1 and -1, we can clearly see that in most cases any large deviation from the mean eventually converges back. This is exactly what we want for a pairs trading strategy.

Trading signals

When running any type of trading strategy, it is important to clearly define and describe the point in time at which a trade is actually made. For example, what is the best indicator that I should buy or sell a particular stock?

Setting up rules

We will use the ratio time series we created to see whether it tells us to buy or sell at a given time. We start by creating a prediction variable $Y$. If the ratio is about to increase ($Y$ is positive), it signals a buy; otherwise it signals a sell. The prediction model is as follows:

$$Y_t = \operatorname{sign}(Ratio_{t+1} - Ratio_t)$$

The advantage of pairs trading signals is that we do not need to know the absolute level the prices will reach; we only need to know the direction of the ratio: up or down.
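A direction label of this kind can be computed in one line. The sketch below assumes the label is the sign of the next-step change in the ratio; the function name and the sample values are made up.

```python
import numpy as np

def direction_labels(ratios):
    """Return sign(ratio[t+1] - ratio[t]) for every t except the last observation."""
    ratios = np.asarray(ratios, dtype=float)
    return np.sign(np.diff(ratios)).astype(int)

example_ratios = [1.00, 1.20, 1.10, 1.10, 1.30]  # made-up values
labels = direction_labels(example_ratios)
print(labels)
```

A label of 1 means the ratio rises at the next step, -1 means it falls, and 0 means it is unchanged. The last observation has no label because its future value is unknown.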

Train/test split

When training and testing a model, it is common to use a 70/30 or 80/20 split. We only used a time series of 252 points (the number of trading days in a year). Before training and splitting the data, we will add more data points to each time series.

ratios = df['ADBE'] / df['MSFT']
print(len(ratios) * .70)

train = ratios[:881]
test = ratios[881:]

Feature engineering

We need to find out which features actually matter in determining the direction in which the ratio moves. Knowing that the ratio eventually reverts to the mean, moving averages and other mean-related indicators are likely to be important.

Let's try:

  • 60-day moving average
  • 5-day moving average
  • 60-day standard deviation
  • Z-score

ratios_mavg5 = train.rolling(window=5, center=False).mean()
ratios_mavg60 = train.rolling(window=60, center=False).mean()
std_60 = train.rolling(window=60, center=False).std()
zscore_60_5 = (ratios_mavg5 - ratios_mavg60) / std_60
plt.figure(figsize=(12, 6))
plt.plot(train.index, train.values)
plt.plot(ratios_mavg5.index, ratios_mavg5.values)
plt.plot(ratios_mavg60.index, ratios_mavg60.values)
plt.legend(['Ratio', '5d Ratio MA', '60d Ratio MA'])
plt.ylabel('Ratio')
plt.show()

plt.figure(figsize=(12, 6))
zscore_60_5.plot()
plt.xlim(train.index[0], train.index[-1])
plt.axhline(0, color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Rolling Ratio z-score', 'Mean', '+1', '-1'])
plt.show()

Creating a model

A standard normal distribution has a mean of 0 and a standard deviation of 1. Looking at the plot, it is clear that when the time series moves more than 1 standard deviation away from the mean, it tends to revert to the mean. Using this observation, we can create the following trading signals:

  • Whenever the z-score is below -1, buy (1), which means we expect the ratio to increase.
  • Whenever the z-score is above 1, sell (-1), which means we expect the ratio to decrease.
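The two rules above can be written as a small function. This is a sketch; the function name, the threshold parameter, and the sample z-scores are assumptions. Values inside the band, values exactly on the threshold, and missing values produce no signal.

```python
import numpy as np

def zscore_signals(zscores, threshold=1.0):
    """Map z-scores to signals: 1 = buy the ratio, -1 = sell the ratio, 0 = no action."""
    z = np.asarray(zscores, dtype=float)
    signals = np.zeros(len(z), dtype=int)
    signals[z < -threshold] = 1   # ratio is unusually low, expect it to rise
    signals[z > threshold] = -1   # ratio is unusually high, expect it to fall
    return signals

example_z = [-1.5, -0.5, 0.2, 1.3, 1.0, np.nan]  # made-up values
signals = zscore_signals(example_z)
print(signals)
```

The NaN entry stands in for the start of a rolling window, where the z-score is not yet defined.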

Training and optimizing

We can now apply our model to actual data:

plt.figure(figsize=(12, 6))
train[160:].plot()
buy = train.copy()
sell = train.copy()
buy[zscore_60_5 > -1] = 0
sell[zscore_60_5 < 1] = 0
buy[160:].plot(color='g', linestyle='None', marker='^')
sell[160:].plot(color='r', linestyle='None', marker='^')

plt.figure(figsize=(12, 7))
S1 = df['ADBE'].iloc[:881]
S2 = df['MSFT'].iloc[:881]

S1[60:].plot(color='b')
S2[60:].plot(color='c')
buyR = 0 * S1.copy()
sellR = 0 * S1.copy()

# When you buy the ratio, you buy stock S1 and sell S2
buyR[buy != 0] = S1[buy != 0]
sellR[buy != 0] = S2[buy != 0]

# When you sell the ratio, you sell stock S1 and buy S2
buyR[sell != 0] = S2[sell != 0]
sellR[sell != 0] = S1[sell != 0]

buyR[60:].plot(color='g', linestyle='None', marker='^')
sellR[60:].plot(color='r', linestyle='None', marker='^')

Now we can clearly see when we should buy or sell the corresponding stocks.

Now, how much can we expect from this strategy?

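# Trade using a simple strategy
def trade(S1, S2, window1, window2):
    # If a window length is 0, the algorithm is meaningless, so exit
    if (window1 == 0) or (window2 == 0):
        return 0

    # Calculate the rolling means and the rolling standard deviation
    ratios = S1 / S2
    ma1 = ratios.rolling(window=window1, center=False).mean()
    ma2 = ratios.rolling(window=window2, center=False).mean()
    std = ratios.rolling(window=window2, center=False).std()
    zscore = (ma1 - ma2) / std

    # Simulate trading, starting with no money and no positions
    money = 0
    countS1 = 0
    countS2 = 0
    for i in range(len(ratios)):
        # If the z-score is > 1, sell the ratio short
        if zscore.iloc[i] > 1:
            money += S1.iloc[i] - S2.iloc[i] * ratios.iloc[i]
            countS1 -= 1
            countS2 += ratios.iloc[i]
        # If the z-score is < -1, buy the ratio long
        elif zscore.iloc[i] < -1:
            money -= S1.iloc[i] - S2.iloc[i] * ratios.iloc[i]
            countS1 += 1
            countS2 -= ratios.iloc[i]
        # Clear positions if the z-score is between -0.75 and 0.75
        elif abs(zscore.iloc[i]) < 0.75:
            money += S1.iloc[i] * countS1 + S2.iloc[i] * countS2
            countS1 = 0
            countS2 = 0
    return money

# Window lengths follow the 5-day and 60-day features used above (the original call is not shown)
trade(df['ADBE'].iloc[881:], df['MSFT'].iloc[881:], 5, 60)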

That is a decent profit for a strategy built from such simple rules.

Areas for improvement and further steps

This is by no means a perfect strategy, and our implementation is not the best. There are several things that can be improved.

1. Use more securities and more varied time ranges

For the cointegration test in this pairs trading strategy, I only used a handful of stocks. Naturally (and in practice), it is more effective to use clusters within an industry. I also used a time frame of only five years, which may not be representative of stock market volatility.

2. Dealing with overfitting

Anything related to data analysis and model training is closely tied to the problem of overfitting. There are many ways to deal with overfitting, such as validation, Kalman filters, and other statistical methods.

3. Adjusting the trading signals

Our trading algorithm does not take into account stock prices that overlap and cross each other. Since the code only signals a buy or a sell based on the ratio, it does not consider which stock is actually higher or lower.

4. More advanced methods

This is just the tip of the iceberg of what can be done with algorithmic pairs trading. It is simple because it only deals with moving averages and ratios. If you want to use more complex statistics, feel free to do so. Other more advanced topics include the Hurst exponent, the half-life of mean reversion, and Kalman filters.
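As one example of these extensions, the half-life of mean reversion can be estimated by regressing the change in the spread on its lagged level. The sketch below uses a simulated AR(1) spread with coefficient 0.9 (an assumption) and the common approximation half-life = -ln(2) / lam, where lam is the regression slope. It illustrates the idea and is not a tested trading tool.

```python
import numpy as np

def half_life(spread):
    """Estimate the half-life of mean reversion from: diff(s)_t = c + lam * s_{t-1} + e_t."""
    s = np.asarray(spread, dtype=float)
    lam, _ = np.polyfit(s[:-1], np.diff(s), 1)
    return -np.log(2) / lam

rng = np.random.default_rng(6)
phi = 0.9  # illustrative AR(1) coefficient, so lam is about phi - 1 = -0.1

# Simulate a mean-reverting spread: s_t = phi * s_{t-1} + noise
spread = np.zeros(5000)
for t in range(1, len(spread)):
    spread[t] = phi * spread[t - 1] + rng.normal()

hl = half_life(spread)
print("estimated half-life in periods:", hl)
```

A short half-life means deviations from the mean close quickly, which suggests shorter rolling windows and holding periods. A very long half-life suggests the pair may not be worth trading.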

