Time: 2021-4-8

# Link to the original text: http://tecdat.cn/?p=20360

This article illustrates the implementation and use of factor models in R, in the context of portfolio optimization in financial mathematics.

# Macroeconomic factor model with single market factor

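We’ll start with a simple example with a single known factor, the market index. The model is

$$x_t = \alpha + \beta f_t + \epsilon_t, \quad t = 1, \dots, T,$$

where $x_t$ collects the log-returns of the $N$ stocks on day $t$ and the explicit factor $f_t$ is the S&P 500 index. We will use a simple least squares (LS) regression to estimate the intercept $\alpha$ and the loading $\beta$:

$$\underset{\alpha,\, \beta}{\text{minimize}} \quad \sum_{t=1}^{T} \left\| x_t - \alpha - \beta f_t \right\|^2$$

Most of the code lines go into preparing the data rather than into the factor modeling itself. Let’s start by preparing the data: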

```r
library(xts)
library(quantmod)

# Set start and end dates and the list of stock names
begin_date <- "2016-01-01"
end_date <- "2017-12-31"
# Tickers and sectors as they appear in the outputs of this section
stock_namelist <- c("AAPL", "AMD", "ADI", "ABBV", "AEZS", "A", "APD", "AA", "CF")
sector_namelist <- c(rep("Information Technology", 3), rep("Health Care", 3), rep("Materials", 3))

# Download data from Yahoo Finance
data_set <- xts()
for (stock_index in 1:length(stock_namelist))
  data_set <- cbind(data_set, Ad(getSymbols(stock_namelist[stock_index],
                                            from = begin_date, to = end_date,
                                            auto.assign = FALSE)))
colnames(data_set) <- stock_namelist
head(data_set)
#>                AAPL  AMD      ADI     ABBV AEZS        A       APD       AA       CF
#> 2016-01-04 98.74225 2.77 49.99239 49.46063 4.40 39.35598 107.89010 23.00764 35.13227
#> 2016-01-05 96.26781 2.75 49.62508 49.25457 4.21 39.22057 105.96097 21.96506 34.03059
#> 2016-01-06 94.38389 2.51 47.51298 49.26315 3.64 39.39467 103.38042 20.40121 31.08988
#> 2016-01-07 90.40047 2.28 46.30082 49.11721 3.29 37.72138  99.91463 19.59558 29.61520
#> 2016-01-08 90.87848 2.14 45.89677 47.77789 3.29 37.32482  99.39687 19.12169 29.33761
#> 2016-01-11 92.35001 2.34 46.98954 46.25827 3.13 36.69613  99.78938 18.95583 28.14919

# S&P 500 index (the "^GSPC" ticker is an assumption; the download call is not shown in the source)
SP500_index <- Ad(getSymbols("^GSPC", from = begin_date, to = end_date, auto.assign = FALSE))
colnames(SP500_index) <- "index"
head(SP500_index)
#>              index
#> 2016-01-04 2012.66
#> 2016-01-05 2016.71
#> 2016-01-06 1990.26
#> 2016-01-07 1943.09
#> 2016-01-08 1922.03
#> 2016-01-11 1923.67
plot(SP500_index)
```

```r
# Compute the log-returns of the stocks and of the S&P 500 index (the explicit factor)
X <- diff(log(data_set), na.pad = FALSE)
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days
f <- diff(log(SP500_index), na.pad = FALSE)
```

Now we are ready to fit the factor model. The LS fit is easy to implement in R, as follows:

```r
beta <- cov(X,f)/as.numeric(var(f))
alpha <- colMeans(X) - beta*colMeans(f)
sigma2 <- rep(NA, N)
# Residual variance of each asset (loop reconstructed; the source is truncated here)
for (i in 1:N) {
  eps_i <- X[, i] - alpha[i] - beta[i]*f
  sigma2[i] <- sum(eps_i^2)/(T - 2)
}
Psi <- diag(sigma2)
Sigma <- as.numeric(var(f)) * beta %*% t(beta) + Psi

print(alpha)
#>              index
#> AAPL  0.0003999086
#> AMD   0.0013825599
#> ADI   0.0003609968
#> ABBV  0.0006684632
#> AEZS -0.0022091301
#> A     0.0002810616
#> APD   0.0001786375
#> AA    0.0006429140
#> CF   -0.0006029705
print(beta)
#>          index
#> AAPL 1.0957919
#> AMD  2.1738304
#> ADI  1.2683047
#> ABBV 0.9022748
#> AEZS 1.7115761
#> A    1.3277212
#> APD  1.0239453
#> AA   1.8593524
#> CF   1.5702493
```
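As a sanity check of the closed-form estimates, the following minimal sketch fits the same single-factor model on simulated data and compares the result with R's built-in `lm()`. Everything here is illustrative: the simulated factor `f_sim`, the returns `X_sim`, and the "true" `alpha_true` and `beta_true` are assumptions made for the example, not data from the article.

```r
# Minimal sketch on synthetic data (all names and numbers are illustrative)
set.seed(42)
T_obs <- 500                                     # number of observations
f_sim <- rnorm(T_obs, mean = 0.0005, sd = 0.01)  # simulated market factor
beta_true  <- c(A = 0.8, B = 1.2, C = 2.0)       # assumed true loadings
alpha_true <- c(A = 0.0002, B = 0, C = -0.0003)  # assumed true intercepts

# x_t = alpha + beta * f_t + noise, one column per asset
X_sim <- sapply(names(beta_true), function(a)
  alpha_true[a] + beta_true[a] * f_sim + rnorm(T_obs, sd = 0.005))

# Closed-form LS estimates: beta = cov(x, f) / var(f), alpha = mean(x) - beta * mean(f)
beta_hat  <- as.vector(cov(X_sim, f_sim) / var(f_sim))
alpha_hat <- colMeans(X_sim) - beta_hat * mean(f_sim)

# Cross-check against lm(), which regresses every column of X_sim on f_sim
coef_lm <- coef(lm(X_sim ~ f_sim))

print(cbind(alpha = alpha_hat, beta = beta_hat))
```

The two routes agree to numerical precision because the slope of a simple linear regression is exactly the ratio of the sample covariance to the sample variance.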

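Alternatively, we can do the fitting with a matrix formulation. We define $\Gamma = [\alpha, \beta]$ and the extended factors $\tilde{F} = [\mathbf{1}, f]$, and then solve

$$\underset{\Gamma}{\text{minimize}} \quad \left\| X^T - \Gamma \tilde{F}^T \right\|_F^2,$$

whose solution is $\Gamma = X^T \tilde{F} (\tilde{F}^T \tilde{F})^{-1}$:

```r
F_ <- cbind(ones = 1, f)  # extended factors [1, f]
Gamma <- t(X) %*% F_ %*% solve(t(F_) %*% F_)
colnames(Gamma) <- c("alpha", "beta")
print(Gamma)
#>              alpha      beta
#> AAPL  0.0003999086 1.0957919
#> AMD   0.0013825599 2.1738304
#> ADI   0.0003609968 1.2683047
#> ABBV  0.0006684632 0.9022748
#> AEZS -0.0022091301 1.7115761
#> A     0.0002810616 1.3277212
#> APD   0.0001786375 1.0239453
#> AA    0.0006429140 1.8593524
#> CF   -0.0006029705 1.5702493
E <- xts(t(t(X) - Gamma %*% t(F_)), index(X))  # residuals
```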

Alternatively, we can simply let R do the work for us:

```r
# factor_model: fitted single-factor model object (its construction is not shown in the source)
cbind(alpha = factor_model$alpha, beta = factor_model$beta)
#>              alpha     index
#> AAPL  0.0003999086 1.0957919
#> AMD   0.0013825599 2.1738304
#> ADI   0.0003609968 1.2683047
#> ABBV  0.0006684632 0.9022748
#> AEZS -0.0022091301 1.7115761
#> A     0.0002810616 1.3277212
#> APD   0.0001786375 1.0239453
#> AA    0.0006429140 1.8593524
#> CF   -0.0006029705 1.5702493
```
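The matrix-form fit can also be used to assemble the covariance matrix implied by the single-factor model, $\Sigma = \operatorname{var}(f_t)\,\beta\beta^T + \Psi$ with $\Psi$ diagonal. The sketch below does this on simulated data; the names `X_sim`, `f_sim`, `F_tilde`, `Gamma_hat`, and the simulated loadings are assumptions for illustration only.

```r
# Minimal sketch on synthetic data (names and numbers are illustrative)
set.seed(1)
T_obs <- 400
N_assets <- 4
f_sim <- rnorm(T_obs, sd = 0.01)                      # simulated market factor
beta_true <- c(0.7, 1.0, 1.3, 1.8)                    # assumed true loadings
X_sim <- outer(f_sim, beta_true) +
  matrix(rnorm(T_obs * N_assets, sd = 0.004), T_obs, N_assets)
colnames(X_sim) <- paste0("asset", 1:N_assets)

# Extended factors [1, f] and LS solution Gamma = X' F (F' F)^{-1}
F_tilde <- cbind(ones = 1, f = f_sim)
Gamma_hat <- t(solve(crossprod(F_tilde), crossprod(F_tilde, X_sim)))
colnames(Gamma_hat) <- c("alpha", "beta")

# Residuals, diagonal residual covariance, and model-implied covariance
E_hat <- X_sim - F_tilde %*% t(Gamma_hat)
Psi_hat <- diag(colSums(E_hat^2) / (T_obs - 2))
Sigma_hat <- var(f_sim) * tcrossprod(Gamma_hat[, "beta"]) + Psi_hat

print(round(Gamma_hat, 4))
```

Because the LS residuals are orthogonal to the factor, the diagonal of `Sigma_hat` differs from the sample variances only through the `T_obs - 2` versus `T_obs - 1` normalization of the residual term.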

## Visualizing the covariance matrices

It is interesting to visualize the estimated covariance matrix of the log-returns, $\Sigma = \operatorname{var}(f_t)\,\beta\beta^T + \Psi$, as well as that of the residuals, $\Psi$. Let’s start with the covariance matrix of the log-returns:

[Figure: covariance matrix of the log-returns from the single-factor model]

We can observe that all the stocks are highly correlated, which is the effect of the market factor. To examine the correlations among the stocks themselves, we plot the correlation matrix of the residuals:

```r
# The source shows only a truncated plot(...) call. Base R's heatmap(), which
# reorders rows and columns by hierarchical clustering, is used here as an assumption.
heatmap(cov2cor(Psi), symm = TRUE,
        main = "Residual covariance matrix")
```

```r
cbind(stock_namelist, sector_namelist)  # sector of each stock
#>       stock_namelist sector_namelist
#>  [1,] "AAPL"         "Information Technology"
#>  [2,] "AMD"          "Information Technology"
#>  [3,] "ADI"          "Information Technology"
#>  [4,] "ABBV"         "Health Care"
#>  [5,] "AEZS"         "Health Care"
#>  [6,] "A"            "Health Care"
#>  [7,] "APD"          "Materials"
#>  [8,] "AA"           "Materials"
#>  [9,] "CF"           "Materials"
```

Interestingly, the automatic clustering performed on $\Psi$ correctly identifies the sector of each stock.
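To illustrate why clustering the residual correlations can recover sectors, here is a minimal sketch on simulated residuals. It assumes that, once the market is removed, stocks in the same sector still share a common shock. The sector labels, the simulated matrix `E_sim`, and the use of `hclust` with average linkage are illustrative choices, not the article's exact procedure.

```r
# Minimal sketch on synthetic residuals (all names and numbers are illustrative)
set.seed(7)
T_obs <- 500
sectors <- rep(c("Tech", "Health", "Materials"), each = 3)

# One common shock per sector plus idiosyncratic noise
sector_shock <- matrix(rnorm(T_obs * 3), T_obs, 3,
                       dimnames = list(NULL, c("Tech", "Health", "Materials")))
E_sim <- sector_shock[, sectors] + matrix(rnorm(T_obs * 9, sd = 0.5), T_obs, 9)
colnames(E_sim) <- paste0(sectors, 1:3)

# Hierarchical clustering on the distance 1 - correlation, cut into 3 groups
R_resid <- cor(E_sim)
hc <- hclust(as.dist(1 - R_resid), method = "average")
clusters <- cutree(hc, k = 3)

print(split(names(clusters), clusters))
```

With within-sector correlations far above the cross-sector ones, each cluster contains exactly the three stocks of one sector.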

# Evaluating investment funds

In this example, we will evaluate the performance of several investment funds based on the factor model. We use the S&P 500 index as the explicit market factor and assume a zero risk-free return, $r_f = 0$. In particular, we consider six exchange-traded funds (ETFs): SPY, XIVH, SPHB, SPLV, USMV, and JKD.

First, we load the data:

```r
# Set start and end dates and the list of ETF names
begin_date <- "2016-10-01"
end_date <- "2017-06-30"
stock_namelist <- c("SPY", "XIVH", "SPHB", "SPLV", "USMV", "JKD")  # as in the output below

# Download data from Yahoo Finance
data_set <- xts()
for (stock_index in 1:length(stock_namelist))
  data_set <- cbind(data_set, Ad(getSymbols(stock_namelist[stock_index],
                                            from = begin_date, to = end_date,
                                            auto.assign = FALSE)))
colnames(data_set) <- stock_namelist
head(data_set)
#>                 SPY   XIVH     SPHB     SPLV     USMV      JKD
#> 2016-10-03 203.6610 29.400 31.38322 38.55683 42.88382 119.8765
#> 2016-10-04 202.6228 30.160 31.29729 38.10687 42.46553 119.4081
#> 2016-10-05 203.5195 30.160 31.89880 38.02249 42.37048 119.9421
#> 2016-10-06 203.6610 30.160 31.83196 38.08813 42.39899 120.0826
#> 2016-10-07 202.9626 30.670 31.58372 37.98500 42.35146 119.8296
#> 2016-10-10 204.0197 31.394 31.87970 38.18187 42.56060 120.5978

# SP500_index: S&P 500 index over the same period (its download is not shown in the source)
head(SP500_index)
#>              index
#> 2016-10-03 2161.20
#> 2016-10-04 2150.49
#> 2016-10-05 2159.73
#> 2016-10-06 2160.77
#> 2016-10-07 2153.74
#> 2016-10-10 2163.66

# Compute the log-returns of the ETFs and of the S&P 500 index (the explicit factor)
X <- diff(log(data_set), na.pad = FALSE)
N <- ncol(X)  # number of ETFs
T <- nrow(X)  # number of days
f <- diff(log(SP500_index), na.pad = FALSE)
```

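Now we can compute the alpha and beta of all the ETFs:

```r
# LS fit in matrix form (reconstructed; the source shows only the printed result)
F_ <- cbind(ones = 1, f)
Gamma <- t(solve(t(F_) %*% F_, t(F_) %*% X))
colnames(Gamma) <- c("alpha", "beta")
alpha <- Gamma[, 1]
beta <- Gamma[, 2]
print(Gamma)
#>              alpha      beta
#> SPY   7.142225e-05 1.0071424
#> XIVH  1.810392e-03 2.4971086
#> SPHB -2.422107e-04 1.5613533
#> SPLV  1.070918e-04 0.6777149
#> USMV  1.166177e-04 0.6511667
#> JKD   2.569578e-04 0.8883843
```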

Some observations can now be made:

• SPY is an ETF that tracks the S&P 500. As expected, its alpha is almost zero and its beta is almost one: $\alpha = 7.142225 \times 10^{-5}$ and $\beta = 1.0071424$.
• XIVH is an ETF with a high alpha, and indeed its computed alpha is the highest among the ETFs (1-2 orders of magnitude higher): $\alpha = 1.810392 \times 10^{-3}$.
• SPHB is an ETF that is supposed to have a high beta, and its computed beta is among the highest but not the highest: $\beta = 1.5613533$. Interestingly, its computed alpha is negative, so this ETF should be treated with caution.
• SPLV is an ETF designed to reduce volatility, and indeed its computed beta is low: $\beta = 0.6777149$.
• USMV is also an ETF designed to reduce volatility, and indeed its computed beta is the lowest: $\beta = 0.6511667$.
• JKD shows a good compromise between alpha and beta.

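We can visualize these results:

`barplot(rev(alpha), horiz = TRUE, main = "alpha")`

We can also compare the different ETFs more systematically using the Sharpe ratio. Recall the factor model for one asset and one factor,

$$x_t = \alpha + \beta f_t + \epsilon_t,$$

from which we get

$$\mathrm{E}[x_t] = \alpha + \beta\,\mathrm{E}[f_t], \qquad \operatorname{var}[x_t] = \beta^2 \operatorname{var}[f_t] + \sigma^2,$$

where $\sigma^2$ is the variance of the residual $\epsilon_t$. The Sharpe ratio then follows (for $\beta > 0$):

$$\mathrm{SR} = \frac{\mathrm{E}[x_t]}{\sqrt{\operatorname{var}[x_t]}} = \frac{\alpha + \beta\,\mathrm{E}[f_t]}{\sqrt{\beta^2 \operatorname{var}[f_t] + \sigma^2}} = \frac{\alpha/\beta + \mathrm{E}[f_t]}{\sqrt{\operatorname{var}[f_t] + \sigma^2/\beta^2}} \approx \frac{\alpha/\beta + \mathrm{E}[f_t]}{\sqrt{\operatorname{var}[f_t]}},$$

where we have assumed $\sigma^2/\beta^2 \ll \operatorname{var}[f_t]$. Therefore, one way to rank the different assets by Sharpe ratio is to rank them by the ratio $\alpha/\beta$: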

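```r
# Sharpe ratio with r_f = 0 and ranking by alpha/beta
# (these lines are reconstructed; the source shows only the printed table)
SR <- colMeans(X)/apply(X, 2, sd)
ord <- order(alpha/beta, decreasing = TRUE)
ranking <- cbind("alpha/beta" = (alpha/beta)[ord], SR = SR[ord],
                 alpha = alpha[ord], beta = beta[ord])
print(ranking)
#>         alpha/beta         SR         alpha      beta
#> XIVH  7.249952e-04 0.13919483  1.810392e-03 2.4971086
#> JKD   2.892417e-04 0.17682677  2.569578e-04 0.8883843
#> USMV  1.790904e-04 0.12280053  1.166177e-04 0.6511667
#> SPLV  1.580189e-04 0.10887903  1.070918e-04 0.6777149
#> SPY   7.091574e-05 0.14170591  7.142225e-05 1.0071424
#> SPHB -1.551287e-04 0.07401566 -2.422107e-04 1.5613533
```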

We can see that:

• In terms of α/β, XIVH is the best (it has the largest α), while SPHB is the worst (its α is negative).
• In terms of the Sharpe ratio (more precisely the information ratio, since we ignore the risk-free rate), JKD is the best, followed by SPY. This supports the view that most investment funds do not outperform the market.
• Clearly, SPHB is the worst by any measure: negative α, negative α/β, and the lowest Sharpe ratio.
• JKD performs best because it has a good α (though not the largest) and a moderate β of 0.88.
• XIVH and SPHB have large betas and therefore extreme exposure to the market.
• USMV has the lowest exposure to the market, an acceptable α, and a Sharpe ratio close to the second and third highest.
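The α/β ranking can be reproduced directly from the alpha and beta estimates reported for the six ETFs. The sketch below hard-codes those reported values (so it needs no market data) and sorts the funds by α/β; the variable names are illustrative.

```r
# Alphas and betas copied from the estimates reported above
alpha <- c(SPY = 7.142225e-05, XIVH = 1.810392e-03, SPHB = -2.422107e-04,
           SPLV = 1.070918e-04, USMV = 1.166177e-04, JKD = 2.569578e-04)
beta  <- c(SPY = 1.0071424, XIVH = 2.4971086, SPHB = 1.5613533,
           SPLV = 0.6777149, USMV = 0.6511667, JKD = 0.8883843)

# Rank the funds by alpha/beta, largest first
ratio <- alpha / beta
ord <- order(ratio, decreasing = TRUE)
ranking <- cbind("alpha/beta" = ratio[ord], alpha = alpha[ord], beta = beta[ord])

print(ranking)
```

The resulting order (XIVH, JKD, USMV, SPLV, SPY, SPHB) matches the ranking table. The Sharpe-ratio column is not reproduced here because it requires the return series themselves.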

# Fama-French three-factor model

This example illustrates the Fama-French three-factor model using nine stocks from the S&P 500 index. Let’s start by loading the data:

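```r
# Set start and end dates and the list of stock names
begin_date <- "2013-01-01"
end_date <- "2017-08-31"
stock_namelist <- c("AAPL", "AMD", "ADI", "ABBV", "AEZS", "A", "APD", "AA", "CF")  # as in the results below

# Download data from Yahoo Finance
data_set <- xts()
for (stock_index in 1:length(stock_namelist))
  data_set <- cbind(data_set, Ad(getSymbols(stock_namelist[stock_index],
                                            from = begin_date, to = end_date,
                                            auto.assign = FALSE)))
colnames(data_set) <- stock_namelist

# Fama-French factors
# fama_lib: xts object with the daily factors Mkt.RF, SMB, and HML (loading code not shown in the source)
head(fama_lib)
#>            Mkt.RF   SMB   HML
#> 1926-07-01   0.10 -0.24 -0.28
#> 1926-07-02   0.45 -0.32 -0.08
#> 1926-07-06   0.17  0.27 -0.35
#> 1926-07-07   0.09 -0.59  0.03
#> 1926-07-08   0.21 -0.36  0.15
#> 1926-07-09  -0.71  0.44  0.56
tail(fama_lib)
#>            Mkt.RF   SMB   HML
#> 2017-11-22  -0.05  0.10 -0.04
#> 2017-11-24   0.21  0.02 -0.44
#> 2017-11-27  -0.06 -0.36  0.03
#> 2017-11-28   1.06  0.38  0.84
#> 2017-11-29   0.02  0.04  1.45
#> 2017-11-30   0.82 -0.56 -0.50

# Compute the log-returns of the stocks and align the Fama-French factors
X <- diff(log(data_set), na.pad = FALSE)
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days
# Factors on the same dates as X; dividing by 100 assumes they are quoted in percent
F <- fama_lib[index(X)]/100
```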

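Now we have the three factors in the matrix $F$ and want to fit the model

$$X^T = \alpha \mathbf{1}^T + B F^T + E^T,$$

where the loadings are now a matrix of betas, $B = [\beta_1, \dots, \beta_N]^T = [b_1, b_2, b_3]$. We can do an LS fit:

$$\underset{\alpha,\, B}{\text{minimize}} \quad \left\| X^T - \alpha \mathbf{1}^T - B F^T \right\|_F^2.$$

More conveniently, we define $\Gamma = [\alpha, B]$ and the extended factors $\tilde{F} = [\mathbf{1}, F]$. The LS formulation can then be written as

$$\underset{\Gamma}{\text{minimize}} \quad \left\| X^T - \Gamma \tilde{F}^T \right\|_F^2.$$

```r
# LS fit in matrix form (the fitting lines are reconstructed; the source shows only print(Gamma))
F_ <- cbind(ones = 1, F)  # extended factors [1, F]
Gamma <- t(solve(t(F_) %*% F_, t(F_) %*% X))
colnames(Gamma) <- c("alpha", "b1", "b2", "b3")
alpha <- Gamma[, 1]
B <- Gamma[, 2:4]
print(Gamma)
#>              alpha        b1          b2          b3
#> AAPL  1.437845e-04 0.9657612 -0.23339130 -0.49806858
#> AMD   6.181760e-04 1.4062105  0.80738336 -0.07240117
#> ADI  -2.285017e-05 1.2124008  0.09025928 -0.20739271
#> ABBV  1.621380e-04 1.0582340  0.02833584 -0.72152627
#> AEZS -4.513235e-03 0.6989534  1.31318108 -0.25160182
#> A     1.146100e-05 1.2181429  0.10370898 -0.20487290
#> APD   6.281504e-05 1.0222936 -0.04394061  0.11060938
#> AA   -4.587722e-05 1.3391852  0.62590136  0.99858692
#> CF   -5.777426e-04 1.0387867  0.48430007  0.82014523
```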

Alternatively, we can let R do the fitting for us, which gives the same estimates:

```r
#>              alpha    Mkt.RF         SMB         HML
#> AAPL  1.437845e-04 0.9657612 -0.23339130 -0.49806858
#> AMD   6.181760e-04 1.4062105  0.80738336 -0.07240117
#> ADI  -2.285017e-05 1.2124008  0.09025928 -0.20739271
#> ABBV  1.621380e-04 1.0582340  0.02833584 -0.72152627
#> AEZS -4.513235e-03 0.6989534  1.31318108 -0.25160182
#> A     1.146100e-05 1.2181429  0.10370898 -0.20487290
#> APD   6.281504e-05 1.0222936 -0.04394061  0.11060938
#> AA   -4.587722e-05 1.3391852  0.62590136  0.99858692
#> CF   -5.777426e-04 1.0387867  0.48430007  0.82014523
```
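The multi-factor LS fit can be checked on simulated data, where the true loadings are known. The sketch below generates returns from three synthetic factors, applies $\Gamma = X^T \tilde{F} (\tilde{F}^T \tilde{F})^{-1}$, and compares the result with `lm()`. The simulated factors reuse the column names Mkt.RF, SMB, and HML purely as labels; all values are assumptions for illustration.

```r
# Minimal sketch on synthetic data (names and numbers are illustrative)
set.seed(123)
T_obs <- 600
K <- 3
N_assets <- 5
F_sim <- matrix(rnorm(T_obs * K, sd = 0.01), T_obs, K,
                dimnames = list(NULL, c("Mkt.RF", "SMB", "HML")))
B_true <- matrix(runif(N_assets * K, -1, 1.5), N_assets, K)   # assumed true loadings
alpha_true <- rnorm(N_assets, sd = 1e-4)                      # assumed true intercepts
X_sim <- matrix(alpha_true, T_obs, N_assets, byrow = TRUE) +
  F_sim %*% t(B_true) +
  matrix(rnorm(T_obs * N_assets, sd = 0.005), T_obs, N_assets)
colnames(X_sim) <- paste0("asset", 1:N_assets)

# LS solution with the extended factors [1, F]
F_tilde <- cbind(ones = 1, F_sim)
Gamma_hat <- t(X_sim) %*% F_tilde %*% solve(t(F_tilde) %*% F_tilde)

# Cross-check: lm() regresses every asset on the three factors
Gamma_lm <- t(coef(lm(X_sim ~ F_sim)))

print(round(Gamma_hat, 4))
```

The first column of `Gamma_hat` holds the intercepts and the remaining three columns hold the loadings, in the same layout as the printed `Gamma` table.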

# Statistical factor model

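Now let’s consider statistical (or implicit) factor models, where neither the factors nor the loadings are available. Recall the principal factor method for the model $X^T = \alpha \mathbf{1}^T + B F^T + E^T$ with $K$ factors:

1. PCA:

• Sample mean: $\hat{\alpha} = \bar{x} = \frac{1}{T} X^T \mathbf{1}_T$
• Demeaned matrix: $\bar{X} = X - \mathbf{1}_T \bar{x}^T$
• Sample covariance matrix: $\hat{\Sigma}_{\mathrm{SCM}} = \frac{1}{T-1} \bar{X}^T \bar{X}$
• Eigendecomposition: $\hat{\Sigma}_{\mathrm{SCM}} = \hat{U} \hat{\Lambda} \hat{U}^T$

2. Estimates:

• $\hat{B} = \hat{U}_1 \hat{\Lambda}_1^{1/2}$, using the $K$ largest eigenvalues and their eigenvectors
• $\hat{\Psi} = \operatorname{diag}(\hat{\Sigma}_{\mathrm{SCM}} - \hat{B} \hat{B}^T)$
• $\hat{\Sigma} = \hat{B} \hat{B}^T + \hat{\Psi}$

3. Update the eigendecomposition: $\hat{\Sigma}_{\mathrm{SCM}} - \hat{\Psi} = \hat{U} \hat{\Lambda} \hat{U}^T$

4. Repeat steps 2-3 until convergence.

```r
# Estimated alpha (first column) and loadings of the K = 3 factors
#>              alpha
#> AAPL  0.0007074564 0.0002732114 -0.004631647 -0.0044814226
#> AMD   0.0013722468 0.0045782146 -0.035202146  0.0114549515
#> ADI   0.0006533116 0.0004151904 -0.007379066 -0.0053058139
#> ABBV  0.0007787929 0.0017513359 -0.003967816 -0.0056000810
#> AEZS -0.0041576357 0.0769496344  0.002935950  0.0006249473
#> A     0.0006902482 0.0012690079 -0.005680162 -0.0061507654
#> APD   0.0006236565 0.0005442926 -0.004229364 -0.0057976394
#> AA    0.0006277163 0.0027405024 -0.009796620 -0.0149177957
#> CF   -0.0000573028 0.0023108605 -0.007409061 -0.0153425661
```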

Similarly, we can let R do the work for us:

```r
#>              alpha      factor1      factor2       factor3
#> AAPL  0.0007074564 0.0002732114 -0.004631647 -0.0044814226
#> AMD   0.0013722468 0.0045782146 -0.035202146  0.0114549515
#> ADI   0.0006533116 0.0004151904 -0.007379066 -0.0053058139
#> ABBV  0.0007787929 0.0017513359 -0.003967816 -0.0056000810
#> AEZS -0.0041576357 0.0769496344  0.002935950  0.0006249473
#> A     0.0006902482 0.0012690079 -0.005680162 -0.0061507654
#> APD   0.0006236565 0.0005442926 -0.004229364 -0.0057976394
#> AA    0.0006277163 0.0027405024 -0.009796620 -0.0149177957
#> CF   -0.0000573028 0.0023108605 -0.007409061 -0.0153425661
```
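The iterative principal factor method can be sketched as a small function and tried on simulated data with a known two-factor structure. This is a sketch under stated assumptions rather than the article's exact code: the function name `principal_factor`, the `max_iter` safeguard, and the simulated data are all illustrative, and the update step decomposes the sample covariance minus the current residual covariance, which is one reading of the update step.

```r
# Iterative principal factor method (illustrative sketch)
principal_factor <- function(X, K, tol = 1e-3, max_iter = 100) {
  alpha <- colMeans(X)                     # sample mean
  S <- cov(X)                              # sample covariance matrix
  Sigma <- S
  Sigma_prev <- matrix(0, ncol(X), ncol(X))
  eig <- eigen(S, symmetric = TRUE)
  iter <- 0
  while (norm(Sigma - Sigma_prev, "F") / norm(Sigma, "F") > tol && iter < max_iter) {
    # Loadings from the K leading eigenpairs
    B <- eig$vectors[, 1:K, drop = FALSE] %*% diag(sqrt(eig$values[1:K]), K, K)
    Psi <- diag(diag(S - B %*% t(B)))      # diagonal residual covariance
    Sigma_prev <- Sigma
    Sigma <- B %*% t(B) + Psi              # model covariance
    eig <- eigen(S - Psi, symmetric = TRUE)  # update the decomposition
    iter <- iter + 1
  }
  list(alpha = alpha, B = B, Psi = Psi, Sigma = Sigma, iterations = iter)
}

# Synthetic data: 6 assets driven by 2 factors, residual variance 0.25
set.seed(11)
T_obs <- 1000
B_true <- cbind(rep(1, 6), rep(c(1, -1), 3))
F_sim <- matrix(rnorm(T_obs * 2), T_obs, 2)
X_sim <- F_sim %*% t(B_true) + matrix(rnorm(T_obs * 6, sd = 0.5), T_obs, 6)

fit <- principal_factor(X_sim, K = 2)
print(fit$iterations)
print(round(diag(fit$Psi), 3))
```

By construction the diagonal of the model covariance always equals the sample variances, while the off-diagonal entries are explained by the `K` factors alone. The estimated loadings are identified only up to a rotation, so they should not be compared entry by entry with `B_true`.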

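# Final comparison of covariance matrix estimates from different factor models

Finally, we will compare the following estimators:

• Sample covariance matrix
• Macroeconomic one-factor model
• Three-factor Fama-French model
• Statistical factor model

We estimate the models during a training phase and then compare the estimated covariance matrices with the sample covariance matrix of the test phase. The estimation error is evaluated in terms of the Frobenius norm $\|\hat{\Sigma} - \Sigma_{\mathrm{true}}\|_F$ and the PRIAL (percentage improvement in average loss):

$$\mathrm{PRIAL} = 100 \times \frac{\|\hat{\Sigma}_{\mathrm{SCM}} - \Sigma_{\mathrm{true}}\|_F^2 - \|\hat{\Sigma} - \Sigma_{\mathrm{true}}\|_F^2}{\|\hat{\Sigma}_{\mathrm{SCM}} - \Sigma_{\mathrm{true}}\|_F^2},$$

which is 0 when the estimate equals the sample covariance matrix and 100 when it equals the true covariance matrix.

Load the training and test sets: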

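```r
# Set start and end dates
begin_date <- "2013-01-01"
end_date <- "2015-12-31"

# Prepare stock data (stock_namelist as defined in the earlier examples)
data_set <- xts()
for (stock_index in 1:length(stock_namelist))
  data_set <- cbind(data_set, Ad(getSymbols(stock_namelist[stock_index],
                                            from = begin_date, to = end_date,
                                            auto.assign = FALSE)))
colnames(data_set) <- stock_namelist
X <- diff(log(data_set), na.pad = FALSE)
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days

# Fama-French factors
# mydata: raw table of daily Fama-French factors (its loading is not shown in the source)
mydata <- mydata[-nrow(mydata), ]  # remove the last row
# fama_lib: xts object with the factors, as in the Fama-French example
F_FamaFrench <- fama_lib[index(X)]/100

# Prepare the index (SP500_index downloaded over the same period)
f_SP500 <- diff(log(SP500_index), na.pad = FALSE)

# Split the data into training data and test data
T_trn <- round(0.45*T)
X_trn <- X[1:T_trn, ]
X_tst <- X[(T_trn+1):T, ]
F_FamaFrench_trn <- F_FamaFrench[1:T_trn, ]
f_SP500_trn <- f_SP500[1:T_trn, ]
```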

Now let’s use the training data to estimate different factor models:

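```r
# Sample covariance matrix
Sigma_SCM <- cov(X_trn)

# Single-factor model
F_ <- cbind(ones = 1, f_SP500_trn)
Gamma <- t(solve(t(F_) %*% F_, t(F_) %*% X_trn))
beta <- Gamma[, 2]
E <- xts(t(t(X_trn) - Gamma %*% t(F_)), index(X_trn))
Psi <- (1/(T_trn-2)) * t(E) %*% E
Sigma_SP500 <- as.numeric(var(f_SP500_trn)) * beta %o% beta + diag(diag(Psi))

# Fama-French three-factor model
F_ <- cbind(ones = 1, F_FamaFrench_trn)
Gamma <- t(solve(t(F_) %*% F_, t(F_) %*% X_trn))
B <- Gamma[, 2:4]
E <- xts(t(t(X_trn) - Gamma %*% t(F_)), index(X_trn))
Psi <- (1/(T_trn-4)) * t(E) %*% E
Sigma_FamaFrench <- B %*% cov(F_FamaFrench_trn) %*% t(B) + diag(diag(Psi))

# Statistical factor models: the source repeats the loop below for K = 1, 3, 5.
# Here it is wrapped in a helper; lines missing from the source are reconstructed.
fit_statistical <- function(X, K) {
  S <- cov(X)
  Sigma <- S
  Sigma_prev <- matrix(0, ncol(X), ncol(X))
  eigSigma <- eigen(Sigma)
  while (norm(Sigma - Sigma_prev, "F")/norm(Sigma, "F") > 1e-3) {
    B <- eigSigma$vectors[, 1:K, drop = FALSE] %*% diag(sqrt(eigSigma$values[1:K]), K, K)
    Psi <- diag(diag(Sigma - B %*% t(B)))
    Sigma_prev <- Sigma
    Sigma <- B %*% t(B) + Psi
    eigSigma <- eigen(S - Psi)
  }
  Sigma
}
Sigma_PCA1 <- fit_statistical(X_trn, K = 1)  # statistical single-factor model
Sigma_PCA3 <- fit_statistical(X_trn, K = 3)  # statistical three-factor model
Sigma_PCA5 <- fit_statistical(X_trn, K = 5)  # statistical five-factor model
```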

Finally, let’s compare the different estimates in the test data:

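```r
Sigma_true <- cov(X_tst)
# Frobenius-norm error of each covariance estimate obtained from the training data
estimates <- list(SCM = Sigma_SCM, SP500 = Sigma_SP500, FamaFrench = Sigma_FamaFrench,
                  PCA1 = Sigma_PCA1, PCA3 = Sigma_PCA3, PCA5 = Sigma_PCA5)
error <- sapply(estimates, function(S) norm(S - Sigma_true, "F"))
barplot(error, main = "Covariance matrix estimation error")
```

```r
ref <- norm(Sigma_SCM - Sigma_true, "F")^2
PRIAL <- 100*(ref - error^2)/ref
barplot(PRIAL, main = "PRIAL for covariance matrix estimation")
```

Finally, we can see that using factor models helps in estimating the covariance matrix.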
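To make the PRIAL metric concrete, the following sketch wraps it in a small function and evaluates it on three hand-built matrices. The function name `prial` and the toy matrices are illustrative assumptions. The "true" covariance is an arbitrary positive semidefinite matrix, the reference estimate is that matrix plus a perturbation, and a third estimate lies exactly halfway between the two.

```r
# PRIAL: percentage improvement over a reference estimate (illustrative sketch)
prial <- function(Sigma_hat, Sigma_scm, Sigma_true) {
  ref <- norm(Sigma_scm - Sigma_true, "F")^2
  100 * (ref - norm(Sigma_hat - Sigma_true, "F")^2) / ref
}

set.seed(3)
A <- matrix(rnorm(16), 4, 4)
Sigma_true <- crossprod(A)                  # arbitrary "true" covariance
Sigma_scm  <- Sigma_true + 0.1 * diag(4)    # perturbed reference estimate
Sigma_mid  <- (Sigma_true + Sigma_scm) / 2  # estimate halfway between the two

print(c(scm  = prial(Sigma_scm,  Sigma_scm, Sigma_true),
        mid  = prial(Sigma_mid,  Sigma_scm, Sigma_true),
        true = prial(Sigma_true, Sigma_scm, Sigma_true)))
```

Up to floating-point rounding, the three values are 0, 75, and 100. Halving the distance to the true matrix quarters the squared error, which is why the halfway estimate scores 75 rather than 50.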
