Time: 2021-6-11

# Link to the original text: http://tecdat.cn/?p=21425

Extreme value theory focuses on the tail of the loss distribution and is typically used to analyze rare events. When the population distribution is unknown, it can describe the behavior of extreme values from a relatively small sample, and it can extrapolate beyond the range of the observed data. Models based on the generalized Pareto distribution (GPD) therefore make more effective use of the limited information in catastrophe loss data, which has made them a mainstream technique in extreme value theory.

Catastrophe losses are characterized by low frequency, high severity, scarce data, and heavy tails. With these characteristics in mind, the GPD model is used to build a statistical model of fire economic loss data, and its shape and scale parameters are estimated. Model checks show that the GPD fits the heavy tail of catastrophe risk well, which provides a theoretical basis for modeling catastrophe risk and pricing catastrophe bonds.

# Fire loss data

The data used in this article were collected from reinsurance companies and comprise 2167 fire losses from 1980 to 1990. The losses have been adjusted for inflation, and the total claim amount is split into building loss and loss of profit.

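```r
# Read the data; header = TRUE is assumed, since columns are accessed by name below
base1 = read.table("dataunivar.txt", header = TRUE)
```

Consider the first data set (for now, we only deal with univariate extremes):

```r
# Parse the date column; X is assumed to hold the loss amounts from base1
D = as.Date(as.character(base1$Date), "%m/%d/%Y")
plot(D, X, type = "h")
```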

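The resulting chart shows each loss as a vertical bar at its date.

A natural next step is to visualize the tail, for example by plotting the log of the empirical survival probabilities against the log of the sorted losses `Xs` (with `n` observations):

``plot(log(Xs),log((n:1)/(n+1)))``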

# Linear regression

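The points lie approximately on a straight line, whose slope can be estimated by linear regression, either on all observations or on the largest ones only:

```r
# B is assumed to hold the plotted points: X = log(Xs), Y = log((n:1)/(n+1))
lm(Y ~ X, data = B)                  # all observations
lm(Y ~ X, data = B[(n-500):n, ])     # the 501 largest observations
lm(Y ~ X, data = B[(n-100):n, ])     # the 101 largest observations
```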
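As a sanity check of the regression idea, the sketch below applies the same log-log regression to a simulated Pareto sample, where the true tail index is known. The sample, the seed, and all names ending in `_sim` are illustrative assumptions and are not part of the fire-loss analysis.

```r
set.seed(1)
alpha_true <- 2                          # hypothetical tail index used for the simulation
n_sim <- 5000
# Exact Pareto sample: P(X > x) = x^(-alpha_true) for x >= 1
x_sim <- runif(n_sim)^(-1 / alpha_true)

xs_sim <- sort(x_sim)                    # increasing order
B_sim <- data.frame(
  X = log(xs_sim),
  Y = log((n_sim:1) / (n_sim + 1))       # log of the empirical survival probabilities
)

fit_all <- lm(Y ~ X, data = B_sim)       # regression on all points
slope_all <- unname(coef(fit_all)["X"])
alpha_hat <- -slope_all                  # the slope is the opposite of the tail index
print(alpha_hat)
```

For a Pareto sample the fitted slope should be close to minus the tail index, so `alpha_hat` is expected to land near `alpha_true`.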

# Heavy-tailed distributions

The slope here is related to the tail index of the distribution. Consider a heavy-tailed distribution.

Because the order statistics are the natural estimator of the quantiles, the slope of the line is the opposite of the tail index.

The slope is estimated using only the largest observations.
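The link between the slope and the tail index can be written out under the usual regular-variation assumption. This is a sketch in standard notation; the symbols $\overline{F}$, $L$, $\alpha$ and $\xi$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
% Heavy-tailed survival function, with L a slowly varying function
\overline{F}(x) &= \mathbb{P}(X > x) = x^{-\alpha}\, L(x), \qquad \alpha > 0 \\
% Taking logarithms gives an approximately linear relation for large x
\log \overline{F}(x) &= -\alpha \log x + \log L(x) \\
% The shape parameter used by the estimators is the inverse of the tail index
\xi &= \frac{1}{\alpha}
\end{aligned}
```

Since $\log L(x)$ varies slowly, the log-log plot of the empirical survival function has a slope close to $-\alpha$ in the tail.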

# Hill estimator

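Hill's estimator is based on the assumption that the denominator above is almost 1.

Under suitable convergence assumptions, the estimator is consistent and, further, asymptotically normal.

Based on this (asymptotic) distribution, an (asymptotic) confidence interval can be obtained:

```r
# logXs is assumed to hold the log-losses in decreasing order,
# so that entry k uses the k largest observations
xi = 1/(1:n)*cumsum(logXs) - logXs     # Hill estimate for each k
xise = 1.96/sqrt(1:n)*xi               # half-width of the 95% interval

# Confidence band, added to an existing plot of xi against 1:n (styling arguments omitted)
polygon(c(1:n, n:1), c(xi + xise, rev(xi - xise)))
```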
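To see the estimator at work on data with a known answer, here is a minimal self-contained sketch of the Hill estimator with its 95% band. The function name `hill_estimates`, the simulated Pareto sample, and the choice of k = 200 are assumptions made for illustration. Unlike the code above, it uses the (k+1)-th largest log-loss as the threshold, which is the textbook convention.

```r
# Hill estimates for k = 1, ..., n - 1, with asymptotic 95% confidence limits
hill_estimates <- function(x) {
  logx_desc <- sort(log(x), decreasing = TRUE)   # log-losses, largest first
  n <- length(x)
  k <- 1:(n - 1)
  # Mean of the k largest log-losses minus the (k+1)-th largest log-loss
  xi <- cumsum(logx_desc)[k] / k - logx_desc[k + 1]
  half_width <- 1.96 * xi / sqrt(k)              # from the N(0, xi^2) limit
  data.frame(k = k, xi = xi,
             lower = xi - half_width, upper = xi + half_width)
}

set.seed(42)
x_sim <- runif(2000)^(-1 / 2)    # simulated Pareto sample with alpha = 2, i.e. xi = 0.5
hill <- hill_estimates(x_sim)
print(hill[hill$k == 200, ])
```

In practice the estimate is read off a range of k where the curve is roughly stable, rather than at a single k.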

# Delta method

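The asymptotic distribution of the tail index `alpha = 1/xi` is obtained using the delta method. Similarly, this result yields (asymptotic) confidence intervals:

```r
alpha = 1/xi                          # tail index (assumed definition)
alphase = 1.96/sqrt(1:n)/xi           # half-width of the 95% interval
# Confidence band, added to an existing plot of alpha against 1:n (styling arguments omitted)
polygon(c(1:n, n:1), c(alpha + alphase, rev(alpha - alphase)))
```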
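The delta-method step can be written out as follows. This is a sketch in standard notation, assuming the asymptotic normality of the Hill estimator with variance $\xi^2$; the symbols $\hat{\xi}_k$, $\hat{\alpha}_k$ and $g$ are introduced here for illustration.

```latex
\begin{aligned}
% Starting point: asymptotic normality of the Hill estimator
&\sqrt{k}\,\bigl(\hat{\xi}_k - \xi\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\ \xi^2\bigr),
\qquad \alpha = g(\xi) = \frac{1}{\xi}, \quad g'(\xi) = -\frac{1}{\xi^2} \\
% Delta method: multiply the variance by g'(\xi)^2
&\sqrt{k}\,\bigl(\hat{\alpha}_k - \alpha\bigr) \xrightarrow{d}
\mathcal{N}\bigl(0,\ g'(\xi)^2\,\xi^2\bigr) = \mathcal{N}\bigl(0,\ \xi^{-2}\bigr) = \mathcal{N}\bigl(0,\ \alpha^2\bigr) \\
% Resulting asymptotic 95% confidence interval
&\hat{\alpha}_k \pm 1.96\,\frac{\hat{\alpha}_k}{\sqrt{k}}
\end{aligned}
```

The half-width $1.96\,\hat{\alpha}_k/\sqrt{k}$ is what `alphase` computes in the code above, since $\hat{\alpha}_k = 1/\hat{\xi}_k$.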

# Dekkers-Einmahl-de Haan estimator

Then, again under conditions on the rate of convergence, the estimator is asymptotically normal.
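As an illustration, the sketch below implements the usual form of the Dekkers-Einmahl-de Haan (moment) estimator, built from the first two empirical moments of the log-excesses over the (k+1)-th largest observation. The function name `moment_estimator`, the simulated sample, and the choice of k are assumptions; the standard error uses the asymptotic variance `1 + xi^2`, which holds for `xi >= 0`.

```r
# Moment estimator based on the k largest observations
moment_estimator <- function(x, k) {
  logx_desc <- sort(log(x), decreasing = TRUE)
  excess <- logx_desc[1:k] - logx_desc[k + 1]   # log-excesses over the (k+1)-th largest value
  m1 <- mean(excess)                            # first moment (the Hill estimator)
  m2 <- mean(excess^2)                          # second moment
  xi <- m1 + 1 - 0.5 / (1 - m1^2 / m2)
  se <- sqrt((1 + xi^2) / k)                    # asymptotic standard error, valid for xi >= 0
  c(xi = xi, lower = xi - 1.96 * se, upper = xi + 1.96 * se)
}

set.seed(2021)
x_sim <- runif(5000)^(-1 / 2)      # simulated Pareto sample with xi = 0.5
dedh <- moment_estimator(x_sim, k = 500)
print(dedh)
```

Unlike the Hill estimator, this estimator is also defined for light-tailed distributions, at the cost of a larger asymptotic variance when the tail is heavy.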

# Pickands estimator

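Pickands' estimator and its (asymptotic) confidence interval are computed as follows:

```r
# Xs is assumed to be sorted in decreasing order here, so that Xs[k] is the k-th largest loss
k = seq(1, length = trunc(n/4), by = 1)
xi = 1/log(2)*log((Xs[k] - Xs[2*k])/(Xs[2*k] - Xs[4*k]))

# Half-width of the 95% interval, using the asymptotic variance
# xi^2 (2^(2 xi + 1) + 1) / (2 (2^xi - 1) log 2)^2
xise = 1.96/sqrt(k)*sqrt(xi^2*(2^(2*xi+1)+1)/((2*(2^xi-1)*log(2))^2))

# Confidence band on an existing plot (arguments assumed by analogy with the Hill plot)
polygon(c(k, rev(k)), c(xi + xise, rev(xi - xise)))
```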
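For a single value of k, Pickands' estimator compares the spacings between the k-th, 2k-th and 4k-th largest observations. The sketch below is a self-contained version on simulated data; the function name `pickands_estimator`, the sample, and the choice of k are assumptions made for illustration.

```r
# Pickands' estimator from the k-th, 2k-th and 4k-th largest observations
pickands_estimator <- function(x, k) {
  stopifnot(4 * k <= length(x))
  xs_desc <- sort(x, decreasing = TRUE)
  ratio <- (xs_desc[k] - xs_desc[2 * k]) / (xs_desc[2 * k] - xs_desc[4 * k])
  xi <- log(ratio) / log(2)
  # Asymptotic standard error of Pickands' estimator
  se <- sqrt(xi^2 * (2^(2 * xi + 1) + 1) / (2 * (2^xi - 1) * log(2))^2 / k)
  c(xi = xi, lower = xi - 1.96 * se, upper = xi + 1.96 * se)
}

set.seed(99)
x_sim <- runif(5000)^(-1 / 2)      # simulated Pareto sample with xi = 0.5
pick <- pickands_estimator(x_sim, k = 500)
print(pick)
```

The interval is typically much wider than the Hill interval for the same k, which is the price paid for an estimator that does not assume a heavy tail.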

# Fitting the GPD

Maximum likelihood can also be used to fit the GPD to the excesses over a high threshold.

```r
> gpd(X, 5)
$n
[1] 2167

$threshold
[1] 5

$p.less.thresh
[1] 0.8827873

$n.exceed
[1] 254

$method
[1] "ml"

$par.ests
       xi      beta
0.6320499 3.8074817

$par.ses
       xi      beta
0.1117143 0.4637270

$varcov
            [,1]        [,2]
[1,]  0.01248007 -0.03203283
[2,] -0.03203283  0.21504269

$information
[1] "observed"

$converged
[1] 0

$nllh.final
[1] 754.1115

attr(,"class")
[1] "gpd"
```

Or, equivalently:

```r
> gpd.fit(X, 5)
$threshold
[1] 5

$nexc
[1] 254

$conv
[1] 0

$nllh
[1] 754.1115

$mle
[1] 3.8078632 0.6315749

$rate
[1] 0.1172127

$se
[1] 0.4636270 0.1116136
```
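To show what such a fit does under the hood, here is a minimal sketch of GPD maximum likelihood using only base R. It is not the implementation behind the outputs above; the names `gpd_nll` and `fit_gpd`, the starting values, and the simulated sample are assumptions made for illustration.

```r
# Negative log-likelihood of the GPD for excesses y, with par = c(xi, beta)
gpd_nll <- function(par, y) {
  xi <- par[1]
  beta <- par[2]
  if (beta <= 0) return(1e10)                  # scale must be positive
  z <- 1 + xi * y / beta
  if (any(z <= 0)) return(1e10)                # outside the support
  if (abs(xi) < 1e-8) {                        # exponential limit as xi -> 0
    return(length(y) * log(beta) + sum(y) / beta)
  }
  length(y) * log(beta) + (1 / xi + 1) * sum(log(z))
}

# Fit the GPD to the excesses of x over a threshold
fit_gpd <- function(x, threshold) {
  y <- x[x > threshold] - threshold
  opt <- optim(c(0.1, mean(y)), gpd_nll, y = y, hessian = TRUE)
  se <- sqrt(diag(solve(opt$hessian)))         # standard errors from the observed information
  list(n.exceed = length(y),
       par.ests = c(xi = opt$par[1], beta = opt$par[2]),
       par.ses = c(xi = se[1], beta = se[2]),
       nllh.final = opt$value,
       converged = opt$convergence)
}

set.seed(7)
# Simulated losses: threshold 5 plus GPD excesses with xi = 0.5, beta = 3 (inverse-CDF method)
x_sim <- 5 + 3 * (runif(3000)^(-0.5) - 1) / 0.5
fit <- fit_gpd(x_sim, threshold = 5)
print(fit$par.ests)
```

On the simulated sample the estimates should be close to the true values `xi = 0.5` and `beta = 3`, and `converged` equal to 0 indicates that `optim` reported successful convergence.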

The profile likelihood of the tail index can also be visualized,

``> gpd.prof``

or

``> gpd.prof``

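The maximum likelihood estimate of the tail index can then be plotted as a function of the threshold, together with a confidence interval:

```r
# u is assumed to be a vector of thresholds; XI and XIS are assumed to hold
# the estimates and their standard errors at each threshold
Vectorize(function(u){gpd(X,u)$par.ests[1]})

plot(u, XI, ylim = c(0, 2))
segments(u, XI - 1.96*XIS, u, XI + 1.96*XIS)
```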

Finally, the block maxima technique can be used.

```r
gev.fit
$conv
[1] 0

$nllh
[1] 3392.418

$mle
[1] 1.4833484 0.5930190 0.9168128

$se
[1] 0.01507776 0.01866719 0.03035380
```

The estimate of the tail index is the last coefficient here (about 0.92).
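The block maxima approach can be sketched in base R as follows: split the data into blocks, keep the maximum of each block, and fit a GEV distribution to the maxima by maximum likelihood. This is an illustration on simulated data, not the fit reported above; the names `gev_nll` and `block_max`, the block size of 50, and the starting values are assumptions.

```r
# Negative log-likelihood of the GEV, with par = c(location, scale, shape)
gev_nll <- function(par, m) {
  mu <- par[1]
  sigma <- par[2]
  xi <- par[3]
  if (sigma <= 0) return(1e10)                 # scale must be positive
  z <- 1 + xi * (m - mu) / sigma
  if (any(z <= 0)) return(1e10)                # outside the support
  if (abs(xi) < 1e-8) {                        # Gumbel limit as xi -> 0
    s <- (m - mu) / sigma
    return(length(m) * log(sigma) + sum(s) + sum(exp(-s)))
  }
  length(m) * log(sigma) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

set.seed(123)
x_sim <- runif(500 * 50)^(-1 / 2)                      # simulated Pareto losses, xi = 0.5
block_max <- apply(matrix(x_sim, nrow = 50), 2, max)   # maxima of 500 blocks of size 50

# Robust starting values: median for location, half the IQR for scale
start <- c(median(block_max), IQR(block_max) / 2, 0.1)
opt <- optim(start, gev_nll, m = block_max, control = list(maxit = 5000))
gev_mle <- setNames(opt$par, c("location", "scale", "shape"))
print(gev_mle)
```

As in the output above, the parameters are ordered location, scale, shape, so the tail index estimate is the last coefficient.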
