Original link: http://tecdat.cn/?p=22181

This article considers classification prediction based on kernel methods. Note that we do not use standard logistic regression here, since it is a parametric model.

## Nonparametric method

There are three nonparametric methods for function estimation: kernel methods, local polynomials, and splines.

The advantage of nonparametric function estimation is robustness: no specific form is assumed for the model, only that the function is smooth, which avoids the risk introduced by model selection. On the other hand, nonparametric estimators have complicated expressions, are hard to interpret, and are computationally expensive. Nonparametric methods therefore carry risks of their own, and should be chosen with caution.

The idea behind nonparametric estimation is very simple: the function is likely to be close to its observed values near the observation points, so the value of f(x) is estimated by a weighted average of the observations near x.

## Kernel method

When the weights are given by a kernel function, the method is called a kernel method. Common choices are the Nadaraya-Watson and Gasser-Müller kernel estimators, i.e. the NW and GM estimators found in many textbooks. We will not discuss the choice of kernel here; all kernel estimates below use the Gaussian kernel by default.

The NW kernel estimator is

$$\hat{m}_h(x)=\frac{\sum_{i=1}^n K\!\left(\frac{x-x_i}{h}\right)y_i}{\sum_{i=1}^n K\!\left(\frac{x-x_i}{h}\right)}$$

The GM kernel estimator is

$$\hat{m}_h(x)=\sum_{i=1}^n y_{(i)}\int_{s_{i-1}}^{s_i}\frac{1}{h}K\!\left(\frac{x-u}{h}\right)du$$

where $s_i=(x_{(i)}+x_{(i+1)})/2$ (with $s_0=-\infty$ and $s_n=+\infty$), $K$ is the kernel, and $h$ is the bandwidth.
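For intuition, here is a minimal sketch of the NW estimator in Python; the data and bandwidth are made up for illustration, and the Gaussian normalising constant is dropped because it cancels in the ratio.

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson estimate of m(x0) with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian weights around x0
    return np.sum(w * y) / np.sum(w)        # kernel-weighted average of the responses

# toy data: a noiseless linear function, so the estimate at the centre is exact
x = np.linspace(0.0, 10.0, 11)
y = 2.0 * x
print(nw_estimate(5.0, x, y, h=2.0))  # weights are symmetric around 5, so close to 10.0
```

In the classification setting of this article, y would be a 0/1 survival indicator, and the estimate is read as a probability.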

## Data

We use heart-disease data to predict myocardial infarction in emergency patients. The variables are:

- Cardiac index
- Stroke volume index
- Diastolic pressure
- Pulmonary artery pressure
- Ventricular pressure
- Pulmonary resistance
- Survival (the response)

Now that we know what a kernel estimate is, assume that K is the density of the N(0,1) distribution. At point x, with bandwidth h, we get the following code:

```
mean_x = function(x, bw){
  w = dnorm((insys - x)/bw, mean = 0, sd = 1)  # Gaussian weights; insys = stroke volume index
  weighted.mean(survival, w)}                  # weighted mean of the 0/1 survival indicator
u = seq(min(insys), max(insys), length = 251)
v = Vectorize(function(x) mean_x(x, 3))(u)     # bandwidth 3, for illustration
plot(u, v, ylim = 0:1, type = "l")
```

Of course, we can change the bandwidth.

```
v = Vectorize(function(x) mean_x(x, 2))(u)  # same estimate with bandwidth 2
lines(u, v, col = "red")
```

We observe that the smaller the bandwidth, the larger the variance and the smaller the bias. "Larger variance" here means greater variability: the smaller the neighborhood, the fewer points enter the average and the less stable the estimate. "Smaller bias" because the expected value should be computed at point x, so the smaller the neighborhood, the closer the points used are to x.
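The bias side of this trade-off is easy to see at a boundary. A small sketch with synthetic data (the bandwidth values are arbitrary): near the left edge of an increasing function, a large bandwidth pulls the estimate toward the overall mean.

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

x = np.linspace(0.0, 10.0, 101)
y = x.copy()                                 # true regression function: f(x) = x
small = nw_estimate(0.0, x, y, h=0.2)        # nearly local: close to f(0) = 0
large = nw_estimate(0.0, x, y, h=10.0)       # nearly global: pulled toward mean(y) = 5
print(small, large)
```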

## Using the ksmooth function

We can use R's ksmooth function to compute the kernel regression.

`ksmooth(insys, survival, "normal", bandwidth = 2*exp(1))`

This reproduces our earlier estimate. However, the output is not a function but a pair of vectors. Moreover, as we can see, the bandwidth argument is not on the same scale as the bandwidth we used before.

```
# For each of our bandwidths, find the ksmooth bandwidth giving the closest fit
h_equiv = function(bw){
  f = function(bk){
    fit = ksmooth(insys, survival, "normal", bandwidth = bk, x.points = u)
    sum((fit$y - Vectorize(function(x) mean_x(x, bw))(u))^2)}
  optim(bw, f)$par}
x = seq(1, 10, by = .1)
y = Vectorize(h_equiv)(x)
plot(x, y)
abline(0, exp(-1), col = "red")
```

The slope is about 0.37. This is close to e^{-1} ≈ 0.368, but the exact factor comes from ksmooth's scaling: the Gaussian kernel is scaled so that its quartiles sit at ±0.25·bandwidth, i.e. an effective standard deviation of 0.25/Φ⁻¹(0.75) ≈ 0.3706 per unit of bandwidth.
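That factor can be checked against ksmooth's documented scaling (kernel quartiles at ±0.25 times the bandwidth), using the Python standard library's normal quantile:

```python
from statistics import NormalDist

# ksmooth scales the Gaussian kernel so its quartiles sit at +/- 0.25 * bandwidth,
# so the equivalent standard deviation per unit of bandwidth is 0.25 / qnorm(0.75)
factor = 0.25 / NormalDist().inv_cdf(0.75)
print(factor)  # ~0.3707, close to (but not exactly) exp(-1) ~ 0.3679
```

This suggests the fitted 0.37 slope reflects this scaling.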

## High dimensional application

Now consider our bivariate data set, and use as weights the product of two univariate Gaussian kernels:

```
mean_xy = function(x, y, bw1 = 1, bw2 = 1){
  w = dnorm((df$x1 - x)/bw1, mean = 0, sd = 1)*
      dnorm((df$x2 - y)/bw2, mean = 0, sd = 1)   # product of two Gaussian kernels
  weighted.mean(df$y == "1", w)}                 # kernel-weighted probability of class "1"
u = seq(0, 10, length = 101)                     # grid for the contour (illustrative range)
v = outer(u, u, Vectorize(mean_xy))
contour(u, u, v, levels = .5, add = TRUE)
```

We get the following predictions, where the different colors correspond to the estimated probabilities.
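The same product-kernel construction, sketched in Python on synthetic two-cluster data (the cluster locations and bandwidths are made up for illustration):

```python
import numpy as np

def prob_class1(pt, X, y, bw=(1.0, 1.0)):
    """Kernel-weighted probability of class 1 at pt, product of 1-d Gaussian kernels."""
    w = (np.exp(-0.5 * ((X[:, 0] - pt[0]) / bw[0]) ** 2)
         * np.exp(-0.5 * ((X[:, 1] - pt[1]) / bw[1]) ** 2))
    return np.sum(w * (y == 1)) / np.sum(w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # class-0 cluster around (0, 0)
               rng.normal(4, 1, (50, 2))])   # class-1 cluster around (4, 4)
y = np.array([0] * 50 + [1] * 50)
print(prob_class1((4.0, 4.0), X, y))  # deep in the class-1 cluster
print(prob_class1((0.0, 0.0), X, y))  # deep in the class-0 cluster
```

The 0.5 level set of this probability surface is the decision boundary drawn by the contour call above.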

## K-NN (k-nearest neighbor algorithm)

Another approach is to consider a neighborhood defined not by a distance threshold around the point, but by the k nearest of the n observations (the k-nearest-neighbors algorithm).

Next, we write our own function to implement k-NN (k-nearest neighbor algorithm):

The difficulty is that we need a meaningful distance. If the components have very different units, the Euclidean distance makes little sense, so we consider the Mahalanobis distance instead.

```
Sinv = solve(var(my[, 1:7]))  # inverse covariance matrix of the covariates (my is a matrix)
mahalanobis = function(x, y){d = as.numeric(x - y); d %*% Sinv %*% d}
mahalanobis(my[i, 1:7], my[j, 1:7])
```
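A minimal Python equivalent of the squared Mahalanobis distance, with toy data whose columns have very different scales:

```python
import numpy as np

def mahalanobis_sq(x, y, Sinv):
    """Squared Mahalanobis distance between x and y, given the inverse covariance Sinv."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ Sinv @ d)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 0.1])  # columns on very different scales
Sinv = np.linalg.inv(np.cov(X, rowvar=False))
print(mahalanobis_sq(X[0], X[1], Sinv))
```

Because the covariance absorbs the units, rescaling a column leaves this distance unchanged, which is exactly why it is preferred over the Euclidean distance here.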

Now we have a function that finds the k nearest neighbors of an observation. Two things can then be done to obtain a prediction. Our goal is to predict a class, so we can use a majority rule: the prediction for y_i is the label of the majority of its k neighbors.

`for(i in 1:length(y)) y[i] = sort(survival[k_closest(i, k)])[(k + 1)/2]`

We can also compute the proportion of black points among the nearest neighbors, which can be interpreted as the probability of being black:

`for(i in 1:length(y)) y[i] = mean(survival[k_closest(i, k)])`

We can compare, on the data set, the observations, the majority-rule predictions, and the proportion of death samples among the seven nearest neighbors:

```
cbind(OBSERVED = survival, MAJORITY = k_ma(7), PROPORTION = k_mean(7))
```
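Both rules, the majority vote and the neighbor proportion, can be sketched as follows (synthetic two-cluster data and a plain Euclidean distance, for brevity; k = 7 as above):

```python
import numpy as np

def knn_predict(i, X, y, k=7):
    """Majority label and class-1 proportion among the k nearest neighbors of row i."""
    d = np.sum((X - X[i]) ** 2, axis=1)  # squared Euclidean distances to every row
    d[i] = np.inf                        # exclude the point itself
    idx = np.argsort(d)[:k]              # indices of the k nearest neighbors
    prop = float(np.mean(y[idx] == 1))   # proportion of class-1 neighbors
    return int(prop > 0.5), prop         # majority vote (odd k), and the proportion

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_predict(0, X, y))   # a class-0 point: majority 0, low proportion
print(knn_predict(25, X, y))  # a class-1 point: majority 1, high proportion
```

With a 0/1 response and odd k, the majority vote coincides with the sorted-median trick used in the R one-liner above.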

Here we obtained predictions at the observed points, but in fact we can find the k nearest neighbors of any point x. Going back to our univariate example (to get a plot), we have:

```
k_mean_x = function(x, k = 9){
  w = rank(abs(insys - x), ties.method = "random")  # rank observations by distance to x
  mean(survival[which(w <= k)])}                    # average response of the k nearest
```

It’s not very smooth, but we don’t have many points.

If we use this method on two-dimensional data sets, we will get the following results.

```
k = 6
p = function(x, y){
  dist = function(j) mahalanobis(c(x, y), df[j, c("x1", "x2")])  # 2-d Mahalanobis distance
  vect = Vectorize(dist)(1:nrow(df))
  idx  = which(rank(vect) <= k)   # the k nearest neighbors
  mean((df$y == "1")[idx])}
v = outer(u, u, Vectorize(p))
contour(u, u, v, levels = .5, add = TRUE)
```

This is the idea of local inference: use a kernel to define a neighborhood of x, or use the k nearest neighbors.
