# Python ensemble learning: building an AdaBoost classifier from scratch, visualizing its decision boundary, and comparing it with the sklearn implementation

Date: 2022-1-2

Boosting refers to a family of machine learning meta-algorithms that combine the outputs of many "weak" classifiers into a powerful "ensemble", where each weak classifier may have an error rate only slightly better than random guessing.

The name AdaBoost stands for adaptive boosting. It refers to a particular boosting algorithm in which we fit a sequence of "stumps" (decision trees with one node and two leaves) and weight their final votes according to their prediction accuracy. After each iteration, we reweight the dataset to put more emphasis on the data points that the previous weak learner misclassified. In this way, those data points receive "special attention" during iteration t + 1.

## How does it compare to random forests?

| Characteristic | Random forest | AdaBoost |
| --- | --- | --- |
| Depth | Unlimited (a full tree) | Stump (a single node with 2 leaves) |
| Tree growth | Independent | Sequential |
| Votes | Equal | Weighted |

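For $N$ training points with labels $y_i \in \{-1, +1\}$, the AdaBoost algorithm proceeds as follows.

A) Initialize the sample weights uniformly: $w_i^{(1)} = \frac{1}{N}$.

B) For each iteration $t$:

1. Find the weak learner $h_t(x)$ that minimizes the weighted error $\epsilon_t = \sum_{i=1}^{N} \mathbf{1}[h_t(x_i) \neq y_i] \, w_i^{(t)}$.
2. Set the weight of the weak learner based on its accuracy: $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
3. Increase the weights of the misclassified observations: $w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\alpha_t y_i h_t(x_i)}$.
4. Renormalize the weights so that $\sum_{i=1}^{N} w_i^{(t+1)} = 1$.

C) Take the final prediction as the weighted majority vote of the weak learners' predictions: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.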
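To make steps B.1 through B.4 concrete, here is a minimal sketch of a single boosting round on five hand-picked points. The labels, the weak learner's predictions, and the variable names are invented for illustration; they are not taken from the dataset used later.

```python
import numpy as np

# Hypothetical toy values: five labels and one weak learner's predictions
y = np.array([1, 1, -1, -1, 1])
pred = np.array([1, 1, -1, 1, 1])      # only the fourth point is misclassified

# A) uniform initial weights
w = np.ones(len(y)) / len(y)

# B.1) weighted error of this weak learner
err = w[pred != y].sum()

# B.2) weight of the weak learner
alpha = np.log((1 - err) / err) / 2

# B.3) reweight: y * pred is +1 for correct points and -1 for mistakes
w_new = w * np.exp(-alpha * y * pred)

# B.4) renormalize so the weights sum to 1
w_new /= w_new.sum()

print(err, alpha)
print(w_new)
```

After the update, the single misclassified point carries half of the total weight. The same weak learner would therefore have a weighted error of exactly 0.5 in the next round, which pushes the next stump to look elsewhere.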

## Plotting

We will use the following function to visualize our data points and, optionally, overlay the decision boundary of a fitted AdaBoost model.

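```python
from typing import Optional

import matplotlib.pyplot as plt
import numpy as np


def plot_adaboost(X: np.ndarray,
                  y: np.ndarray,
                  clf=None,
                  sample_weights: Optional[np.ndarray] = None,
                  annotate: bool = False,
                  ax=None) -> None:
    """Plot ± samples in 2D, optionally with the decision boundary of clf."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(5, 5), dpi=100)

    pad = 1
    x_min, x_max = X[:, 0].min() - pad, X[:, 0].max() + pad
    y_min, y_max = X[:, 1].min() - pad, X[:, 1].max() + pad

    # Marker size is proportional to sample weight (uniform if none given)
    if sample_weights is not None:
        sizes = np.array(sample_weights) * X.shape[0] * 100
    else:
        sizes = np.ones(shape=X.shape[0]) * 100

    ax.scatter(*X[y == 1].T, s=sizes[y == 1], marker='+', color='red')
    ax.scatter(*X[y == -1].T, s=sizes[y == -1], marker='.', color='blue')

    if clf is not None:
        plot_step = 0.01
        xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                             np.arange(y_min, y_max, plot_step))
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

        # If all predictions are positive, adjust the color map accordingly
        if list(np.unique(Z)) == [1]:
            fill_colors = ['r']
        else:
            fill_colors = ['b', 'r']
        ax.contourf(xx, yy, Z, colors=fill_colors, alpha=0.2)

    if annotate:
        for i, (x1, x2) in enumerate(X):
            ax.annotate(f'$x_{{{i + 1}}}$', (x1 + 0.05, x2 - 0.05))

    ax.set_xlim(x_min + 0.5, x_max - 0.5)
    ax.set_ylim(y_min + 0.5, y_max - 0.5)
```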

## Dataset

We will generate a toy dataset with scikit-learn's `make_gaussian_quantiles`, using only a small number of data points. The key here is that we want two classes that are not linearly separable, since this is the ideal use case for AdaBoost.

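```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles


def make_toy_dataset(n: int = 100, random_seed: int = None):
    """Generate a toy dataset for evaluating AdaBoost classifiers."""
    X, y = make_gaussian_quantiles(n_samples=n, n_features=2, n_classes=2,
                                   random_state=random_seed)
    # Map the 0/1 labels to the ±1 encoding the rest of the code expects
    return X, y * 2 - 1


# The arguments of the original call were lost; these values are illustrative
X, y = make_toy_dataset(n=10, random_seed=10)
plot_adaboost(X, y)
```

## Benchmark using scikit-learn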

Let's establish a benchmark by importing `AdaBoostClassifier` from scikit-learn and fitting it to our dataset, to see what the output of our own model should look like.

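```python
from sklearn.ensemble import AdaBoostClassifier

# Older scikit-learn releases default to the SAMME.R variant;
# pass algorithm='SAMME' there to get the discrete algorithm described here
bench = AdaBoostClassifier(n_estimators=10).fit(X, y)
plot_adaboost(X, y, bench)

train_err = (bench.predict(X) != y).mean()
print(f'Train error: {train_err:.1%}')
```

The classifier fits the training dataset perfectly within 10 iterations, and the data points in our dataset are reasonably well separated.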

The following is the skeleton code of our AdaBoost classifier. After fitting the model, we save all the key attributes on the class, including the sample weights at each iteration, so that we can inspect them later to understand what the algorithm is doing at each step.

The following table shows the mapping between the variable names we will use and the mathematical symbols used earlier in the algorithm description.

| Variable | Math |
| --- | --- |
| `sample_weight` | $w_i^{(t)}$ |
| `stump` | $h_t(x)$ |
| `stump_weight` | $\alpha_t$ |
| `error` | $\epsilon_t$ |
| `predict(X)` | $H_t(x)$ |

```python
class AdaBoost:
    """AdaBoost ensemble classifier built from scratch."""

    def __init__(self):
        self.stumps = None
        self.stump_weights = None
        self.errors = None
        self.sample_weights = None

    def _check_X_y(self, X, y):
        """Validate assumptions about the format of the input data."""
        assert set(y) == {-1, 1}, 'Response variable must be ±1'
        return X, y
```
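The check above assumes labels are encoded as ±1, whereas many datasets (and scikit-learn's data generators) use 0/1. A minimal sketch of the conversion follows; `to_plus_minus_one` is a hypothetical helper name and is not part of the class.

```python
import numpy as np


def to_plus_minus_one(y01) -> np.ndarray:
    """Map binary labels {0, 1} to {-1, +1} (hypothetical helper)."""
    y01 = np.asarray(y01)
    assert set(np.unique(y01)) <= {0, 1}, 'expected labels in {0, 1}'
    return 2 * y01 - 1


y = to_plus_minus_one(np.array([0, 1, 1, 0]))
print(y)
print(set(y) == {-1, 1})   # the condition checked by _check_X_y
```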

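## Fitting the model

Recall our algorithm for fitting the model. For each iteration $t$:

1. Find the weak learner $h_t(x)$ that minimizes the weighted error $\epsilon_t = \sum_{i=1}^{N} \mathbf{1}[h_t(x_i) \neq y_i] \, w_i^{(t)}$.
2. Set the weight of the weak learner based on its accuracy: $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
3. Increase the weights of the misclassified observations: $w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\alpha_t y_i h_t(x_i)}$. Note that $y_i h_t(x_i)$ evaluates to +1 when the hypothesis agrees with the label and to -1 when it disagrees.
4. Renormalize the weights so that $\sum_{i=1}^{N} w_i^{(t+1)} = 1$.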

The code below is essentially a one-to-one implementation of the steps above, but there are a few things to note:

• Since the focus here is on understanding the ensemble element of AdaBoost, we call `DecisionTreeClassifier(max_depth=1, max_leaf_nodes=2)` to implement the logic of selecting each $h_t(x)$.
• We set the initial uniform sample weights outside the for loop, and set the weights for $t+1$ within each iteration $t$, unless it is the last iteration. We go out of our way to save an array of sample weights on the fitted model so that we can visualize the sample weights at each iteration later.
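
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit(self, X: np.ndarray, y: np.ndarray, iters: int):
    """Fit the model using training data."""
    X, y = self._check_X_y(X, y)
    n = X.shape[0]

    # Initialize numpy arrays
    self.sample_weights = np.zeros(shape=(iters, n))
    self.stumps = np.zeros(shape=iters, dtype=object)
    self.stump_weights = np.zeros(shape=iters)
    self.errors = np.zeros(shape=iters)

    # Initialize weights uniformly
    self.sample_weights[0] = np.ones(shape=n) / n

    for t in range(iters):
        # Fit the weak learner on the current sample weights
        curr_sample_weights = self.sample_weights[t]
        stump = DecisionTreeClassifier(max_depth=1, max_leaf_nodes=2)
        stump = stump.fit(X, y, sample_weight=curr_sample_weights)

        # Compute error and stump weight from the weak learner's predictions
        stump_pred = stump.predict(X)
        err = curr_sample_weights[(stump_pred != y)].sum()
        stump_weight = np.log((1 - err) / err) / 2

        # Update and renormalize the sample weights
        new_sample_weights = (
            curr_sample_weights * np.exp(-stump_weight * y * stump_pred)
        )
        new_sample_weights /= new_sample_weights.sum()

        # If this is not the final iteration, store the weights for t+1
        if t + 1 < iters:
            self.sample_weights[t + 1] = new_sample_weights

        # Save the results of this iteration
        self.stumps[t] = stump
        self.stump_weights[t] = stump_weight
        self.errors[t] = err

    return self
```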
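The weak learner itself is delegated to scikit-learn above. To show what that step does conceptually, here is a rough sketch of a brute-force stump search that minimizes the weighted error directly. The function name `best_stump` and its return format are assumptions made for this illustration. scikit-learn's tree chooses splits with a weighted impurity criterion rather than this exact search, so the two will not necessarily pick the same split.

```python
import numpy as np


def best_stump(X: np.ndarray, y: np.ndarray, w: np.ndarray):
    """Brute-force search for the stump with minimal weighted error.

    Returns (error, feature, threshold, polarity). The stump predicts
    `polarity` where X[:, feature] > threshold and `-polarity` elsewhere.
    """
    best = (np.inf, None, None, None)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidates: one threshold below all points (constant prediction),
        # then the midpoints between neighbouring values
        thresholds = np.concatenate(
            ([values[0] - 1.0], (values[:-1] + values[1:]) / 2)
        )
        for thr in thresholds:
            for polarity in (1, -1):
                pred = np.where(X[:, j] > thr, polarity, -polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best


# Hypothetical 1-D data: negatives on the left, positives on the right
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
w = np.ones(4) / 4

err, feature, thr, polarity = best_stump(X, y, w)
print(err, feature, thr, polarity)
```

On this separable toy input the search finds the split at 1.5 with zero weighted error. Note that a zero error makes the stump weight formula divide by zero, an edge case the fitting code above does not guard against.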

## Make predictions

We make the final prediction by "weighted majority vote": we compute the sign (±) of the linear combination of each stump's prediction and its corresponding stump weight.

```python
def predict(self, X):
    """Make predictions using the already fitted model."""
    stump_preds = np.array([stump.predict(X) for stump in self.stumps])
    return np.sign(np.dot(self.stump_weights, stump_preds))
```
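A small numeric sketch of the weighted vote, using made-up stump weights and predictions (none of these values come from the fitted model), shows how it can differ from a plain majority vote:

```python
import numpy as np

# Hypothetical values: 3 stumps voting on 4 points
stump_weights = np.array([1.2, 0.4, 0.5])
stump_preds = np.array([
    [ 1, -1,  1, -1],   # stump 1
    [-1, -1,  1,  1],   # stump 2
    [-1,  1,  1,  1],   # stump 3
])

# Weighted vote, as in predict()
weighted = np.sign(np.dot(stump_weights, stump_preds))

# Plain majority vote, for comparison
unweighted = np.sign(stump_preds.sum(axis=0))

print(weighted)
print(unweighted)
```

The first stump's weight exceeds the other two combined, so it overrules them on the first and last points, where a plain majority vote goes the other way. Note that `np.sign` returns 0 when the weighted sum is exactly zero, a tie this sketch does not handle.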

## Performance

Now let’s put everything together and fit the model with the same parameters as our benchmark.

```python
# Assign the functions we defined separately as methods of the classifier
AdaBoost.fit = fit
AdaBoost.predict = predict

clf = AdaBoost().fit(X, y, iters=10)
plot_adaboost(X, y, clf)

train_err = (clf.predict(X) != y).mean()
print(f'Train error: {train_err:.1%}')
```

Not bad! We achieve exactly the same result as the sklearn benchmark. I picked this dataset to show off AdaBoost's strengths, but you can run the notebook yourself to check whether our output matches the benchmark regardless of the starting conditions.

## Visualization

Since we saved all intermediate variables as arrays on our fitted model, we can use the function below to visualize the evolution of our ensemble learner at each iteration $t$.

• The left column shows the "stump" weak learner selected at that iteration, which corresponds to $h_t(x)$.
• The right column shows the cumulative strong learner so far, $H_t(x)$.
• The size of each data point marker reflects its relative weight. Data points misclassified in the previous iteration are weighted more heavily, so they appear larger in the next iteration.
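
```python
from copy import deepcopy


def truncate_adaboost(clf, t: int):
    """Truncate a fitted AdaBoost up to (and including) a given iteration."""
    assert t > 0, 't must be a positive integer'
    new_clf = deepcopy(clf)
    new_clf.stumps = clf.stumps[:t]
    new_clf.stump_weights = clf.stump_weights[:t]
    return new_clf


def plot_staged_adaboost(X, y, clf, iters=10):
    """Plot the weak learner and cumulative strong learner at each iteration."""

    # Larger grid: one row per iteration, two columns
    fig, axes = plt.subplots(figsize=(8, iters * 3),
                             nrows=iters,
                             ncols=2,
                             sharex=True,
                             squeeze=False,
                             dpi=100)

    for i in range(iters):
        ax1, ax2 = axes[i]

        # Plot the weak learner
        ax1.set_title(f'Weak learner at t={i + 1}')
        plot_adaboost(X, y, clf.stumps[i],
                      sample_weights=clf.sample_weights[i],
                      annotate=False, ax=ax1)

        # Plot the strong learner
        trunc_clf = truncate_adaboost(clf, t=i + 1)
        ax2.set_title(f'Strong learner at t={i + 1}')
        plot_adaboost(X, y, trunc_clf,
                      sample_weights=clf.sample_weights[i],
                      annotate=False, ax=ax2)

    plt.tight_layout()
    plt.show()


plot_staged_adaboost(X, y, clf)
```

## Why do some iterations have no decision boundaries?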

You may notice that our weak learner classifies all points as positive at iterations t = 2, 5, 7, 10. This happens because, given the current sample weights, the minimum error is achieved by simply predicting that all data points are positive. Note that in each of these iterations, the negative samples are surrounded by positive samples with proportionally higher weights.

There is no way to draw a linear decision boundary that correctly classifies any number of negative data points without misclassifying a higher cumulative weight of positive samples. This does not prevent our algorithm from converging, though. All the negative points are misclassified, so their sample weights increase. This weight update allows the weak learner in the next iteration to find a meaningful decision boundary.
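The effect can be reproduced on a tiny one-dimensional example. The three points and their weights below are invented for illustration, and `stump_errors` is a hypothetical helper. One negative point sits between two heavier positive points, so the constant "all positive" prediction beats every real split; after one weight update, a real split becomes the best option.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])          # hypothetical 1-D points
y = np.array([1, -1, 1])               # a negative point between two positives
w = np.array([0.4, 0.2, 0.4])          # positives carry more weight


def stump_errors(X, y, w):
    """Weighted error of every candidate stump, keyed by (threshold, polarity)."""
    errors = {}
    for thr in (0.5, 1.5, 2.5):        # 0.5 lies left of all points: constant prediction
        for polarity in (1, -1):
            pred = np.where(X > thr, polarity, -polarity)
            errors[(thr, polarity)] = w[pred != y].sum()
    return errors


errors = stump_errors(X, y, w)
best = min(errors, key=errors.get)
print(best, errors[best])              # the constant "all positive" stump wins

# One AdaBoost weight update using that constant stump
err = errors[best]
alpha = np.log((1 - err) / err) / 2
pred = np.where(X > best[0], best[1], -best[1])
w_next = w * np.exp(-alpha * y * pred)
w_next /= w_next.sum()

# With the negative point up-weighted, a real split now has the lowest error
errors_next = stump_errors(X, y, w_next)
best_next = min(errors_next, key=errors_next.get)
print(w_next, best_next, errors_next[best_next])
```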

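## Why do we use this particular formula for $\alpha_t$?

Why do we use this particular value for $\alpha_t$? We can show that the choice $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$ minimizes the exponential loss $L_{\exp}(x, y) = e^{-y \, h(x)}$ on the training set.

Ignoring the sign function, our strong learner $H$ at iteration $t$ is a weighted combination of the weak learners $h(x)$. At any given iteration $t$, we can define $H_t(x)$ recursively as its value at iteration $t-1$ plus the weighted weak learner of the current iteration:

$$H_t(x) = \sum_{s=1}^{t} \alpha_s h_s(x) = H_{t-1}(x) + \alpha_t h_t(x)$$

The loss function we apply to $H$ is the average loss over all $N$ data points:

$$L = \frac{1}{N} \sum_{i=1}^{N} e^{-y_i H_t(x_i)}$$

Substitute the recursive definition of $H_t(x)$, and use the identity $e^{a+b} = e^a e^b$ to split the exponential term:

$$L = \frac{1}{N} \sum_{i=1}^{N} e^{-y_i H_{t-1}(x_i)} \, e^{-\alpha_t y_i h_t(x_i)}$$

Now take the derivative of the loss function with respect to $\alpha_t$ and set it to zero to find the parameter value that minimizes the loss:

$$\frac{\partial L}{\partial \alpha_t} = -\frac{1}{N} \sum_{i=1}^{N} y_i h_t(x_i) \, e^{-y_i H_{t-1}(x_i)} \, e^{-\alpha_t y_i h_t(x_i)} = 0$$

The sum can be split in two: the case where $h_t(x_i) = y_i$ (so $y_i h_t(x_i) = 1$) and the case where $h_t(x_i) \neq y_i$ (so $y_i h_t(x_i) = -1$). Multiplying through by $N$:

$$-\sum_{i:\, h_t(x_i) = y_i} e^{-y_i H_{t-1}(x_i)} \, e^{-\alpha_t} + \sum_{i:\, h_t(x_i) \neq y_i} e^{-y_i H_{t-1}(x_i)} \, e^{\alpha_t} = 0$$

Unrolling the weight update shows that the normalized sample weight at iteration $t$ is $D_t(i) = w_i^{(t)} = e^{-y_i H_{t-1}(x_i)} / \sum_{j} e^{-y_j H_{t-1}(x_j)}$, so dividing both sides by $\sum_{j} e^{-y_j H_{t-1}(x_j)}$ gives:

$$-e^{-\alpha_t} \sum_{i:\, h_t(x_i) = y_i} D_t(i) + e^{\alpha_t} \sum_{i:\, h_t(x_i) \neq y_i} D_t(i) = 0$$

Finally, we recognize that the sum of the weights of the misclassified points is the error computation discussed earlier: $\sum_{i:\, h_t(x_i) \neq y_i} D_t(i) = \epsilon_t$, and therefore $\sum_{i:\, h_t(x_i) = y_i} D_t(i) = 1 - \epsilon_t$. Substituting and rearranging isolates $\alpha_t$:

$$-(1 - \epsilon_t) \, e^{-\alpha_t} + \epsilon_t \, e^{\alpha_t} = 0 \;\Longrightarrow\; e^{2\alpha_t} = \frac{1 - \epsilon_t}{\epsilon_t} \;\Longrightarrow\; \alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$$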
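As a quick numerical sanity check of this result, the sketch below evaluates the part of the loss that depends on $\alpha_t$, namely $(1 - \epsilon_t) e^{-\alpha_t} + \epsilon_t e^{\alpha_t}$, on a grid and compares the grid minimizer with the closed form. The value `eps = 0.3` and the grid resolution are arbitrary choices made for illustration.

```python
import math

eps = 0.3  # hypothetical weighted error of the weak learner


def loss(alpha: float) -> float:
    """Alpha-dependent part of the exponential loss for error rate eps."""
    return (1 - eps) * math.exp(-alpha) + eps * math.exp(alpha)


# Closed-form minimizer, the same expression used in the fitting code
alpha_star = 0.5 * math.log((1 - eps) / eps)

# Brute-force minimizer on a grid over [0, 2] with step 0.001
grid = [i / 1000 for i in range(2001)]
alpha_grid = min(grid, key=loss)

print(alpha_star, alpha_grid)
```

The two values agree to within the grid spacing, which is what we expect if the closed form is the true minimizer of this convex function.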