Original link: http://tecdat.cn/?p=24421
What is AdaBoost?
Boosting refers to a family of machine learning meta-algorithms that combine the outputs of many "weak" classifiers, each of which may be only slightly better than random guessing, into a powerful "ensemble".
The name AdaBoost stands for adaptive boosting, and it refers to a particular boosting algorithm in which we fit a sequence of "stumps" (decision trees with one node and two leaves) and weight their contribution to the final vote by how accurate their predictions are. After each iteration we re-weight the dataset, assigning greater importance to the data points that were misclassified by the previous weak learner, so that those points receive "special attention" at iteration t + 1.
How does it compare to random forests?
| Characteristic | Random forest | AdaBoost |
| --- | --- | --- |
| Tree depth | Unlimited (a full tree) | Stump (single node with 2 leaves) |
| Tree growth | Independent | Sequential |
| Votes | Equal | Weighted |
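As a rough, illustrative sketch of these differences using scikit-learn's built-in implementations (the dataset and parameter values below are arbitrary choices for demonstration), we can fit both ensembles and inspect the depth of each tree and the vote weights:

from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Illustrative data just for this comparison
X_demo, y_demo = make_gaussian_quantiles(n_samples=100, n_features=2,
                                         n_classes=2, random_state=0)

# Random forest: full-depth trees grown independently, equal votes
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# AdaBoost: depth-1 stumps grown sequentially, weighted votes
ada = AdaBoostClassifier(n_estimators=10, algorithm='SAMME').fit(X_demo, y_demo)

print('Random forest tree depths:', [est.get_depth() for est in rf.estimators_])
print('AdaBoost stump depths:    ', [est.get_depth() for est in ada.estimators_])
print('AdaBoost stump weights:   ', ada.estimator_weights_)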
AdaBoost algorithm
A) Initialize the sample weights uniformly: $w_i^{(1)} = \frac{1}{N}$.
B) For each iteration $t$:
- Find the weak learner $h_t(x)$ that minimizes the weighted error $\epsilon_t = \sum_{i=1}^{N} w_i^{(t)} \mathbb{1}[h_t(x_i) \neq y_i]$.
- Set the weak learner's weight according to its accuracy: $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$.
- Increase the weights of misclassified observations: $w_i^{(t+1)} = w_i^{(t)}\, e^{-\alpha_t y_i h_t(x_i)}$.
- Renormalize the weights so that $\sum_i w_i^{(t+1)} = 1$.
C) Take the final prediction as the weighted majority vote of the weak learners' predictions: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.
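As a minimal numeric sketch of step B), the snippet below runs the arithmetic of a single boosting round by hand on a tiny made-up set of labels and stump predictions (all values are illustrative only):

import numpy as np

y    = np.array([ 1, -1,  1,  1, -1])   # true labels (made up for illustration)
pred = np.array([ 1,  1,  1, -1, -1])   # a stump's predictions h_t(x_i)
w    = np.ones(5) / 5                   # uniform weights from step A)

err   = w[pred != y].sum()              # weighted error eps_t (= 0.4 here)
alpha = 0.5 * np.log((1 - err) / err)   # stump weight alpha_t

w = w * np.exp(-alpha * y * pred)       # misclassified points are up-weighted
w = w / w.sum()                         # renormalize so the weights sum to 1

print(err, alpha, w)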
Plotting
We will use the function below to visualize our data points and, optionally, overlay the decision boundary of a fitted AdaBoost model.
import numpy as np
import matplotlib.pyplot as plt

def plot_adaboost(X: np.ndarray,
                  y: np.ndarray,
                  clf=None,
                  sample_weights=None,
                  ax=None) -> None:
    """Plot ± samples in 2D, optionally with the decision boundary."""
    if not ax:
        fig, ax = plt.subplots(figsize=(5, 5), dpi=100)

    pad = 1
    x_min, x_max = X[:, 0].min() - pad, X[:, 0].max() + pad
    y_min, y_max = X[:, 1].min() - pad, X[:, 1].max() + pad

    # Marker sizes reflect the (optional) per-sample weights
    if sample_weights is not None:
        sizes = np.array(sample_weights) * X.shape[0] * 100
    else:
        sizes = np.ones(shape=X.shape[0]) * 100

    # Scatter the positive (+) and negative (.) samples
    ax.scatter(*X[y == 1].T, s=sizes[y == 1], marker='+', color='red')
    ax.scatter(*X[y == -1].T, s=sizes[y == -1], marker='.', color='blue')

    if clf:
        plot_step = 0.01
        xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                             np.arange(y_min, y_max, plot_step))
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

        # If all predictions are positive, adjust the color map accordingly
        if list(np.unique(Z)) == [1]:
            colors = ['r']
        else:
            colors = ['b', 'r']
        ax.contourf(xx, yy, Z, colors=colors, alpha=0.2)

    ax.set_xlim(x_min + 0.5, x_max - 0.5)
    ax.set_ylim(y_min + 0.5, y_max - 0.5)
Dataset
We will use a similar approach to generate a dataset, but with fewer data points. The key is that we want two classes that are not linearly separable, since this is the ideal use case for AdaBoost.
from sklearn.datasets import make_gaussian_quantiles

def make_dataset(n: int = 100, random_seed: int = None):
    """Generate a dataset for evaluating AdaBoost classifiers."""
    if random_seed:
        np.random.seed(random_seed)
    X, y = make_gaussian_quantiles(n_samples=n, n_features=2, n_classes=2)
    return X, y * 2 - 1  # convert labels from {0, 1} to {-1, +1}

X, y = make_dataset(n=10, random_seed=10)  # a small dataset (values chosen for illustration)
plot_adaboost(X, y)
Benchmark with scikit-learn
Let's establish a benchmark by importing the AdaBoostClassifier from scikit-learn and fitting it to our dataset, to see what the output of our own model should look like.
from sklearn.ensemble import AdaBoostClassifier

bench = AdaBoostClassifier(n_estimators=10, algorithm='SAMME').fit(X, y)
plot_adaboost(X, y, bench)

train_err = (bench.predict(X) != y).mean()
print(f'Train error: {train_err:.1%}')
The classifier fully fits the training dataset within 10 iterations, and the data points in our dataset are reasonably well separated.
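If you want to see how quickly the benchmark reaches zero training error, one hedged way (assuming the fitted `bench` object from above) is to track the error after each boosting round with `staged_predict`:

# Training error after each boosting round of the scikit-learn benchmark
for t, stage_pred in enumerate(bench.staged_predict(X), start=1):
    print(f'iteration {t:>2}: train error = {(stage_pred != y).mean():.1%}')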
Write your own AdaBoost classifier
Below is the skeleton code for our AdaBoost classifier. After fitting the model we will save all the key attributes on the class, including the sample weights at each iteration, so that we can inspect them later to understand what our algorithm does at each step.
The following table shows the mapping between the variable names we will use and the mathematical symbols used earlier in the algorithm description.
| Variable | Math |
| --- | --- |
| sample_weights | $w_i^{(t)}$ |
| stumps | $h_t(x)$ |
| stump_weights | $\alpha_t$ |
| errors | $\epsilon_t$ |
| predict(X) | $H_t(x)$ |
class AdaBoost:
    """AdaBoost classifier built from decision stumps."""

    def __init__(self):
        self.stumps = None
        self.stump_weights = None
        self.errors = None
        self.sample_weights = None

    def _check_X_y(self, X, y):
        """Validate assumptions about the format of the input data."""
        assert set(y) == {-1, 1}, 'Response variable must be ±1'
        return X, y
Fitting the model
Recall the steps of our algorithm for fitting the model:
- Find the weak learner $h_t(x)$ that minimizes the weighted error $\epsilon_t = \sum_{i=1}^{N} w_i^{(t)} \mathbb{1}[h_t(x_i) \neq y_i]$.
- Set the weak learner's weight according to its accuracy: $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$.
- Increase the weights of misclassified observations: $w_i^{(t+1)} = w_i^{(t)}\, e^{-\alpha_t y_i h_t(x_i)}$. Note that $y_i h_t(x_i)$ evaluates to $+1$ when the hypothesis agrees with the label and to $-1$ when it disagrees.
- Renormalize the weights so that $\sum_i w_i^{(t+1)} = 1$.
The code below is essentially a one-to-one implementation of the steps above, but there are a few points worth noting:
- Since the focus here is understanding the ensemble machinery of AdaBoost, we delegate the job of finding each $h_t(x)$ to scikit-learn's DecisionTreeClassifier(max_depth=1, max_leaf_nodes=2).
- We set the initial uniform sample weights outside the for loop, and inside each iteration $t$ we set the weights for $t + 1$, unless it is the final iteration. We deliberately store the full history of sample weights on the fitted model so that we can later visualize the weights at each iteration.
from sklearn.tree import DecisionTreeClassifier

def fit(self, X: np.ndarray, y: np.ndarray, iters: int):
    """Fit the model using training data."""
    X, y = self._check_X_y(X, y)
    n = X.shape[0]

    # Initialize numpy arrays to hold the per-iteration results
    self.sample_weights = np.zeros(shape=(iters, n))
    self.stumps = np.zeros(shape=iters, dtype=object)
    self.stump_weights = np.zeros(shape=iters)
    self.errors = np.zeros(shape=iters)

    # Initialize weights uniformly
    self.sample_weights[0] = np.ones(shape=n) / n

    for t in range(iters):
        # Fit the weak learner
        curr_sample_weights = self.sample_weights[t]
        stump = DecisionTreeClassifier(max_depth=1, max_leaf_nodes=2)
        stump = stump.fit(X, y, sample_weight=curr_sample_weights)

        # Calculate error and stump weight from the weak learner's predictions
        stump_pred = stump.predict(X)
        err = curr_sample_weights[(stump_pred != y)].sum()
        stump_weight = np.log((1 - err) / err) / 2

        # Update the sample weights and renormalize them
        new_sample_weights = (
            curr_sample_weights * np.exp(-stump_weight * y * stump_pred)
        )
        new_sample_weights /= new_sample_weights.sum()

        # If this is not the final iteration, store the weights for t+1
        if t + 1 < iters:
            self.sample_weights[t + 1] = new_sample_weights

        # Save the results of this iteration
        self.stumps[t] = stump
        self.stump_weights[t] = stump_weight
        self.errors[t] = err

    return self
Making predictions
We make the final prediction using a "weighted majority vote", computed as the sign (±) of the linear combination of each stump's prediction and its corresponding stump weight.
def predict(self, X):
    """Make predictions using the fitted model."""
    stump_preds = np.array([stump.predict(X) for stump in self.stumps])
    return np.sign(np.dot(self.stump_weights, stump_preds))
Performance
Now let's put everything together and fit a model with the same parameters as our benchmark.
# Assign the individually defined functions as methods of our classifier
AdaBoost.fit = fit
AdaBoost.predict = predict

clf = AdaBoost().fit(X, y, iters=10)
plot_adaboost(X, y, clf)

train_err = (clf.predict(X) != y).mean()
print(f'Train error: {train_err:.1%}')
Not bad! We achieved exactly the same result as our sklearn benchmark. I cherry-picked this dataset to show the strengths of AdaBoost, but you can run the notebook yourself to see whether the outputs match, regardless of the starting conditions.
Visualization
Since we saved all of the intermediate variables as arrays on our fitted model, we can use the function below to visualize how our ensemble learner evolves at each iteration $t$.
- The left column shows the "stump" weak learner selected at that iteration, which corresponds to $h_t(x)$.
- The right column shows the cumulative strong learner up to that point, $H_t(x)$.
- The size of each data point's marker reflects its relative weight. Data points misclassified in the previous iteration are weighted more heavily, and therefore appear larger in the next plot.
from copy import deepcopy

def truncate_adaboost(clf, t: int):
    """AdaBoost fitted up to (and including) a particular iteration t."""
    assert t > 0, 't must be a positive integer'
    new_clf = deepcopy(clf)
    new_clf.stumps = clf.stumps[:t]
    new_clf.stump_weights = clf.stump_weights[:t]
    return new_clf

def plot_staged_adaboost(X, y, clf, iters=10):
    """Plot the weak learner and cumulative strong learner at each iteration."""
    # Larger grid: one row per iteration, weak learner left, strong learner right
    fig, axes = plt.subplots(figsize=(8, iters * 3),
                             nrows=iters,
                             ncols=2,
                             sharex=True,
                             dpi=100)

    for i in range(iters):
        ax1, ax2 = axes[i]

        # Plot the weak learner
        ax1.set_title(f'Weak learner at t={i + 1}')
        plot_adaboost(X, y, clf.stumps[i],
                      sample_weights=clf.sample_weights[i], ax=ax1)

        # Plot the strong learner (ensemble truncated at iteration i+1)
        truncated_clf = truncate_adaboost(clf, t=i + 1)
        ax2.set_title(f'Strong learner at t={i + 1}')
        plot_adaboost(X, y, truncated_clf,
                      sample_weights=clf.sample_weights[i], ax=ax2)

    plt.tight_layout()

plot_staged_adaboost(X, y, clf)
Why do some iterations have no decision boundaries?
You may notice that our weak learner classifies all points as positive at iterations t = 2, 5, 7, 10. This happens because, given the current sample weights, the lowest weighted error is achieved by simply predicting every data point to be positive. Note that in each of these iterations the negative samples are surrounded by positive samples with proportionally higher weights.
There is no way to draw a linear decision boundary that correctly classifies some of the negative data points without misclassifying a larger cumulative weight of positive samples. This does not stop the algorithm from converging, though: all the negative points are misclassified, so their sample weights increase, and that weight update allows the weak learner in the next iteration to find a meaningful decision boundary.
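One way to check this claim on the fitted model (a sketch that relies on the `stumps` and `sample_weights` arrays we stored on `clf`) is to find the iterations whose stump predicts only the positive class and compare the total weight of the positive and negative points at that iteration:

for t, stump in enumerate(clf.stumps, start=1):
    if np.all(stump.predict(X) == 1):               # all-positive weak learner
        w = clf.sample_weights[t - 1]
        print(f't={t}: weight of positives = {w[y == 1].sum():.2f}, '
              f'weight of negatives = {w[y == -1].sum():.2f}')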
Why do we use that particular formula for $\alpha_t$?
Why do we use this particular value of $\alpha_t$? We can show that this choice minimizes the exponential loss on the training set:

$$L(H) = \frac{1}{N}\sum_{i=1}^{N} e^{-y_i H(x_i)}$$

Ignoring the sign function, our strong learner $H$ at iteration $t$ is a weighted combination of weak learners $h(x)$. At any given iteration $t$ we can define $H_t(x)$ recursively as its value at iteration $t-1$ plus the weighted weak learner of the current iteration:

$$H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$$

The loss we apply to $H$ is the average loss over all $N$ data points. Substituting the recursive definition of $H_t(x)$ and using the identity $e^{a+b} = e^a e^b$ to split the exponential term gives

$$L(H_t) = \frac{1}{N}\sum_{i=1}^{N} e^{-y_i H_{t-1}(x_i)}\, e^{-\alpha_t y_i h_t(x_i)} = \sum_{i=1}^{N} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)},$$

where $D_t(i) = \frac{1}{N} e^{-y_i H_{t-1}(x_i)}$ plays the role of the sample weight of point $i$ at iteration $t$. Now we take the derivative of the loss with respect to $\alpha_t$ and set it to zero to find the value that minimizes the loss. The sum splits into two parts: the cases where $h_t(x_i) = y_i$ and the cases where $h_t(x_i) \neq y_i$:

$$\frac{\partial L}{\partial \alpha_t} = -e^{-\alpha_t} \sum_{i:\, h_t(x_i) = y_i} D_t(i) + e^{\alpha_t} \sum_{i:\, h_t(x_i) \neq y_i} D_t(i) = 0$$

Finally, we recognize that the relative weight of the misclassified points is exactly the error we computed earlier, $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i) \big/ \sum_i D_t(i)$. Setting the derivative to zero then gives $e^{2\alpha_t} = (1-\epsilon_t)/\epsilon_t$, and solving for $\alpha_t$:

$$\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$$
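As a quick, hedged sanity check of this derivation, the snippet below compares the closed-form $\alpha_t$ against a brute-force grid search over the exponential loss $(1-\epsilon)e^{-\alpha} + \epsilon\,e^{\alpha}$ for an arbitrary, made-up error value:

import numpy as np

eps = 0.3                                   # an arbitrary weighted error
alphas = np.linspace(-3, 3, 10001)          # grid of candidate alpha values
loss = (1 - eps) * np.exp(-alphas) + eps * np.exp(alphas)

alpha_grid = alphas[np.argmin(loss)]        # minimizer found numerically
alpha_closed_form = 0.5 * np.log((1 - eps) / eps)
print(alpha_grid, alpha_closed_form)        # both approximately 0.4236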
Further reading
- sklearn.ensemble.AdaBoostClassifier – official scikit-learn documentation