Solutions to the exercises in Chapter 9 of Statistical Learning Methods

Time: 2021-12-2

Exercise 9.1

The EM algorithm alternates between an E-step (expectation) and an M-step (maximization).

E-step: compute the expectation, i.e. the responsibility of the first component for each observation \(y_j\):

\(\mu_j^{(i+1)} = \frac{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j}}{\pi^{(i)}(p^{(i)})^{y_j}(1-p^{(i)})^{1-y_j} + (1-\pi^{(i)})(q^{(i)})^{y_j}(1-q^{(i)})^{1-y_j}}\)

M-step: compute the maximum likelihood estimates of the parameters:

\(\pi^{(i+1)} = \frac{1}{n} \sum_{j=1}^{n} \mu_j^{(i+1)}\)

\(p^{(i+1)} = \frac{\sum_{j=1}^{n} \mu_j^{(i+1)} y_j}{\sum_{j=1}^{n} \mu_j^{(i+1)}}\)

\(q^{(i+1)} = \frac{\sum_{j=1}^{n} (1-\mu_j^{(i+1)}) y_j}{\sum_{j=1}^{n} (1-\mu_j^{(i+1)})}\)

First iteration

E-step: for observations with \(y_j = 1\), \(\mu_j^{(1)} = 0.4115\); for observations with \(y_j = 0\), \(\mu_j^{(1)} = 0.5374\).

M-step: \(\pi^{(1)} = 0.4619\), \(p^{(1)} = 0.5346\), \(q^{(1)} = 0.6561\)

Second iteration

E-step: for observations with \(y_j = 1\), \(\mu_j^{(2)} = 0.4116\); for observations with \(y_j = 0\), \(\mu_j^{(2)} = 0.5374\).

M-step: \(\pi^{(2)} = 0.4619\), \(p^{(2)} = 0.5346\), \(q^{(2)} = 0.6561\)

The estimates no longer change after the second iteration, so the EM algorithm has converged; the maximum likelihood estimates of the parameters are

\(\hat \pi = 0.4619, \hat p = 0.5346,\hat q =0.6561\)
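The two iterations above can be reproduced with a short sketch. The observation sequence (six 1s, four 0s) and the initial values \(\pi^{(0)}=0.46\), \(p^{(0)}=0.55\), \(q^{(0)}=0.67\) are assumed from the textbook's three-coin model; the function name is illustrative.

```python
import numpy as np

def em_three_coins(y, pi, p, q, n_iter=2):
    """EM for the three-coin model: pi is the probability of picking
    coin B; p and q are the head probabilities of coins B and C."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibility mu_j that observation y_j came from coin B
        b = pi * p**y * (1 - p)**(1 - y)
        c = (1 - pi) * q**y * (1 - q)**(1 - y)
        mu = b / (b + c)
        # M-step: maximum likelihood estimates given the responsibilities
        pi = mu.mean()
        p = (mu * y).sum() / mu.sum()
        q = ((1 - mu) * y).sum() / (1 - mu).sum()
    return pi, p, q

y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # assumed observation sequence
pi, p, q = em_three_coins(y, 0.46, 0.55, 0.67, n_iter=2)
print(round(pi, 4), round(p, 4), round(q, 4))  # ~ 0.4619 0.5346 0.6561
```

Running a third iteration leaves the three values unchanged to four decimal places, which is the convergence observed above.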

Exercise 9.2

Lemma 9.2: if \(\tilde P_{\theta}(Z) = P(Z|Y,\theta)\), then \(F(\tilde P, \theta) = \log P(Y|\theta)\). Proof:

\(\begin{aligned} F(\tilde P, \theta) &= E_{\tilde P}[\log P(Y,Z|\theta)] + H(\tilde P) \\ &= E_{\tilde P}[\log P(Y,Z|\theta)] - E_{\tilde P}[\log \tilde P(Z)] \\ &= \sum_Z \tilde P_{\theta}(Z) \log P(Y,Z|\theta) - \sum_Z \tilde P(Z) \log \tilde P(Z) \\ &= \sum_Z P(Z|Y,\theta) \log P(Y,Z|\theta) - \sum_Z P(Z|Y,\theta) \log P(Z|Y,\theta) \\ &= \sum_Z P(Z|Y,\theta) \log \frac{P(Y,Z|\theta)}{P(Z|Y,\theta)} \\ &= \sum_Z P(Z|Y,\theta) \log P(Y|\theta) = \log P(Y|\theta) \end{aligned}\)

Exercise 9.3

Fitting the two-component Gaussian mixture with the API sklearn.mixture.GaussianMixture gives

\(\alpha_1 = 0.8668, \mu_1 = 32.9849, \sigma_1^2 = 429.4576\)

\(\alpha_2 = 1- \alpha_1 = 0.1332, \mu_2 = -57.5111, \sigma_2^2 = 90.2499\)
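A minimal sketch of the fit. The 15 observations are assumed to be the data from the textbook exercise; the component order and the exact decimals depend on sklearn's initialization, so treat the printed values as approximate.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# 15 one-dimensional observations (assumed from the textbook exercise)
y = np.array([-67, -48, 6, 8, 14, 16, 23, 24, 28, 29,
              41, 49, 56, 60, 75], dtype=float).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(y)
print(gmm.weights_)              # mixing coefficients alpha_k
print(gmm.means_.ravel())        # component means mu_k
print(gmm.covariances_.ravel())  # component variances sigma_k^2
```

The two negative observations form the small component (weight about 0.13, mean about -57.5); the remaining 13 points form the large one.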

Exercise 9.4

Mixture of naive Bayes models (NBMM).

The EM algorithm for the NBMM, with binary features \(x_j^{(i)} \in \{0,1\}\) and a binary latent variable \(z^{(i)}\):

E-step: compute the posterior probability of the latent component for each example:

\(w^{(i)} = P(z^{(i)} = 1|x^{(i)};\phi_z,\phi_{j|z^{(i)}=1},\phi_{j|z^{(i)}=0})\)

M-step:

\(\phi_{j|z^{(i)}=1} = \frac{\sum_{i=1}^{m} w^{(i)} I(x_j^{(i)}=1)}{\sum_{i=1}^{m} w^{(i)}}\)

\(\phi_{j|z^{(i)}=0} = \frac{\sum_{i=1}^{m} (1-w^{(i)}) I(x_j^{(i)}=1)}{\sum_{i=1}^{m} (1-w^{(i)})}\)

\(\phi_{z^{(i)}} = \frac{\sum_{i=1}^{m} w^{(i)}}{m}\)
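The two steps can be sketched in NumPy. All names and the toy data below are illustrative, assuming the binary features are stored as an \(m \times n\) 0/1 matrix:

```python
import numpy as np

def em_nbmm(X, n_iter=100, seed=0):
    """EM for a two-component naive Bayes mixture over binary features.
    X: (m, n) 0/1 matrix; returns (phi_z, phi_{j|z=1}, phi_{j|z=0})."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    phi_z = 0.5                             # P(z = 1)
    phi1 = rng.uniform(0.25, 0.75, size=n)  # P(x_j = 1 | z = 1)
    phi0 = rng.uniform(0.25, 0.75, size=n)  # P(x_j = 1 | z = 0)
    for _ in range(n_iter):
        # E-step: posterior w^(i) = P(z^(i) = 1 | x^(i))
        p1 = phi_z * np.prod(phi1**X * (1 - phi1)**(1 - X), axis=1)
        p0 = (1 - phi_z) * np.prod(phi0**X * (1 - phi0)**(1 - X), axis=1)
        w = p1 / (p1 + p0)
        # M-step: responsibility-weighted maximum likelihood updates
        phi1 = (w @ X) / w.sum()
        phi0 = ((1 - w) @ X) / (1 - w).sum()
        phi_z = w.mean()
    return phi_z, phi1, phi0

# Toy data: 50 mostly-ones rows and 50 mostly-zeros rows
rng = np.random.default_rng(1)
X = np.vstack([(rng.random((50, 4)) < 0.9).astype(int),
               (rng.random((50, 4)) < 0.1).astype(int)])
phi_z, phi1, phi0 = em_nbmm(X)
```

On such well-separated data the two components recover the two row groups (which group is labeled \(z=1\) depends on the random initialization), and \(\phi_z\) settles near the group proportion of 0.5.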