Solutions to the Exercises in Chapter 10 of Statistical Learning Methods

Time: 2021-11-29

Exercise 10.1

From the problem statement, \(T=4, N=3, M=2\).

We follow Algorithm 10.3 (the backward algorithm).

The first step is to initialize \(\beta\) at the final time step \(t = T = 4\):

\(\beta_4(1) = 1, \beta_4(2) = 1, \beta_4(3) = 1\)

The second step is to compute \(\beta\) recursively for the earlier time steps \(t = 3, 2, 1\):

\(\beta_3(1) = a_{11}b_1(o_4)\beta_4(1) + a_{12}b_2(o_4)\beta_4(2) + a_{13}b_3(o_4)\beta_4(3) = 0.46\)

\(\beta_3(2) = a_{21}b_1(o_4)\beta_4(1) + a_{22}b_2(o_4)\beta_4(2) + a_{23}b_3(o_4)\beta_4(3) = 0.51\)

\(\beta_3(3) = a_{31}b_1(o_4)\beta_4(1) + a_{32}b_2(o_4)\beta_4(2) + a_{33}b_3(o_4)\beta_4(3) = 0.43\)

\(\beta_2(1) = a_{11}b_1(o_3)\beta_3(1) + a_{12}b_2(o_3)\beta_3(2) + a_{13}b_3(o_3)\beta_3(3) = 0.2461\)

\(\beta_2(2) = a_{21}b_1(o_3)\beta_3(1) + a_{22}b_2(o_3)\beta_3(2) + a_{23}b_3(o_3)\beta_3(3) = 0.2312\)

\(\beta_2(3) = a_{31}b_1(o_3)\beta_3(1) + a_{32}b_2(o_3)\beta_3(2) + a_{33}b_3(o_3)\beta_3(3) = 0.2577\)

\(\beta_1(1) = a_{11}b_1(o_2)\beta_2(1) + a_{12}b_2(o_2)\beta_2(2) + a_{13}b_3(o_2)\beta_2(3) = 0.112462\)

\(\beta_1(2) = a_{21}b_1(o_2)\beta_2(1) + a_{22}b_2(o_2)\beta_2(2) + a_{23}b_3(o_2)\beta_2(3) = 0.121737\)

\(\beta_1(3) = a_{31}b_1(o_2)\beta_2(1) + a_{32}b_2(o_2)\beta_2(2) + a_{33}b_3(o_2)\beta_2(3) = 0.104881\)

The third step is to compute \(P(O|\lambda)\):

\(P(O|\lambda) = \pi_1b_1(o_1)\beta_1(1) + \pi_2b_2(o_1)\beta_1(2) + \pi_3b_3(o_1)\beta_1(3) = 0.0600908\)
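The hand calculation can be cross-checked with a short script. This is a minimal sketch, not the textbook's own code: the parameter values are assumed from the textbook's box-and-ball model (they agree with the numbers that appear in the Exercise 10.3 calculation), and the names `A`, `B`, `pi`, `O` and `backward` are illustrative.

```python
# Assumed model parameters (box-and-ball model from the textbook).
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]          # A[i][j] = a_{ij}, state transition probabilities
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]               # B[j][k] = b_j(k), columns are (red, white)
pi = [0.2, 0.4, 0.4]           # initial state distribution
RED, WHITE = 0, 1
O = [RED, WHITE, RED, WHITE]   # observation sequence, T = 4


def backward(A, B, pi, O):
    """Return the backward probabilities beta[t][i] (0-based t) and P(O|lambda)."""
    n, T = len(A), len(O)
    beta = [[0.0] * n for _ in range(T)]
    beta[T - 1] = [1.0] * n                      # step 1: beta_T(i) = 1
    for t in range(T - 2, -1, -1):               # step 2: t = T-1, ..., 1
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    # step 3: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    prob = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(n))
    return beta, prob


beta, prob = backward(A, B, pi, O)
for t, row in enumerate(beta, start=1):
    print(f"beta_{t} =", [round(x, 6) for x in row])
print("P(O|lambda) =", round(prob, 7))
```

Each printed row corresponds to one time step, so the rows can be compared line by line with the values above.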

Exercise 10.2

By definition, \(P(i_4 = q_3|O,\lambda) = \gamma_4(3)\).

From the formula for the single-state posterior probability, \(\gamma_4(3) = \frac{\alpha_4(3) \beta_4(3)}{P(O|\lambda)} = \frac{\alpha_4(3) \beta_4(3)}{\sum_{j=1}^{N} \alpha_4(j) \beta_4(j)}\)

Evaluating the forward and backward probabilities numerically gives \(P(i_4 = q_3|O,\lambda) = \gamma_4(3) = 0.536952\)
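The numerical evaluation can be sketched as follows. The model parameters and the observation sequence are not restated in this write-up, so the values below are assumptions taken from the textbook's statement of Exercise 10.2; the function names are illustrative, and indices `t` and `i` passed to `gamma` are 1-based to match the notation above.

```python
# Assumed parameters from the textbook's statement of Exercise 10.2.
A = [[0.5, 0.1, 0.4],
     [0.3, 0.5, 0.2],
     [0.2, 0.2, 0.6]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]               # columns are (red, white)
pi = [0.2, 0.3, 0.5]
RED, WHITE = 0, 1
O = [RED, WHITE, RED, RED, WHITE, RED, WHITE, WHITE]   # T = 8


def forward(A, B, pi, O):
    """alpha[t][i] for 0-based t."""
    n = len(A)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for t in range(1, len(O)):
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][O[t]]
                      for i in range(n)])
    return alpha


def backward(A, B, O):
    """beta[t][i] for 0-based t."""
    n = len(A)
    beta = [[1.0] * n]
    for t in range(len(O) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def gamma(alpha, beta, t, i):
    """P(i_t = q_i | O, lambda), with 1-based t and i."""
    weights = [a * b for a, b in zip(alpha[t - 1], beta[t - 1])]
    return weights[i - 1] / sum(weights)


alpha = forward(A, B, pi, O)
beta = backward(A, B, O)
g43 = gamma(alpha, beta, t=4, i=3)
print("gamma_4(3) =", round(g43, 6))
```

Normalizing by \(\sum_j \alpha_4(j)\beta_4(j)\) rather than by a separately computed \(P(O|\lambda)\) keeps the three values \(\gamma_4(1), \gamma_4(2), \gamma_4(3)\) summing to one.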

Exercise 10.3

We follow Algorithm 10.5 (the Viterbi algorithm).

The first step is initialization:

\(\delta_1(1) = \pi_1 b_1(o_1) = 0.2 \times 0.5 = 0.1\), \(\psi_1(1) = 0\)

\(\delta_1(2) = \pi_2 b_2(o_1) = 0.4 \times 0.4 = 0.16\), \(\psi_1(2) = 0\)

\(\delta_1(3) = \pi_3 b_3(o_1) = 0.4 \times 0.7 = 0.28\), \(\psi_1(3) = 0\)

The second step is recursion, for \(t = 2, 3, 4\):

\(\delta_2(1) = \max \limits_j [\delta_1(j)a_{j1}] b_1(o_2) = \max\{0.1 \times 0.5, 0.16 \times 0.3, 0.28 \times 0.2\} \times 0.5 = 0.028\), \(\psi_2(1) = 3\)

\(\delta_2(2) = \max \limits_j [\delta_1(j)a_{j2}] b_2(o_2) = \max\{0.1 \times 0.2, 0.16 \times 0.5, 0.28 \times 0.3\} \times 0.6 = 0.0504\), \(\psi_2(2) = 3\)

\(\delta_2(3) = \max \limits_j [\delta_1(j)a_{j3}] b_3(o_2) = \max\{0.1 \times 0.3, 0.16 \times 0.2, 0.28 \times 0.5\} \times 0.3 = 0.042\), \(\psi_2(3) = 3\)

\(\delta_3(1) = \max \limits_j [\delta_2(j)a_{j1}] b_1(o_3) = \max\{0.028 \times 0.5, 0.0504 \times 0.3, 0.042 \times 0.2\} \times 0.5 = 0.00756\), \(\psi_3(1) = 2\)

\(\delta_3(2) = \max \limits_j [\delta_2(j)a_{j2}] b_2(o_3) = \max\{0.028 \times 0.2, 0.0504 \times 0.5, 0.042 \times 0.3\} \times 0.4 = 0.01008\), \(\psi_3(2) = 2\)

\(\delta_3(3) = \max \limits_j [\delta_2(j)a_{j3}] b_3(o_3) = \max\{0.028 \times 0.3, 0.0504 \times 0.2, 0.042 \times 0.5\} \times 0.7 = 0.0147\), \(\psi_3(3) = 3\)

\(\delta_4(1) = \max \limits_j [\delta_3(j)a_{j1}] b_1(o_4) = \max\{0.00756 \times 0.5, 0.01008 \times 0.3, 0.0147 \times 0.2\} \times 0.5 = 0.00189\), \(\psi_4(1) = 1\)

\(\delta_4(2) = \max \limits_j [\delta_3(j)a_{j2}] b_2(o_4) = \max\{0.00756 \times 0.2, 0.01008 \times 0.5, 0.0147 \times 0.3\} \times 0.6 = 0.003024\), \(\psi_4(2) = 2\)

\(\delta_4(3) = \max \limits_j [\delta_3(j)a_{j3}] b_3(o_4) = \max\{0.00756 \times 0.3, 0.01008 \times 0.2, 0.0147 \times 0.5\} \times 0.3 = 0.002205\), \(\psi_4(3) = 3\)

The third step is termination:

\(P^* = \max \limits_i \delta_4(i) = 0.003024\)

\(i_4^* = \mathop{\arg\max} \limits_i [\delta_4(i)] = 2\)

The fourth step is backtracking along the optimal path:

\(i_3^* = \psi_4(i_4^*) = 2\)

\(i_2^* = \psi_3(i_3^*) = 2\)

\(i_1^* = \psi_2(i_2^*) = 3\)

Therefore, the optimal path is \(I^* = (i_1^*,i_2^*,i_3^*,i_4^*)=(3,2,2,2)\).
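The four steps above can be sketched in code. The values of `A`, `B` and `pi` are read off from the arithmetic in this solution (for example, the factors \(0.5, 0.3, 0.2\) in \(\delta_2(1)\) are the first column of \(A\)); the function name `viterbi` and the 0/1 encoding of the two observation symbols are illustrative assumptions.

```python
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]          # A[j][i] = a_{ji}
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]               # B[i][k] = b_i(k); column 0 is o_1's symbol
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0, 1]               # o_1, ..., o_4 as column indices of B


def viterbi(A, B, pi, O):
    """Return (P*, optimal path) with states numbered from 1."""
    n = len(A)
    # Step 1: initialization
    delta = [[pi[i] * B[i][O[0]] for i in range(n)]]
    psi = [[0] * n]
    # Step 2: recursion
    for t in range(1, len(O)):
        prev = delta[-1]
        d_row, p_row = [], []
        for i in range(n):
            best_j = max(range(n), key=lambda j: prev[j] * A[j][i])
            d_row.append(prev[best_j] * A[best_j][i] * B[i][O[t]])
            p_row.append(best_j + 1)             # store 1-based predecessor
        delta.append(d_row)
        psi.append(p_row)
    # Step 3: termination
    last = max(range(n), key=lambda i: delta[-1][i])
    p_star = delta[-1][last]
    # Step 4: backtracking
    path = [last + 1]
    for t in range(len(O) - 1, 0, -1):
        path.insert(0, psi[t][path[0] - 1])
    return p_star, path


p_star, path = viterbi(A, B, pi, O)
print("P* =", round(p_star, 6))
print("I* =", path)
```

Storing the predecessor in `psi` during the recursion is what makes the backtracking step a simple table lookup.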

Exercise 10.4

Using the forward and backward probabilities, prove that \(P(O|\lambda) = \sum \limits_{i=1}^N \sum \limits_{j=1}^N \alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)\) for \(t=1,2,\cdots,T-1\).

\(\begin{aligned} P(O|\lambda) &= P(o_1,o_2,\cdots,o_T|\lambda) \\ &= \sum_{i=1}^N P(o_1,\cdots,o_t,i_t=q_i|\lambda) P(o_{t+1},\cdots,o_T|i_t=q_i,\lambda) \\ &= \sum_{i=1}^N \sum_{j=1}^N P(o_1,\cdots,o_t,i_t=q_i|\lambda) P(o_{t+1},i_{t+1}=q_j|i_t=q_i,\lambda)P(o_{t+2},\cdots,o_T|i_{t+1}=q_j,\lambda) \\ &= \sum_{i=1}^N \sum_{j=1}^N [P(o_1,\cdots,o_t,i_t=q_i|\lambda) P(o_{t+1}|i_{t+1}=q_j,\lambda) P(i_{t+1}=q_j|i_t=q_i,\lambda) \\ & \quad \quad \quad \quad P(o_{t+2},\cdots,o_T|i_{t+1}=q_j,\lambda)] \\ &= \sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j),{\quad}t=1,2,\cdots,T-1 \end{aligned}\)
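A numerical sanity check of the identity (not a substitute for the proof) is to evaluate the double sum at every \(t\) and confirm that it does not depend on \(t\). The sketch below assumes the model and observation sequence of Exercise 10.1; all names are illustrative.

```python
# Assumed model: the box-and-ball parameters used for Exercise 10.1.
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
B = [[0.5, 0.5],
     [0.4, 0.6],
     [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
O = [0, 1, 0, 1]               # red, white, red, white


def forward(A, B, pi, O):
    n = len(A)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for t in range(1, len(O)):
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][O[t]]
                      for i in range(n)])
    return alpha


def backward(A, B, O):
    n = len(A)
    beta = [[1.0] * n]
    for t in range(len(O) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


alpha = forward(A, B, pi, O)
beta = backward(A, B, O)
n, T = len(A), len(O)

totals = []
for t in range(T - 1):         # 0-based t here is t = 1, ..., T-1 in the text
    total = sum(alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                for i in range(n) for j in range(n))
    totals.append(total)
    print(f"t = {t + 1}: {total:.7f}")

# The forward algorithm's own termination step gives the same quantity.
print(f"sum_i alpha_T(i) = {sum(alpha[-1]):.7f}")
```

All printed values should coincide up to floating-point rounding, as the identity requires.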

Exercise 10.5

Viterbi algorithm:

Initialization: \(\delta_1(i) = \pi_ib_i(o_1)\)

Recurrence: \(\delta_{t+1}(i) = \max \limits_j [\delta_t(j)a_{ji}]b_i(o_{t+1})\)

Forward algorithm:

Initial value: \(\alpha_1(i) = \pi_ib_i(o_1)\)

Recurrence: \(\alpha_{t+1}(i) = [\sum \limits_j \alpha_t(j)a_{ji}]b_i(o_{t+1})\)

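The two algorithms share the same initial value and the same factor \(a_{ji}b_i(o_{t+1})\); they differ only in how the \(N\) values from the previous time step are combined. The Viterbi algorithm takes the maximum of \(\delta_t(j)a_{ji}\) over \(j\), so \(\delta_{t+1}(i)\) is the probability of the single most probable path that ends in state \(i\).

The forward algorithm sums \(\alpha_t(j)a_{ji}\) over \(j\), so \(\alpha_{t+1}(i)\) is the total probability of all paths that end in state \(i\).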
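The difference is easy to see when one step of each recursion is written side by side. This is a small sketch using the transition matrix and first-step values from Exercise 10.3 as assumed inputs; `forward_step` and `viterbi_step` are illustrative names.

```python
A = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]          # A[j][i] = a_{ji}
b_next = [0.5, 0.6, 0.3]       # b_i(o_2) for i = 1, 2, 3
prev = [0.1, 0.16, 0.28]       # alpha_1 and delta_1 are identical


def forward_step(prev, A, b):
    """alpha_{t+1}(i) = [sum_j alpha_t(j) a_{ji}] b_i(o_{t+1})"""
    n = len(prev)
    return [sum(prev[j] * A[j][i] for j in range(n)) * b[i] for i in range(n)]


def viterbi_step(prev, A, b):
    """delta_{t+1}(i) = max_j [delta_t(j) a_{ji}] b_i(o_{t+1})"""
    n = len(prev)
    return [max(prev[j] * A[j][i] for j in range(n)) * b[i] for i in range(n)]


alpha_2 = forward_step(prev, A, b_next)
delta_2 = viterbi_step(prev, A, b_next)
print("alpha_2 =", [round(x, 6) for x in alpha_2])
print("delta_2 =", [round(x, 6) for x in delta_2])
```

The two functions differ only in `sum` versus `max`, and since a maximum of non-negative terms never exceeds their sum, each \(\delta_{t+1}(i)\) is at most \(\alpha_{t+1}(i)\).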