Exercises for Chapter 11 of Statistical Learning Methods

Time: 2021-11-01

Exercise 11.1

By the problem statement, according to the factorization formula \(P(Y) = \frac{1}{\sum \limits_Y \prod \limits_C \Psi_C(Y_C)} \prod \limits_C \Psi_C(Y_C)\)

The factorization of a probabilistic undirected graphical model is the operation of expressing its joint probability distribution as a product of functions of the random variables on its maximal cliques.

The maximal cliques in Figure 11.3 are \(\{Y_1, Y_2, Y_3\}\) and \(\{Y_2, Y_3, Y_4\}\).

So, \(P(Y) = \frac{\Psi_{(1,2,3)} (Y_{(1,2,3)}) \cdot \Psi_{(2,3,4)} (Y_{(2,3,4)})}{\sum \limits_Y \left[\Psi_{(1,2,3)} (Y_{(1,2,3)}) \cdot \Psi_{(2,3,4)} (Y_{(2,3,4)})\right]}\)
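To make the factorization concrete, here is a minimal Python sketch (not part of the original exercise) that builds hypothetical potential tables psi_123 and psi_234 on the two maximal cliques, normalizes their product over all binary assignments of \(Y_1, \dots, Y_4\), and checks that the resulting probabilities sum to 1:

```python
import itertools
import numpy as np

# Hypothetical potentials on the two maximal cliques {Y1,Y2,Y3} and {Y2,Y3,Y4};
# the values are arbitrary positive numbers chosen only for illustration.
rng = np.random.default_rng(0)
psi_123 = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # indexed by (y1, y2, y3)
psi_234 = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # indexed by (y2, y3, y4)

# Normalization constant: sum of the clique-potential product over all assignments.
Z = sum(psi_123[y1, y2, y3] * psi_234[y2, y3, y4]
        for y1, y2, y3, y4 in itertools.product([0, 1], repeat=4))

def p(y1, y2, y3, y4):
    """Joint probability from the factorization P(Y) = psi_123 * psi_234 / Z."""
    return psi_123[y1, y2, y3] * psi_234[y2, y3, y4] / Z

total = sum(p(*y) for y in itertools.product([0, 1], repeat=4))
print(f"sum of probabilities = {total:.6f}")  # prints 1.000000
```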

Exercise 11.2

Step 1: prove \(Z(x) = \alpha^T_n(x) \cdot 1\)

According to the matrix form of the conditional random field, \((M_{n+1}(x))_{i,j} = \begin{cases} 1, & j = stop \\ 0, & \text{otherwise}\end{cases}\)

According to the definition of the forward vector, \(\alpha_0(y_0|x) = \begin{cases} 1, & y_0 = start \\ 0, & \text{otherwise} \end{cases}\)

So, \(Z(x) = (M_1(x) M_2(x) \cdots M_{n+1} (x))_{start, stop} \\ = \alpha_0^T(x) M_1(x) M_2(x) \cdots M_n(x) \cdot 1 \\ = \alpha_n^T(x) \cdot 1\), where the second equality uses that \(\alpha_0(x)\) picks out the start row and that the stop column of \(M_{n+1}(x)\) is all ones.

Step 2: prove \(Z(x) = 1^T \cdot \beta_1(x)\)

According to the definition of the backward vector, \(\beta_{n+1}(y_{n+1}|x) = \begin{cases} 1, & y_{n+1} = stop \\ 0, & \text{otherwise}\end{cases}\)

So, \(Z(x) = (M_1(x) M_2(x) \cdots M_{n+1} (x))_{start, stop} \\ = (M_1(x) M_2(x) \cdots M_{n+1}(x) \beta_{n+1}(x))_{start} \\ = (M_1(x) \beta_1(x))_{start} = 1^T \cdot \beta_1(x)\), where the second equality uses that \(\beta_{n+1}(x)\) picks out the stop column, and the third uses the backward recursion \(\beta_i(x) = M_{i+1}(x) \beta_{i+1}(x)\).

In summary, \(Z(x) = \alpha^T_n(x) \cdot 1 = 1^T \cdot \beta_1(x)\).
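As a numerical sanity check (not a substitute for the proof), the following sketch builds random matrices \(M_1(x), \dots, M_n(x)\) and an \(M_{n+1}(x)\) whose stop column is all ones, then computes \(Z(x)\) three ways: brute-force enumeration over all paths, the \((start, stop)\) entry of the matrix product, and the forward recursion \(\alpha_i^T(x) = \alpha_{i-1}^T(x) M_i(x)\). The matrix sizes and values are arbitrary assumptions for illustration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4                      # m states, sequence length n (assumed toy sizes)
start, stop = 0, 0               # index used for the start and stop states

# M_1 ... M_n: arbitrary positive matrices; M_{n+1}: stop column all ones, as in the text.
M = [rng.uniform(0.1, 1.0, size=(m, m)) for _ in range(n)]
M_last = np.zeros((m, m))
M_last[:, stop] = 1.0
M.append(M_last)

# 1) Brute force: sum of M_1(start,y1) M_2(y1,y2) ... M_{n+1}(yn,stop) over all paths.
Z_brute = 0.0
for path in itertools.product(range(m), repeat=n):
    states = (start,) + path + (stop,)
    Z_brute += np.prod([M[i][states[i], states[i + 1]] for i in range(n + 1)])

# 2) (start, stop) entry of the full matrix product.
Z_prod = np.linalg.multi_dot(M)[start, stop]

# 3) Forward recursion: alpha_0 = e_start, alpha_i^T = alpha_{i-1}^T M_i, then Z = alpha_n^T 1.
alpha = np.zeros(m)
alpha[start] = 1.0
for i in range(n):               # multiply by M_1 ... M_n
    alpha = alpha @ M[i]
Z_forward = alpha.sum()

print(Z_brute, Z_prod, Z_forward)  # all three values agree
```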

Exercise 11.3

The log-likelihood function of the conditional random field is \(L(w)=\sum \limits ^N_{j=1} \sum \limits^K_{k=1} w_k f_k(y_j,x_j)-\sum \limits ^N_{j=1} \log{Z_w(x_j)}\)

Maximizing the log-likelihood is equivalent to minimizing the loss function \(f(w) = -L(w)\).

The gradient of the loss function is \(g(w) = \nabla f(w) = \left(\frac{\partial f(w)}{\partial w_1}, \cdots, \frac{\partial f(w)}{\partial w_K}\right)^T\)

Among them, \(\frac{\partial f(w)}{\partial w_i} = -\sum \limits^N_{j=1} f_i(y_j,x_j) + \sum \limits ^N_{j=1} \frac{1}{Z_w(x_j)} \cdot \frac{\partial{Z_w(x_j)}}{\partial{w_i}} \\ = -\sum \limits ^N_{j=1} f_i(y_j,x_j)+\sum \limits ^N_{j=1}\frac{1}{Z_w(x_j)}\sum \limits_y \exp \left(\sum \limits^K_{k=1} w_k f_k(y,x_j)\right) f_i(y,x_j)\)
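In other words, each component of the gradient is a model expectation minus an empirical count: \(\frac{\partial f(w)}{\partial w_i} = \sum \limits^N_{j=1} \left( E_{P_w(y|x_j)}[f_i(y,x_j)] - f_i(y_j,x_j) \right)\), where \(P_w(y|x_j) = \frac{1}{Z_w(x_j)} \exp \left(\sum \limits^K_{k=1} w_k f_k(y,x_j)\right)\).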

The gradient descent method can then be used to solve for \(w\).
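A minimal sketch of this gradient descent, assuming a toy CRF in which the set of label sequences is small enough to enumerate directly; the feature functions, synthetic data, step size, and iteration count below are all illustrative assumptions, not part of the exercise:

```python
import itertools
import numpy as np

# Toy setting (assumed): sequences of length 3 over labels {0, 1}, observations in {0, 1}.
SEQ_LEN, K = 3, 8                           # sequence length, number of feature functions
ALL_Y = list(itertools.product((0, 1), repeat=SEQ_LEN))

def features(y, x):
    """Hypothetical global feature vector f(y, x): label/observation pairs and label transitions."""
    f = np.zeros(K)
    for i in range(SEQ_LEN):
        f[y[i] * 2 + x[i]] += 1.0           # 4 state features indexed by (label, observation)
    for i in range(SEQ_LEN - 1):
        f[4 + y[i] * 2 + y[i + 1]] += 1.0   # 4 transition features indexed by (label, next label)
    return f

# Synthetic training pairs (x_j, y_j) (assumed): the labels simply copy the observations.
data = [((0, 0, 1), (0, 0, 1)), ((1, 1, 0), (1, 1, 0)), ((0, 1, 1), (0, 1, 1))]

def grad_f(w):
    """Gradient of f(w) = -L(w): model expectation of the features minus their empirical value."""
    g = np.zeros(K)
    for x, y in data:
        scores = np.array([w @ features(yc, x) for yc in ALL_Y])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                # P_w(y | x_j) over all candidate label sequences
        expected = sum(p * features(yc, x) for p, yc in zip(probs, ALL_Y))
        g += expected - features(y, x)
    return g

# Plain gradient descent with a fixed step size (assumed hyperparameters).
w = np.zeros(K)
for _ in range(200):
    w -= 0.1 * grad_f(w)
print(np.round(w, 3))
```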

Exercise 11.4

Taking \(start = 2\) as the starting point and \(stop = 2\) as the end point, the probabilities of the state sequences \(y\) of all paths are:
Path: 2 -> 1 -> 2 -> 1 -> 2, probability: 0.21
Path: 2 -> 2 -> 1 -> 1 -> 2, probability: 0.175
Path: 2 -> 2 -> 1 -> 2 -> 2, probability: 0.175
Path: 2 -> 1 -> 2 -> 2 -> 2, probability: 0.14
Path: 2 -> 2 -> 2 -> 1 -> 2, probability: 0.09
Path: 2 -> 1 -> 1 -> 1 -> 2, probability: 0.075
Path: 2 -> 1 -> 1 -> 2 -> 2, probability: 0.075
Path: 2 -> 2 -> 2 -> 2 -> 2, probability: 0.06

The state sequence with the highest probability is 2 -> 1 -> 2 -> 1 -> 2.
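The listed values can be reproduced by enumerating the eight paths directly. The sketch below assumes the transition matrices \(M_1(x), \dots, M_4(x)\) from the exercise statement (the values shown are the ones consistent with the probabilities above); it multiplies the matrix elements along each path from \(start = 2\) to \(stop = 2\):

```python
import itertools
import numpy as np

# Transition matrices assumed from the exercise statement (states 1 and 2, 1-indexed);
# these values reproduce the probabilities listed above.
M1 = np.array([[0.0, 0.0], [0.5, 0.5]])   # rows: y0 (start), columns: y1
M2 = np.array([[0.3, 0.7], [0.7, 0.3]])
M3 = np.array([[0.5, 0.5], [0.6, 0.4]])
M4 = np.array([[0.0, 1.0], [0.0, 1.0]])   # only the stop state (2) can follow y3
M = [M1, M2, M3, M4]

start, stop = 2, 2
paths = []
for y1, y2, y3 in itertools.product([1, 2], repeat=3):
    states = [start, y1, y2, y3, stop]
    # Product of matrix elements along the path (0-based indexing into the matrices).
    prob = np.prod([M[i][states[i] - 1, states[i + 1] - 1] for i in range(4)])
    paths.append((prob, states))

for prob, states in sorted(paths, reverse=True):
    print(" -> ".join(map(str, states)), f"probability: {prob:.3f}")
print("sum:", sum(p for p, _ in paths))   # the eight probabilities sum to 1.0
```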