Numerical analysis: power iteration and PageRank algorithm

Time: 2021-10-17

1. Power iteration algorithm (the power method for short)

(1) Dominant eigenvalue and dominant eigenvector

Given a matrix \(\bm{A} \in \R^{n \times n}\), the dominant eigenvalue of \(\bm{A}\) is the eigenvalue \(\lambda\) whose absolute value is larger than that of all other eigenvalues of \(\bm{A}\), if such an eigenvalue exists. The eigenvectors associated with \(\lambda\) are called dominant eigenvectors.

(2) Properties of dominant eigenvalues and dominant eigenvectors

If a vector is repeatedly multiplied by the same matrix, it is pushed toward the direction of the dominant eigenvector of that matrix, as shown in the following example:

import numpy as np
def prime_eigen(A, x, k):
    x_t = x.copy()
    for j in range(k):
        x_t = A.dot(x_t)
    return x_t   
if __name__ == '__main__':
    A = np.array(
        [
            [1, 3],
            [2, 2]
        ]
    )
    x = np.array([-5, 5])
    k = 4
    r = prime_eigen(A, x, k)
    print(r)

Running the algorithm gives the following result:

[250 260]

Why does this happen? Because any \(\bm{x} \in \R^{n}\) can be expressed as a linear combination of the eigenvectors of \(\bm{A}\) (assuming the eigenvectors span \(\R^n\)). Let \(\bm{x}^{(0)} = (-5, 5)^T\); the power iteration process can then be written as follows:

\[\begin{aligned}
& \bm{x}^{(1)} = \bm{A}\bm{x}^{(0)} = 4(1,1)^T - 2(-3, 2)^T\\
& \bm{x}^{(2)} = \bm{A}^2\bm{x}^{(0)} = 4^2(1, 1)^T + 2(-3, 2)^T\\
& \cdots\\
& \bm{x}^{(4)} = \bm{A}^4\bm{x}^{(0)} = 4^4(1, 1)^T + 2(-3, 2)^T = 256(1, 1)^T + 2(-3, 2)^T\\
\end{aligned} \tag{1}
\]

It can be seen that the component along the dominant eigenvector dominates after repeated multiplication. Here 4 is the largest eigenvalue, and the iterates move closer and closer to the direction of the dominant eigenvector \((1, 1)^T\).
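As a quick numerical check (a minimal sketch using NumPy's np.linalg.eig rather than the power method itself), we can recover the eigen-decomposition used in equation \((1)\) and the coefficients of \(\bm{x}^{(0)}\) in the eigenvector basis:

import numpy as np

#Sketch: verify the eigen-decomposition behind equation (1)
A = np.array([
    [1, 3],
    [2, 2]
])
x0 = np.array([-5, 5])

#Eigenvalues of A are 4 and -1, eigenvectors proportional to (1, 1) and (-3, 2) (order may vary)
vals, vecs = np.linalg.eig(A)

#Coefficients of x0 in the eigenvector basis, i.e. solve V c = x0
c = np.linalg.solve(vecs, x0)

#A^k x0 = sum_i c_i * lambda_i^k * v_i
k = 4
print(vecs @ (c * vals**k))   #[250. 260.], matching the iteration above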
However, repeated multiplication by the matrix easily leads to numerical overflow, so we must normalize the vector at each step. Normalization combined with repeated multiplication by \(\bm{A}\) constitutes the power iteration algorithm, implemented as follows:

import numpy as np
def powerit(A, x, k):
    for j in range(k):
        #Normalize the x from the previous round before each iteration
        u = x/np.linalg.norm(x)
        #Compute the un-normalized x for this iteration
        x = A.dot(u)
        #Compute the eigenvalue estimate for this iteration
        lam = u.dot(x)
    #The eigenvector x obtained in the last iteration needs to be normalized to u
    u = x / np.linalg.norm(x)
    return u, lam        

if __name__ == '__main__':
    A = np.array(
        [
            [1, 3],
            [2, 2]
        ]
    )
    x = np.array([-5, 5])
    k = 10
    #Returns the dominant eigenvector and the corresponding dominant eigenvalue
    u, lam = powerit(A, x, k)
    print("dominant eigenvector:\n", u)
    print("dominant eigenvalue:\n", lam)

Running the algorithm gives the following result:

dominant eigenvector:
 [0.70710341 0.70711015]
dominant eigenvalue:
 3.9999809247674625

Look at the code above. In the \(t\)-th iteration, we first normalize the eigenvector approximation from the \((t-1)\)-th iteration to obtain \(\bm{u}^{(t-1)}\) (think about why), and then compute the un-normalized eigenvector approximation of the \(t\)-th iteration according to \(\bm{x}^{(t)} = \bm{A}\bm{u}^{(t-1)}\). We also need the eigenvalue estimate of the \(t\)-th iteration; here we directly apply the conclusion \(\lambda^{(t)} = (\bm{u}^{(t-1)})^T \bm{x}^{(t)}\). The conclusion is derived as follows:

Proof


For the \(t\)-th iteration, denote the eigenvalue approximation by \(\lambda^{(t)}\). We want to solve the equation

\[\bm{x^{(t-1)}} \cdot \lambda^{(t)} = \bm{A}\bm{x}^{(t-1)}
\tag{2}
\]

to obtain the eigenvalue \(\lambda^{(t)}\) corresponding to the eigenvector approximation in the \(t\)-th iteration. We use the least squares method to find an approximate solution of equation \((2)\). The normal equation of this least squares problem is

\[(\bm{x}^{(t-1)})^T\bm{x}^{(t-1)} \cdot \lambda^{(t)} = (\bm{x}^{(t-1)})^T (\bm{A}\bm{x}^{(t-1)})
\tag{3}
\]

Thus, we can write the solution of this least squares problem as

\[\lambda^{(t)} = \frac{(\bm{x}^{(t-1)})^T\bm{A}\bm{x}^{(t-1)}}{(\bm{x}^{(t-1)})^T\bm{x}^{(t-1)}}
\tag{4}
\]

Formula \((4)\) is the Rayleigh quotient. Given an approximate eigenvector, the Rayleigh quotient gives the optimal (least squares) approximation of the corresponding eigenvalue. By the definition of normalization,

\[\bm{u}^{(t-1)} = \frac{\bm{x}^{(t-1)}}{||\bm{x}^{(t-1)}||} = \frac{\bm{x}^{(t-1)}}{{[(\bm{x}^{(t-1)})^T\bm{x}^{(t-1)}]}^{\frac{1}{2}}}
\tag{5}
\]

we can then rewrite \((4)\) as:

\[\lambda^{(t)} = (\bm{u}^{(t-1)})^T\bm{A}\bm{u}^{(t-1)}
\tag{6}
\]

Since we have already computed \(\bm{x}^{(t)} = \bm{A} \bm{u}^{(t-1)}\), to avoid redundant computation we substitute \(\bm{x}^{(t)}\) into \((6)\) and obtain:

\[\lambda^{(t)} = (\bm{u}^{(t-1)})^T\bm{x}^{(t)}
\tag{7}
\]

This completes the proof.


It can be seen that power iteration essentially performs a normalized fixed-point iteration at each step.
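The equivalence of formulas \((4)\), \((6)\) and \((7)\) can also be checked numerically. Below is a small sketch that reuses the example matrix \(\bm{A}\) above; the vector chosen here is just an arbitrary rough approximation of the dominant eigenvector:

import numpy as np

#Sketch: compare the three equivalent forms of the Rayleigh quotient
A = np.array([
    [1, 3],
    [2, 2]
])

#A rough approximation of the dominant eigenvector (exact direction is (1, 1))
x = np.array([0.9, 1.1])

#Formula (4): Rayleigh quotient with the un-normalized vector
lam4 = x @ A @ x / (x @ x)

#Formula (6): the same quotient written with the normalized vector u
u = x / np.linalg.norm(x)
lam6 = u @ A @ u

#Formula (7): reuse the already computed product A u
x_next = A @ u
lam7 = u @ x_next

print(lam4, lam6, lam7)   #all three agree, close to the true eigenvalue 4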

2. Inverse power iteration

The power iteration algorithm above finds the eigenvalue with the largest absolute value. How do we find the smallest one? We simply apply power iteration to the inverse of the matrix.

We have the following conclusion: the largest eigenvalue of \(\bm{A}^{-1}\) (in absolute value) is the reciprocal of the smallest eigenvalue of \(\bm{A}\). In fact, for a matrix \(\bm{A} \in \R^{n \times n}\) with eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_n\), if its inverse exists, then the eigenvalues of \(\bm{A}^{-1}\) are \(\lambda_1^{-1}, \lambda_2^{-1}, \dots, \lambda_n^{-1}\), and the eigenvectors are the same as those of \(\bm{A}\). The theorem is proved as follows:

Proof


By the definition of eigenvalues and eigenvectors, we have

\[\bm{A}\bm{v} = \lambda \bm{v}
\tag{8}
\]

This implies

\[\bm{v} = \lambda \bm{A}^{-1}\bm{v}
\tag{9}
\]

Thus

\[\bm{A}^{-1}\bm{v} = \lambda^{-1}\bm{v}
\tag{10}
\]

This completes the proof.
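This relationship between the spectra of \(\bm{A}\) and \(\bm{A}^{-1}\) is easy to verify numerically; here is a minimal sketch with the example matrix used above:

import numpy as np

#Sketch: eigenvalues of A^{-1} are the reciprocals of the eigenvalues of A
A = np.array([
    [1, 3],
    [2, 2]
])

#Eigenvalues of A are 4 and -1 (order may vary)
print(np.linalg.eigvals(A))

#Eigenvalues of A^{-1} are their reciprocals, 0.25 and -1
print(np.linalg.eigvals(np.linalg.inv(A)))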


Applying power iteration to the inverse matrix \(\bm{A}^{-1}\) yields the largest eigenvalue of \(\bm{A}^{-1}\), and taking its reciprocal gives the smallest eigenvalue of \(\bm{A}\). The power iteration formula is as follows:

\[\bm{x}^{(t+1)} = \bm{A}^{-1}\bm{x}^{(t)}
\tag{11}
\]

But this requires computing the inverse of \(\bm{A}\), which is too expensive when \(\bm{A}\) is large. So we modify the approach slightly: the computation in formula \((11)\) is equivalent to

\[\bm{A}\bm{x}^{(t+1)} = \bm{x}^{(t)}
\tag{12}
\]

In this way, we can solve for \(\bm{x}^{(t+1)}\) using Gaussian elimination, as sketched below.
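The following is a minimal sketch of inverse power iteration without a shift (the function name inverse_powerit is ours, not from the text); it uses np.linalg.solve instead of explicitly forming \(\bm{A}^{-1}\) and converges to the eigenvalue of \(\bm{A}\) with the smallest absolute value:

import numpy as np

#Minimal sketch: inverse power iteration without a shift,
#solving A x_{t+1} = x_t instead of forming A^{-1} explicitly
def inverse_powerit(A, x, k):
    for j in range(k):
        #Normalize the previous iterate
        u = x / np.linalg.norm(x)
        #x = A^{-1} u, computed by Gaussian elimination
        x = np.linalg.solve(A, u)
        #Rayleigh quotient: estimate of the dominant eigenvalue of A^{-1}
        lam = u.dot(x)
    u = x / np.linalg.norm(x)
    #The reciprocal is the smallest (in absolute value) eigenvalue of A
    return u, 1/lam

if __name__ == '__main__':
    A = np.array([
        [1, 3],
        [2, 2]
    ])
    u, lam = inverse_powerit(A, np.array([-5, 5]), 20)
    print(u, lam)   #converges toward the eigenvalue -1 and its eigenvector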
So far, this algorithm gives us the largest and smallest eigenvalues of a matrix. How do we find the other eigenvalues?
If we want to find the eigenvalue of \(\bm{A}\) closest to a real number \(s\), we can shift the matrix toward that eigenvalue. We have the following theorem: for a matrix \(\bm{A} \in \R^{n \times n}\) with eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_n\), the shifted matrix \(\bm{A}-sI\) has eigenvalues \(\lambda_1 -s, \lambda_2 -s, \dots, \lambda_n -s\), and its eigenvectors are the same as those of \(\bm{A}\). The theorem is proved as follows:

Proof


By the definition of eigenvalues and eigenvectors, we have

\[\bm{A}\bm{v} = \lambda \bm{v}
\tag{13}
\]

Subtracting \(sI\bm{v}\) from both sides gives

\[(\bm{A} - sI)\bm{v} = (\lambda - s)\bm{v}
\tag{14}
\]

Thus the matrix \(\bm{A} - sI\) has eigenvalue \(\lambda - s\), and its eigenvector is still \(\bm{v}\). This completes the proof.
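The shift property is also easy to confirm numerically; here is a short sketch with the example matrix and \(s = 2\):

import numpy as np

#Sketch: eigenvalues of A - sI are the eigenvalues of A shifted by s
A = np.array([
    [1, 3],
    [2, 2]
])
s = 2

#Eigenvalues of A are 4 and -1, so A - sI should have eigenvalues 2 and -3
print(np.linalg.eigvals(A - s*np.eye(2)))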


So, to find the eigenvalue of \(\bm{A}\) nearest to a real number \(s\), we apply power iteration to the shifted and inverted matrix \((\bm{A}-sI)^{-1}\) to obtain its largest eigenvalue \(b\) (since the eigenvalues after shifting and inverting are \((\lambda - s)^{-1}\), the \(\lambda\) closest to \(s\) corresponds to the largest one). In each step, \(\bm{x}^{(t)}\) is obtained by applying Gaussian elimination to \((\bm{A}-sI)\bm{x}^{(t)}=\bm{u}^{(t-1)}\). Finally, we compute \(\lambda = b^{-1} + s\), which is the eigenvalue of \(\bm{A}\) near \(s\). The implementation of the algorithm is as follows:

import numpy as np

def powerit(A, x, s, k):
    As = A-s*np.eye(A.shape[0])
    for j in range(k):
        #Normalize x before each iteration to keep the numbers from getting out of control
        u = x/np.linalg.norm(x)

        #Solve (A - sI) x_j = u_{j-1} by Gaussian elimination
        x = np.linalg.solve(As, u)
        #Rayleigh quotient: eigenvalue estimate of (A - sI)^{-1}
        lam = u.dot(x)
    #Recover the eigenvalue of A near s: lambda = 1/b + s
    lam = 1/lam + s
        
    #The eigenvector x obtained in the last iteration needs to be normalized to u
    u = x / np.linalg.norm(x)
    return u, lam        

if __name__ == '__main__':
    A = np.array(
        [
            [1, 3],
            [2, 2]
        ]
    )
    x = np.array([-5, 5])
    k = 10
    #The shift s; different shift values make inverse power iteration converge to different eigenvalues
    s = 2
    #Returns the approximate eigenvector and the eigenvalue of A near s
    u, lam = powerit(A, x, s, k)
    #u approaches the direction of the eigenvector [1, 1] as k increases
    print("dominant eigenvector:\n", u)
    print("dominant eigenvalue:\n", lam)

Running the algorithm gives the following result:

dominant eigenvector:
 [0.64221793 0.7665221 ]
dominant eigenvalue:
 4.145795530352381

3. Application of power iteration: PageRank algorithm

A major application of power iteration is the PageRank algorithm. PageRank is an iterative algorithm on a directed graph; after convergence, each node is assigned a value indicating its importance. The larger the value, the more important the node is in the graph.
For example, given the following directed graph:
(Figure: the example directed graph; original caption: schematic diagram of movie enthusiasts' ratings)
Its adjacency matrix is:

\[\left(
\begin{matrix}
0 & 1 & 1 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
\end{matrix}
\right)
\tag{15}
\]

We normalize the adjacency matrix along each row to obtain the Markov transition matrix \(\bm{M}\):

\[\left(
\begin{matrix}
0 & \frac{1}{2} & \frac{1}{2} \\
0 & 0 & 1 \\
1 & 0 & 0 \\
\end{matrix}
\right)
\tag{16}
\]

We define \(q\) as the probability that a web surfer jumps to a random page, and \(1-q\) as the probability of following a link from the current page. Let the number of nodes in the graph be \(n\). We can then compute the Google matrix, which serves as the overall transition matrix of the directed graph. For each element of the matrix, we have:

\[\bm{G}_{ij} = \frac{q}{n} + (1-q)\bm{M}_{ij}
\tag{17}
\]

Note that each row of the Google matrix sums to 1, so it is a stochastic matrix, and a stochastic matrix has the property that its dominant eigenvalue is 1.
We can also write \((17)\) in matrix form:

\[\bm{G} = \frac{q}{n}\bm{E} + (1-q)\bm{M}
\tag{18}
\]

where \(\bm{E}\) is the \(n \times n\) matrix whose elements are all 1.
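To illustrate the claim that the Google matrix is stochastic with dominant eigenvalue 1, here is a small sketch that builds \(\bm{G}\) for the example graph with \(q = 0.15\) (this is only a check, separate from the PageRank implementation below):

import numpy as np

#Sketch: build the Google matrix for the example graph and check its properties
A = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0]
], dtype=float)
q = 0.15
n = A.shape[0]

#Row-normalize the adjacency matrix to get the Markov transition matrix M
M = A / A.sum(axis=1, keepdims=True)

#Google matrix G_ij = q/n + (1-q) M_ij
G = (q/n)*np.ones((n, n)) + (1-q)*M

print(G.sum(axis=1))                         #each row sums to 1
print(np.max(np.abs(np.linalg.eigvals(G))))  #spectral radius is 1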
Then we define a vector \(\bm{p}\) whose element \(\bm{p}_i\) is the probability of staying on page \(i\). From the power iteration algorithm above, we know that repeatedly multiplying the vector by the matrix pushes it toward the direction of the eigenvector with eigenvalue 1. Here, the eigenvector corresponding to eigenvalue 1 gives the steady-state probabilities of the pages. By definition, \(\bm{p}_i\) is the rank of page \(i\), which is the origin of "rank" in the name PageRank. (It is also the steady-state solution of the Markov process defined by \(\bm{G}^T\).) Therefore, we define the iterative process:

\[\bm{p}_{t+1} = \bm{G}^T\bm{p}_{t}
\tag{19}
\]

Note that after each iteration we normalize the vector \(\bm{p}\) (to reduce computation, instead of the exact 2-norm we divide by the maximum element of \(\bm{p}\)); after the final iteration, we normalize \(\bm{p}\) once more, this time dividing by the sum of its elements so that they sum to 1.
We implement the PageRank algorithm for this graph as follows (with the probability \(q\) of jumping to a random page set to the usual 0.15):

import numpy as np
#Power iteration with normalization; k is the number of iteration steps
#To be pushed toward the dominant eigenvector, A must be a square matrix
def PageRank(A, p, k, q):
    assert(A.shape[0]==A.shape[1])
    n = A.shape[0]
    M = A.copy().astype(np.float32)  #Note: convert to floating point
    for i in range(n):
        M[i, :] = M[i, :]/np.sum(M[i, :])
    G = (q/n)*np.ones((n,n)) + (1-q)*M
    G_T = G.T
    p_t = p.copy()
    for i in range(k):
        y = G_T.dot(p_t)
        p_t = y/np.max(y)
    return p_t/np.sum(p_t)
if __name__ == '__main__':
    A = np.array(
        [
            [0, 1, 1],
            [0, 0, 1],
            [1, 0, 0]
        ]
    )
    k = 20
    p = np.array([1, 1, 1]) 
    q = 0.15  #The probability of moving to a random page is usually 0.15
    #The probability of moving to a page linked from the current page is 1-q
    R = PageRank(A, p, k, q)
    print(R)

Running the algorithm gives the following result:

[0.38779177 0.21480614 0.39740209]

You can see that the rank vector of the pages after 20 iterations is \(\bm{R}=(0.38779177, 0.21480614, 0.39740209)^T\), which can be interpreted as the importance of each page.
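As a cross-check (a sketch, not part of the algorithm itself), the same rank vector can be obtained directly from the eigenvector of \(\bm{G}^T\) associated with eigenvalue 1:

import numpy as np

#Sketch: compute the steady-state vector from the eigen-decomposition of G^T
A = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0]
], dtype=float)
q = 0.15
n = A.shape[0]
M = A / A.sum(axis=1, keepdims=True)
G = (q/n)*np.ones((n, n)) + (1-q)*M

#Eigenvector of G^T for the eigenvalue closest to 1
vals, vecs = np.linalg.eig(G.T)
i = np.argmin(np.abs(vals - 1))
p = np.real(vecs[:, i])
p = p / p.sum()   #normalize so the entries sum to 1

print(p)   #close to the iterative result [0.38779177 0.21480614 0.39740209]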

Well-known libraries and suggestions for source code reading

The PageRank algorithm has many excellent open-source implementations. Here are several recommended projects:

(1) Spark-GraphX

GraphX is an excellent distributed graph computing library that is part of the Spark distributed computing framework. It implements many distributed graph algorithms in Scala, including the PageRank algorithm discussed here.
Documentation: https://spark.apache.org/graphx
Source: https://github.com/apache/spark

(2) neo4j

Neo4j is a well-known graph database implemented in Java, and it also provides an implementation of the PageRank algorithm.
Documentation: https://neo4j.com/
Source: https://github.com/neo4j/neo4j.git

If you are interested in how a search engine is implemented, I recommend the Elasticsearch project, a search engine implemented in Java.
Documentation: https://www.elastic.co/cn/
Source: https://github.com/elastic/elasticsearch.git
