# Numerical Analysis: Power Iteration and PageRank Algorithm (Numpy Implementation)

Time：2022-9-22

## 1. Power iterative algorithm (referred to as the power method)

### (1) Dominant eigenvalues ​​and dominant eigenvectors

known square matrix$$\bm{A} \in \R^{n \times n}$$, $$\bm{A}$$The dominant eigenvalue of is the ratio$$\bm{A}$$The other eigenvalues ​​of (the absolute value of) are all large eigenvalues$$\lambda$$, if such an eigenvalue exists, then with$$\lambda$$The relevant eigenvectors are called dominant eigenvectors.

### (2) Properties of dominant eigenvalues ​​and dominant eigenvectors

If a vector is repeatedly multiplied by the same matrix, the vector is pushed in the direction of the dominant eigenvector of that matrix. As shown in the following example:

import numpy as np
def prime_eigen(A, x, k):
x_t = x.copy()
for j in range(k):
x_t = A.dot(x_t)
return x_t
if __name__ == '__main__':
A = np.array(
[
[1, 3],
[2, 2]
]
)
x = np.array([-5, 5])
k = 4
r = prime_eigen(A, x, k)
print(r)

The result of running the algorithm is as follows:

250, 260

Why does this happen? because right$$\forall \bm{x} \in \R^{n}$$can be expressed as$$A$$Linear combination of all eigenvectors (assuming that all eigenvectors span$$\R^n$$space). we set$$\bm{x}^{(0)} = (-5, 5)^T$$, the process of power iteration can be expressed as follows:

\begin{aligned} & \bm{x}^{(1)} = \bm{A}\bm{x}^{(0)} = 4(1,1)^T – 2(-3, 2)^T\\ & \bm{x}^{(2)} = \bm{A}^2\bm{x}^{(0)} = 4^2(1, 1)^T + 2(-3, 2)^T\\ & …\\ & \bm{x}^{(4)} = \bm{A}^4\bm{x}^{(0)} = 4^4(1, 1)^T + 2(-3, 2)^T = 256(1, 1)^T + 2(-3, 2)^T\\ \end{aligned} \tag{1}

It can be seen that the eigenvector corresponding to the dominant eigenvalue will dominate after multiple calculations. Here 4 is the largest eigenvalue, and the calculation is getting closer and closer to the dominant eigenvector$$(1, 1)^T$$direction.
However, this repeated multiplication and summation of the matrix can easily lead to numerical overflow, and we must normalize the vector at each step. That's it, the normalized sum and matrix$$\bm{A}$$The multiplication of , forms the power iterative algorithm shown below:

import numpy as np
def powerit(A, x, k):
for j in range(k):
# Before each iteration, normalize the x of the previous round
u = x/np.linalg.norm(x)
# Calculate the unnormalized x of this iteration
x = A.dot(u)
# Calculate the eigenvalues ​​corresponding to this round
lam = u.dot(x)
# The feature vector x obtained from the last iteration needs to be normalized to u
u = x / np.linalg.norm(x)
return u, lam

if __name__ == '__main__':
A = np.array(
[
[1, 3],
[2, 2]
]
)
x = np.array([-5, 5])
k = 10
# Return dominant eigenvalues ​​and corresponding eigenvectors
u, lam = powerit(A, x, k)
print(&quot;Predominant eigenvector:\n&quot;, u)
print(&quot;Predominant eigenvalue:\n&quot;, lam)

The result of running the algorithm is as follows:

Dominant eigenvectors:
[0.70710341 0.70711015]
Dominant eigenvalues:
3.9999809247674625

Observe the above code, we are in the$$t$$The first line of the round iteration, the normalized first line is obtained.$$t-1$$Eigenvector approximation for round iterations$$\bm{u}^{(t-1)}$$(think about why), then follow$$\bm{x}^{(t)} = \bm{A}\bm{u}^{(t-1)}$$calculate the first$$t$$round iterative unnormalized eigenvector approximation$$\bm{x}^{(t)}$$, it is necessary to calculate the$$t$$The eigenvalues ​​corresponding to the iterations. Here we directly apply the conclusion$$\lambda^{(t)} = (\bm{u}^{(t-1)})^T \bm{x^{(t)}}$$. The derivation of this conclusion is as follows:

prove

for the first$$t$$rounds of iterations where the approximation of the eigenvalues ​​is not$$\bm{\lambda}^{(t)}$$, we want to solve the characteristic equation

$\bm{x^{(t-1)}} \cdot \lambda^{(t)} = \bm{A}\bm{x}^{(t-1)} \tag{2}$

to get the first$$t$$The eigenvalue corresponding to the eigenvector during round iteration$$\lambda^{(t)}$$. We use the least squares method to find the equation$$(2)$$approximate solution. We can write the normal equation for this least squares problem as

$(\bm{x}^{(t-1)})^T\bm{x}^{(t-1)} \cdot \lambda^{(t-1)} = (\bm{x}^{(t-1)})^T (\bm{A}\bm{x}^{(t-1)}) \tag{3}$

However, we can write the solution to this least squares problem as

$\lambda^{(t)} = \frac{(\bm{x}^{(t-1)})^T\bm{A}\bm{x}^{(t-1)}}{(\bm{x}^{(t-1)})^T\bm{x}^{(t-1)}} \tag{4}$

formula$$(4)$$that isRayleigh Quotient. The best approximation of the Rayleigh quotient eigenvalues ​​given an approximation of the eigenvectors. And by the definition of normalization, we have

$\bm{u}^{(t-1)} = \frac{\bm{x}^{(t-1)}}{||\bm{x}^{(t-1)}||} = \frac{\bm{x}^{(t-1)}}{{[(\bm{x}^{(t-1)})^T\bm{x}^{(t-1)}]}^{\frac{1}{2}}} \tag{5}$

Then we can put the formula$$(4)$$writing:

$\lambda^{(t)} = (\bm{u}^{(t-1)})^T\bm{A}\bm{u}^{(t-1)} \tag{6}$

And because it has been calculated before$$\bm{x}^{(t)} = \bm{A} \bm{u}^{(t-1)}$$, in order to avoid double calculation, we will$$\bm{x}^{(t)}$$Substitute$$(5)$$get:

$\lambda^{(t)} = (\bm{u}^{(t-1)})^T\bm{x}^{(t)} \tag{7}$

Certificate completed.

It can be seen that power iteration essentially performs normalized fixed point iteration at each step.

## 2. Inverse Power Iteration

Our power iterative algorithm above is used to find the (in absolute value) largest eigenvalue. So how to find the smallest eigenvalue? We just need to use power iteration for the inverse of the matrix.

We have the conclusion that the matrix$$\bm{A}^{-1}$$The largest eigenvalue of is the matrix$$\bm{A}$$The reciprocal of the smallest eigenvalue of . In fact, for the matrix$$\bm{A} \in \R^{n \times n}$$, let its eigenvalues ​​be expressed as$$\lambda_1, \lambda_2, …, \lambda_n$$, if its inverse exists, then the inverse$$A$$eigenvalues ​​of$$\lambda_1^{-1}, \lambda_2^{-1}, …, \lambda_n^{-1}$$, eigenvectors and matrices$$A$$same. The theorem is proved as follows:

prove

There are eigenvalues ​​and eigenvectors defined with

$\bm{A}\bm{v} = \lambda \bm{v} \tag{8}$

This contains

$\bm{v} = \lambda \bm{A}^{-1}\bm{v} \tag{9}$

thus

$\bm{A}^{-1}\bm{v} = \lambda^{-1}\bm{v} \tag{10}$

certified.

pair inverse matrix$$\bm{A}^{-1}$$Use power iteration, and evaluate the resulting$$\bm{A}^{-1}$$Take the reciprocal of the eigenvalues ​​to get the matrix$$\bm{A}$$the smallest eigenvalue of . The power iteration formula is as follows:

$\bm{x}^{(t+1)} = \bm{A}^{-1}\bm{x}^{(t)} \tag{11}$

But this requires us to$$\bm{A}$$Inverse, when the matrix$$\bm{A}$$When it is too large, the computational complexity is too high. So we need to modify it slightly,$$(11)$$The calculation is equivalent to

$\bm{A}\bm{x}^{(t+1)} = \bm{x}^{(t)} \tag{12}$

In this way, we can take the Gaussian elimination pair$$\bm{x}^{(t+1)}$$to solve,
However, we now use this algorithm to find the largest and smallest eigenvalues ​​of the matrix, how to find other eigenvalues?
If we want to find the matrix$$\bm{A}$$in real numbers$$s$$Nearby eigenvalues, you can move the matrix close to the eigenvalues. We have the theorem: for matrices$$\bm{A} \in \R^{n \times n}$$, let its characteristic value be$$\lambda_1, \lambda_2, …, \lambda_n$$, then its transition matrix$$\bm{A}-sI$$eigenvalues ​​of$$\lambda_1 -s, \lambda_2 -s, …, \lambda_n -s$$, while the eigenvectors and matrices$$A$$same. The theorem is proved as follows:

prove

There are eigenvalues ​​and eigenvectors defined with

$\bm{A}\bm{v} = \lambda \bm{v} \tag{13}$

subtract from both sides$$sI\bm{v}$$,get

$(\bm{A} – sI)\bm{v} = (\lambda – s)\bm{v} \tag{14}$

Hence the matrix$$\bm{A} – sI$$eigenvalues ​​of$$\lambda – s$$, the eigenvectors are still$$\bm{v}$$, proved.

Thus, we want to find the matrix$$\bm{A}$$in real numbers$$s$$Nearby eigenvalues, you can first compare the matrix$$(\bm{A}-sI)^{-1}$$Use power iteration to find$$(\bm{A}-sI)^{-1}$$The largest eigenvalue of$$b$$(because we know that the transferred eigenvalues ​​are$$(\lambda – s)^{-1}$$, to make$$\lambda$$as close as possible$$s$$, you have to take the largest eigenvalue), where the$$x^{(t)}$$yes$$(\bm{A}-sI)\bm{x}^{(t)}=\bm{u}^{(t-1)}$$Obtained by Gaussian elimination. Finally, we calculate$$\lambda = b^{-1} + s$$is the matrix$$A$$exist$$s$$nearby eigenvalues. The implementation of the algorithm is as follows:

import numpy as np

def powerit(A, x, s, k):
As = A-s*np.eye(A.shape)
for j in range(k):
# To keep the data from getting out of control
# normalize x before each iteration
u = x/np.linalg.norm(x)

# Solve (A-sI)xj = uj-1
x = np.linalg.solve(As, u)
lam = u.dot(x)
lam = 1/lam + s

# The feature vector x obtained from the last iteration needs to be normalized to u
u = x / np.linalg.norm(x)
return u, lam

if __name__ == '__main__':
A = np.array(
[
[1, 3],
[2, 2]
]
)
x = np.array([-5, 5])
k = 10
# The translation value of the inverse power iteration, which can converge to different eigenvalues ​​through the translation value
s = 2
# Return the dominant eigenvalue and the corresponding eigenvalue
u, lam = powerit(A, x, s, k)
# u is [0.70710341 0.70711015], pointing in the direction of the feature vector [1, 1]
print(&quot;Predominant eigenvector:\n&quot;, u)
print(&quot;Predominant eigenvalue:\n&quot;, lam)

The result of running the algorithm is as follows:

Dominant eigenvectors:
[0.64221793 0.7665221 ]
Dominant eigenvalues:
4.145795530352381

## 3. Application of Power Iteration: PageRank Algorithm

One application of power iteration is the PageRank algorithm. The PageRank algorithm is an iterative algorithm acting on a directed graph. After convergence, each node can be assigned a value indicating the degree of importance. The larger the value, the more important the node appears in the graph.
For example, given the following directed graph: Its adjacency matrix is:

$\left( \begin{matrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ \end{matrix} \right) \tag{15}$

We normalize the adjacency matrix along the rows to get the Markov probability transition matrix$$\bm{M}$$

$\left( \begin{matrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ \end{matrix} \tag{16} \right)$

We define the probability that a surfer moves from one page to another random page as$$q$$, the probability of staying on this page is$$1-q$$. Let the number of nodes in the graph be$$n$$, and then we can compute the Google matrix as a general transition matrix for directed graphs. For each element of the matrix, we have:

$\bm{G}_{ij} = \frac{q}{n} + (1-q)\bm{M}_{ij} \tag{17}$

Note that the sum of each column of the Google matrix is ​​1, which is a random matrix, which satisfies the property that the dominant eigenvalue is 1.
We take the matrix representation, namely:

$\bm{G}_{ij} = \frac{q}{n}\bm{E} + (1-q)\bm{M}_{ij} \tag{18}$

in$$\bm{E}$$for elements whose elements are all 1s$$n \times n$$phalanx.
Then we define the vector$$\bm{p}$$, its element$$\bm{p}_i$$is on the page$$i$$on the probability. We know from the previous power iteration algorithm that the vector will be pushed to the direction of eigenvalue 1 after the matrix and the vector are repeatedly multiplied. And here, the eigenvector corresponding to eigenvalue 1 is the steady-state probability of a set of pages, which by definition is$$i$$The rank of a page, that is, the origin of the Rank in the name of the PageRank algorithm. (At the same time, this is also$$G^T$$the steady-state solution of the defined Markov process). So we define the iterative process:

$\bm{p}_{t+1} = \bm{G}^T\bm{p}_{t} \tag{19}$

Note that after each iteration we have to$$\bm{p}$$Vector normalization (to reduce time complexity we divide by$$p$$The maximum value among all dimension elements of the vector can be normalized by an approximate two-norm); moreover, we also need to$$p$$The vector is normalized (divided by the sum of all dimension elements to guarantee that the sum of all dimensions is 1).
Our implementation of the PageRank algorithm code for this graph is as follows (where the probability of moving to a random page$$q$$Take 0.15 by convention):

import numpy as np
# Normalize simultaneous iteration, k is the number of iteration steps
# To push in the direction of the eigenvalue of A, A must be a square matrix
def PageRank(A, p, k, q):
assert(A.shape==A.shape)
n = A.shape
M = A.copy().astype(np.float32) #Note to convert to float
for i in range(n):
M[i, :] = M[i, :]/np.sum(M[i, :])
G = (q/n)*np.ones((n,n)) + (1-q)*M
G_T = G.T
p_t = p.copy()
for i in range(k):
y = G_T.dot(p_t)
p_t = y/np.max(y)
return p_t/np.sum(p_t)
if __name__ == '__main__':
A = np.array(
[
[0, 1, 1],
[0, 0, 1],
[1, 0, 0]
]
)
k = 20
p = np.array([1, 1, 1])
q = 0.15 #The probability of moving to a random page is usually 0.15
# move to the page linked to this page with probability 1-q
R= PageRank(A, p, k, q)
print(R)

The result of running the algorithm is as follows:

[0.38779177 0.21480614 0.39740209]

You can see the Rank vector of the web page after the 20-step iteration$$\bm{R}=(0.38779177, 0.21480614, 0.39740209)^T$$, which can also be seen as the degree of importance of the web page.

## Well-known libraries and source code reading suggestions

There are many excellent open source implementations of the PageRank algorithm. Here are a few recommended projects:

### (1) Spark-GraphX

GraphX ​​is an excellent distributed graph computing library, which belongs to the Spark distributed computing framework. It uses Scala language to implement many distributed graph computing algorithms, including the PageRank algorithm we are talking about here.

### (2) neo4j

neo4j is a well-known graph database implemented in Java, which also provides an implementation of the PageRank algorithm.

If you are interested in digging into the implementation of search engines, I recommend the elastic-search project, which is a Java-based search engine.