Review of CG method: IV

by Yi Zhang

I have shown that in the method of conjugate directions, the search directions are in some sense orthogonal to each other:

d_i^TAd_j=0,\quad i\neq j

This makes more sense when the whole thing is considered in a new coordinate system, obtained by applying the transform L, where L is the upper-triangular matrix from A's Cholesky decomposition A=L^TL:

d_i^TAd_j=0\Longleftrightarrow (Ld_i)^T(Ld_j)=0\Longleftrightarrow Ld_i\perp Ld_j
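
To make this concrete, here is a minimal NumPy sketch (the test matrix, seed, and variable names are my own illustrative choices, not from the text) checking that two A-orthogonal vectors become ordinarily orthogonal after multiplying by this Cholesky factor L:

```python
import numpy as np

# Check that d_i^T A d_j = (L d_i)^T (L d_j) when A = L^T L, L upper-triangular.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)      # a symmetric positive definite test matrix

C = np.linalg.cholesky(A)        # NumPy returns the lower-triangular factor, A = C C^T
L = C.T                          # so L = C^T is upper-triangular and A = L^T L

d0 = rng.standard_normal(4)
v = rng.standard_normal(4)
d1 = v - (d0 @ A @ v) / (d0 @ A @ d0) * d0   # make d1 A-orthogonal to d0

print(d0 @ A @ d1)               # ~ 0 : A-orthogonality
print((L @ d0) @ (L @ d1))       # ~ 0 : ordinary orthogonality in the new coordinates
```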

A similar conclusion is that d_i is A-orthogonal to the error e_{i+1}; since r_{i+1}=-Ae_{i+1}, this means that d_i is orthogonal to the residual r_{i+1}:

d_i^Tr_{i+1}=0\Longrightarrow d_i^Tr_j=0,\quad i<j

This condition actually implies that one can obtain the direction d_i by applying Gram-Schmidt conjugation to the residuals r_i, and this is the conjugate gradient (CG) method. The advantage of CG is that, instead of fixing a whole set of directions d_i at the beginning, each direction is obtained along the way during the iteration, because the Gram-Schmidt conjugation process is "triangular", i.e., each new variable is determined only by its predecessors. In particular, starting from d_0=r_0=b-Ax_0 (the first step is the same as in the steepest descent method), the subsequent search directions d_i are determined by

d_i=r_i+\sum_{k=0}^{i-1}\alpha_{ik}d_k
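
As a sketch of what this Gram-Schmidt conjugation looks like in code (Python/NumPy; the function name and setup are mine, and this naive version keeps every previous direction, unlike the compact CG update derived below):

```python
import numpy as np

def conjugate_directions_from_residuals(A, b, x0, n_steps):
    """Build each search direction by Gram-Schmidt conjugation of the residual:
    d_i = r_i + sum_k alpha_{ik} d_k, alpha_{ik} = -r_i^T A d_k / (d_k^T A d_k).
    Illustrative only: it stores all previous directions, which CG avoids."""
    x = x0.astype(float)
    r = b - A @ x                                  # r_0 = b - A x_0
    dirs = []
    for _ in range(n_steps):
        d = r.copy()
        for dk in dirs:                            # conjugate against all predecessors
            d -= (r @ A @ dk) / (dk @ A @ dk) * dk
        dirs.append(d)
        l = (r @ r) / (d @ A @ d)                  # exact line search along d
        x = x + l * d
        r = r - l * (A @ d)
    return x, dirs
```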

The coefficient \alpha_{ik} is obtained from the conjugacy condition (for j<i):

d_i^TAd_j=0\Longrightarrow r_i^TAd_j+\sum_{k=0}^{i-1}\alpha_{ik}d_k^TAd_j=r_i^TAd_j+\alpha_{ij}d_j^TAd_j=0

Then we have \alpha_{ij}=-r_i^TAd_j/(d_j^TAd_j)

In order to simplify this expression, remember that during this process of generating d_i from r_i, for any i,

\text{span}\{d_0,d_1,\cdots,d_i\}=\text{span}\{r_0,r_1,\cdots,r_i\}

so d_i^Tr_j=0\Longrightarrow r_i^Tr_j=0,\quad i\neq j

The Gram-Schmidt process also indicates that d_i^Tr_i=r_i^Tr_i (prove it). The above two identities can be used to simplify \alpha_{ij}, and to prove that

\alpha_{ij}=r_i^Tr_i/(l_{i-1}d_{i-1}^TAd_{i-1}),\quad j=i-1

\alpha_{ij}=0,\quad j<i-1

and eventually, substituting l_{i-1}=r_{i-1}^Tr_{i-1}/(d_{i-1}^TAd_{i-1}) (the step length defined below),

\alpha_{i,i-1}=r_i^Tr_i/(r_{i-1}^Tr_{i-1})
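
A quick numerical check of this simplification (a self-contained sketch; the system, seed, and number of steps are arbitrary choices of mine): run a few CG iterations, keep every residual and direction, and compare the general coefficient -r_i^TAd_j/(d_j^TAd_j) against r_i^Tr_i/(r_{i-1}^Tr_{i-1}):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)              # SPD test matrix
b = rng.standard_normal(6)

x = np.zeros(6)
r = b - A @ x
d = r.copy()
rs, ds = [r], [d]
for _ in range(4):                        # a few CG steps, storing r_i and d_i
    l = (r @ r) / (d @ A @ d)
    x = x + l * d
    r_new = r - l * (A @ d)
    d = r_new + (r_new @ r_new) / (r @ r) * d
    r = r_new
    rs.append(r)
    ds.append(d)

i = 3
for j in range(i):
    print(j, -(rs[i] @ A @ ds[j]) / (ds[j] @ A @ ds[j]))   # ~0 except at j = i-1
print((rs[i] @ rs[i]) / (rs[i - 1] @ rs[i - 1]))           # matches the j = i-1 value
```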

Finally, a complete iteration, when d_i, x_i, and r_i are known, consists of

l_i=r_i^Tr_i/d_i^TAd_i\longrightarrow x_{i+1}=x_i+l_id_i\longrightarrow r_{i+1}=r_i-l_iAd_i

\alpha_{i+1}\equiv\alpha_{i+1,i}=r_{i+1}^Tr_{i+1}/(r_i^Tr_i)\longrightarrow d_{i+1}=r_{i+1}+\alpha_{i+1}d_i
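
Putting the pieces together, a minimal implementation of exactly this update cycle might look like the following (Python/NumPy; the function name, stopping rule, and the test problem at the end are my own choices):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """CG for symmetric positive definite A, following the cycle above:
    l_i = r_i^T r_i / (d_i^T A d_i),   x_{i+1} = x_i + l_i d_i,
    r_{i+1} = r_i - l_i A d_i,
    alpha_{i+1} = r_{i+1}^T r_{i+1} / (r_i^T r_i),   d_{i+1} = r_{i+1} + alpha_{i+1} d_i."""
    x = x0.astype(float)
    r = b - A @ x
    d = r.copy()
    rr = r @ r
    n_iter = len(b) if max_iter is None else max_iter
    for _ in range(n_iter):
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d
        l = rr / (d @ Ad)              # step length l_i
        x = x + l * d
        r = r - l * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d      # alpha_{i+1} = rr_new / rr
        rr = rr_new
    return x

# Quick check on a random SPD system (illustrative)
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)
b = rng.standard_normal(8)
x = conjugate_gradient(A, b, np.zeros(8))
print(np.linalg.norm(A @ x - b))       # ~ 0
```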
