Review of the CG method: I

by Yi Zhang

One of the most popular methods for solving linear systems, conjugate gradient is simply a few connected, elegant ideas whose geometric meaning is almost intuitive. In solving the system

Ax=b

where A is an SPD (symmetric positive definite) matrix, it is easy to see that the solution x of the equation is equal to the minimizer of the quadratic form

f(x)=\frac{1}{2}x^TAx-b^Tx+c

If the positive definiteness condition is removed, then the solution of Ax=b is merely a stationary point of f.
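As a quick numerical sanity check of this claim, here is a small sketch (the 2x2 SPD matrix and the perturbation test are my own choices, not from the text): the solution of Ax=b should be a minimizer of the quadratic form f.

```python
import numpy as np

# Example SPD system (an assumption for illustration).
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

x_star = np.linalg.solve(A, b)  # exact solution of Ax = b

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x (constant c dropped)."""
    return 0.5 * x @ A @ x - b @ x

# The gradient A x - b vanishes at the solution,
# and perturbing x_star in random directions never decreases f.
assert np.allclose(A @ x_star - b, 0.0)
rng = np.random.default_rng(0)
for _ in range(100):
    d = rng.standard_normal(2)
    assert f(x_star + 1e-3 * d) >= f(x_star)
```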

The pictures below show the 3D plot of a quadratic form and its contours, together with the negative gradient -\nabla f(x)=b-Ax. To solve the linear system, one needs to find the minimizer; the steepest descent method does so by following the direction of the arrows in the picture and moving toward the goal, in other words, like a ball sliding down the 3D paraboloid toward the lowest point. As an iterative algorithm, we obtain the point x_{i+1} in the domain by x_{i+1}=x_i+l_ir_i, where r_i is the descent direction at the point x_i

r_i=-\nabla f(x_i)=b-Ax_i

and l_i is the distance to move in that direction, determined by maximizing the "profit" of the present step, i.e. moving to the lowest value of f along the direction r_i. Intuitively, this is achieved by moving to the "innermost" contour reachable in that direction; by "innermost" I mean the contour that is tangent to the line through x_i in the direction r_i. One corollary of this method is that in the next step we move perpendicular to the direction of the last move: r_{i+1}\perp r_i

Another way to see this is that by "maximizing profit" we mean minimizing the value of f at the present step, so we are looking for a step size l_i satisfying

\frac{d f(x_{i+1})}{d l_i}=f'(x_{i+1})^Tr_i=(Ax_{i+1}-b)^Tr_i=-r_{i+1}^Tr_i=0

Substituting r_{i+1}=b-A(x_i+l_ir_i)=r_i-l_iAr_i into r_{i+1}^Tr_i=0, we obtain l_i=r_i^Tr_i/(r_i^TAr_i), and hence the steepest descent algorithm: \cdots\rightarrow r_i=b-Ax_i \rightarrow l_i=\frac{r_i^Tr_i}{r_i^TAr_i}\rightarrow x_{i+1}=x_i+l_ir_i\rightarrow\cdots
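The iteration above can be sketched in a few lines of code. This is a minimal illustration, assuming a small SPD test system of my own choosing (the function name, tolerance, and iteration cap are likewise assumptions, not from the text):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Solve Ax = b for SPD A by the steepest descent iteration:
    r_i = b - A x_i,  l_i = (r_i^T r_i)/(r_i^T A r_i),  x_{i+1} = x_i + l_i r_i.
    """
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                # residual = steepest descent direction
        if np.linalg.norm(r) < tol:  # converged
            break
        l = (r @ r) / (r @ A @ r)    # optimal step length along r
        x = x + l * r
    return x

# Example SPD system (an assumption for illustration).
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = steepest_descent(A, b, np.zeros(2))
```

One can also check numerically that consecutive residuals b - A x_i are (up to rounding) orthogonal, matching the r_{i+1}\perp r_i corollary above.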