### Review of the CG method: I

The conjugate gradient method, one of the most popular methods for solving linear systems, is simply a few connected, elegant ideas whose geometric meaning is almost intuitive. In solving the system

$Ax=b$

where $A$ is an SPD (symmetric positive definite) matrix, the solution $x$ of the equation is also the minimizer of the quadratic form

$f(x)=\frac{1}{2}x^TAx-x^Tb+c$

If the positive definite condition is removed, then the solution of $Ax=b$ is simply a stationary point of $f$.
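The equivalence is easy to check numerically: at the solution of $Ax=b$, the gradient $\nabla f(x)=Ax-b$ vanishes. A minimal sketch, using a hypothetical 2×2 SPD matrix chosen here for illustration:

```python
import numpy as np

# Hypothetical SPD example (not from the text): verify that the
# solution of Ax = b is a stationary point of the quadratic form f.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])

x = np.linalg.solve(A, b)      # exact solution of Ax = b
grad = A @ x - b               # gradient of f at x
print(np.allclose(grad, 0.0))  # True: the gradient vanishes at x
```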

The pictures below show a 3D plot of a quadratic form and its contours, together with the negative gradient $-\nabla f(x)=b-Ax$. To solve the linear system, one needs to find the minimizer, and the steepest descent method does so by following the direction of the arrows in the picture and moving toward the goal; in other words, like a ball sliding on the 3D paraboloid toward the lowest point. As an iterative algorithm, we obtain the next point $x_{i+1}$ in the domain by $x_{i+1}=x_i+l_ir_i$, where $r_i$ is the descent direction at point $x_i$

$r_i=-\nabla f(x_i)=b-Ax_i$

and $l_i$ is the distance to move in that direction, determined by maximizing the "profit" of the present step, i.e. moving to the lowest value of $f$ along the direction of $r_i$. Intuitively, this means moving to the "innermost" contour reachable in that direction, and one can see that "innermost" means the contour that is tangent to the line along $r_i$. One corollary of this choice is that the next step moves perpendicular to the last one: $r_{i+1}\perp r_i$.

Another way to see this is that "maximizing profit" means minimizing the value of $f$ over the present step, so we are looking for a step size $l_i$ satisfying

$\frac{d f(x_{i+1})}{d l_i}=f'(x_{i+1})^Tr_i=(Ax_{i+1}-b)^Tr_i=-r_{i+1}^Tr_i=0$
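Substituting $x_{i+1}=x_i+l_ir_i$ into the residual makes the step size explicit:

$r_{i+1} = b - Ax_{i+1} = b - A(x_i + l_i r_i) = r_i - l_i A r_i$

so the orthogonality condition $r_{i+1}^Tr_i=0$ reads $r_i^Tr_i - l_i\, r_i^TAr_i = 0$.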

From this we obtain $l_i=r_i^Tr_i/(r_i^TAr_i)$, and the steepest descent algorithm: $\cdots\rightarrow r_i=b-Ax_i \rightarrow l_i\rightarrow x_{i+1}=x_i+l_ir_i\rightarrow\cdots$
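The iteration above can be sketched in a few lines of numpy. This is a minimal illustration of steepest descent with exact line search, not a production solver; the SPD matrix and tolerance are hypothetical choices for the example:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent for SPD A: repeat x <- x + l*r with r = b - Ax."""
    x = x0.astype(float)
    r = b - A @ x                    # residual = steepest descent direction
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:  # converged: residual (gradient) is tiny
            break
        l = (r @ r) / (r @ (A @ r))  # exact line search step l_i
        x = x + l * r
        r = b - A @ x                # recompute residual at the new point
    return x

# Hypothetical SPD test problem
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = steepest_descent(A, b, np.zeros(2))
print(np.allclose(A @ x, b))         # True
```

Note that each iteration recomputes $r_i = b - Ax_i$ directly; an equivalent and cheaper update is $r_{i+1} = r_i - l_i A r_i$, reusing the product $Ar_i$ from the line search.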