next up previous contents index
Next: subspace iteration and the Up: Convergence properties of the Previous: Lanczos-Ritz values in a   Contents   Index


subspace iteration inside of the orthogonal similarity reduction of a matrix into semiseparable form

In Section 5.2.1 we showed that the intermediate semiseparable matrices have as eigenvalues the Lanczos-Ritz values. This behavior is completely the same as the one observed in an orthogonal similarity transformation to reduce a symmetric matrix into a similar tridiagonal one. One might therefore doubt the usefulness of this reduction, because it costs $ 9 n^2 + O(n)$ more, and has not yet any advantage over the tridiagonal approach. In this section we will prove that the extra cost of $ 9 n^2 + O(n)$ provides an extra interesting property for this reduction.

At each step of the algorithm introduced in Theorem 49, one more row (column) is added by means of orthogonal transformations to the set of the rows (columns) of the matrix already proportional to each other. In this section, using arguments similar to those considered in [183,184,185,187,195], we show that this algorithm can be interpreted as a kind of nested subspace iteration method [91], where the size of the vector subspace is increased by one and a change of coordinate system is made at each step of the algorithm. As a consequence, depending on the gaps between the eigenvalues, the semiseparable part of the matrix will converge to a block diagonal matrix, and the eigenvalues of these blocks converge to the largest eigenvalues in absolute value of the original symmetric matrix.

Given a matrix $ A$ and an initial subspace $ \mathsf{S}^{(0)},$ the subspace iteration method [91] can be described as follows

$\displaystyle \mathsf{S}^{(k)}=A \mathsf{S}^{(k-1)},\;\; k=1,2,3,\ldots$

Under weak assumptions on $ A$ and $ \mathsf{S}^{(0)},$ the $ \mathsf{S}^{(k)}$ converge to an invariant subspace (more details on these assumptions will be investigated in the next subsection, because the subspace iteration will interact with the Lanczos convergence behavior). We will see that the reduction algorithm from a symmetric to a semiseparable matrix can be interpreted as such a kind of subspace iteration, where the dimension of the subspace grows by one at each step of the algorithm. Let $ A^{(1)}=Q_0^TAQ_0$. For the reduction algorithm as presented in Section 4.1 in Chapter 4 we have $ Q_0=I$. Suppose we have only performed the first orthogonal similarity transformations such that the rows (columns) $ n$ and $ n-1$ are already proportional (Let us capture all the necessary Givens and Householder transformations to go from matrix $ A^{(k)}$ to matrix $ A^{(k+1)}$ in one orthogonal matrix $ Q_k$):

$\displaystyle A^{(2)}=Q_1^T A^{(1)} Q_1,$ (5.3)

where $ A^{(2)}$ has the semiseparable structure in the rows (columns) $ n$ and $ n-1$ and $ Q_1 = [q^{(1)}_1,\ldots,q^{(1)}_n]. $ From (5.3), we can write
$\displaystyle A^{(1)}$ $\displaystyle =$ $\displaystyle Q_1\left( A^{(2)}Q_1^T \right)$ (5.4)
  $\displaystyle =$ $\displaystyle Q_1
\left( \begin{array}{cccc}
\times & \cdots & \times & 0 \\
\...
...& \cdots & \times & 0 \\
\times & \cdots & \times & \times
\end{array} \right)$  
  $\displaystyle =$ $\displaystyle Q_1L_1.$  

Let $ e_1,\ldots, e_n$ be the standard basis vectors of $ \mathbb{R}^n$. From (5.4), because of the structure of $ L_1$, it can clearly be seen that:

$\displaystyle A^{(1)}\langle e_n\rangle = \langle q^{(1)}_n\rangle .$

This means that the last column of $ A^{(1)}$ and $ q^{(1)}_n $ span the same one-dimensional space. In fact one subspace iteration step is performed on the vector $ e_n$. The first step of the algorithm is completed when the following orthogonal transformation is performed,

$\displaystyle A^{(2)}= Q_1^T A^{(1)} Q_1.$

The latter transformation can be interpreted as a change of coordinate system: $ A^{(1)}$ and $ A^{(2)}$ represent the same linear transformation with respect to different coordinate systems. Let $ y \in \mathbb{R}^n$. Then $ y$ is represented in the new system by $ Q_1^T y$. This means that for the vector $ q^{(1)}_n $ we get $ Q_1^Tq^{(1)}_n= e_n $. Summarizing, this means that one step of subspace iteration on the subspace $ \langle e_n\rangle $ is performed, resulting in a new subspace $ \langle q^{(1)}_n\rangle $, and then, by means of a coordinate transformation, it is transformed back into the subspace $ \langle e_n\rangle $. So, instead of working with a fixed matrix and changing subspaces, we work with fixed subspaces and changing matrices. Therefore, denoting by $ { z}^{(k)} $ the eigenvector corresponding to the largest eigenvalue in absolute value $ \lambda$ of $ A^{(k)},\; k=1,2,\ldots$, we can say that, if $ { z}^{(k)} $ has a nonzero last component and if the largest eigenvalue is unique, the sequence $ \left\{ { z}^{(k)} \right\}$ converges to $ e_n$, and, consequently, the lower right element of $ A^{(k)}$ converges to $ \lambda$. Note that the last assumption of the nonzero component, will play an important role in the next section, where the interaction between the Lanczos behavior and the subspace iteration is investigated.

The second step can be interpreted in a completely analogous way. Suppose we have already the semiseparable structure in the last two rows (columns). Then we perform the following similarity transformation on $ A^{(2)}$,

$\displaystyle A^{(3)}=Q_2^TA^{(2)} Q_2,$ (5.5)

in order to make the rows (columns) $ n$ up to $ n-2$ dependent. Using (5.5), $ A^{(2)}$ can be written as follows,
$\displaystyle A^{(2)}$ $\displaystyle =$ $\displaystyle Q_2
\left( \begin{array}{ccccc}
\times & \cdots & \times & 0 & 0 ...
...s & \times & 0 \\
\times & \cdots & \times &\times &\times
\end{array} \right)$  
  $\displaystyle =$ $\displaystyle Q_2 L_2.$  

Considering the subspace $ \langle e_{n-1},e_n\rangle $ and using the same notation as above, we have,

$\displaystyle A^{(2)}\langle e_{n-1},e_n\rangle = \langle q^{(2)}_{n-1},q^{(2)}_{n}\rangle .$

This means that the second step of the algorithm is a step of subspace iteration performed on a larger subspace. For every new dependency that is created in the symmetric matrix $ A^{(k)}$, the dimension of the subspace is increased by one.

One can also see the algorithm as performing in each iteration step a $ QL$ step without shift on the semiseparable bottom-right submatrix. Let us partition the matrix $ A_0^{(k)} $ (as in the proof of Theorem 49) as follows:

% latex2html id marker 34681
$\displaystyle A_0^{(k)}=A^{(k)} = \left( \begin{ar...
...ment_mark>1051 {\rule[0.1cm]{0cm}{0.3cm} v_k u_k^T } & S_k \end{array} \right),$    

with $ A_k \in \mathbb{R}^{(n-k)\times(n-k)}$, $ u_k \in \mathbb{R}^{(n-k) \times 1 }$, $ v_k \in \mathbb{R}^{k \times 1 }$and $ S_k \in \mathbb{R}^{k \times k}$. From the semiseparable structure, we know that $ [v_k,S_k e_1]$ has rank one. In the first part of the $ k$th iteration step, an $ (n-k) \times (n-k)$ orthogonal matrix $ Q_{k,1}$ is constructed such that

$\displaystyle Q_{k,1}^T u_{k} = [0,0,\ldots,0,\eta]^T
$

with $ \eta = \pm \Vert u_{k} \Vert$. Hence, our matrix $ A^{(k)}$ is transformed into the following orthogonally similar matrix
\begin{displaymath}{\left(
\begin{array}{c\vert c}
Q_{k,1}^T & 0 \\ [0.1cm]
\hli...
...]
\hline
0 & {\rule[0.1cm]{0cm}{0.3cm} I_k}
\end{array}\right)}\end{displaymath}
  $\displaystyle =$ \begin{displaymath}\left(
\begin{array}{c\vert c}
\tilde{A}_{k} & \eta e_{n-k} v...
...]{0cm}{0.3cm}
\eta v_{k} e_{n-k}^T} & S_{k}
\end{array}\right).\end{displaymath}  

Let us partition the matrix in a slightly different way, adding one more column and row to the semiseparable part. Let

% latex2html id marker 34711
$\displaystyle \tilde{A}_{k} = \left(
\begin{array}{cc}
A_{k+1} & a \\
a^T & \alpha
\end{array}\right),
$

then we can define the semiseparable matrix $ \tilde{S}_k$ as follows

% latex2html id marker 34715
$\displaystyle \tilde{S}_{k+1} =\left( \begin{array}{cc}
\alpha & \eta v_{k}^T \\
\eta v_{k} & S_{k}
\end{array}\right).
$

In the second part of the $ k$th iteration step, we perform a $ QL$ step on the semiseparable matrix $ \tilde{S}_{k+1}$ of order $ k+1$, i.e., we construct an orthogonal matrix $ Q_{k,2}$ such that

$\displaystyle \tilde{S}_{k+1} = Q_{k,2} L_{k},
$

with $ L_k$ lower triangular. Hence, the matrix of (5.6) is transformed into the orthogonal similar matrix
\begin{displaymath}{\left(
\begin{array}{c\vert c}
I_{n-k-1} & 0 \\ [0.1cm]
\hli...
...line
0 & {\rule[0.1cm]{0cm}{0.3cm} Q_{k,2}}
\end{array}\right)}\end{displaymath}
  $\displaystyle =$ \begin{displaymath}\left(
\begin{array}{c\vert c}
A_{k+1} & u_{k+1} v_{k+1}^T \\...
...m]{0cm}{0.3cm}
v_{k+1} u_{k+1}^T} & S_{k+1}
\end{array}\right).\end{displaymath}  

The execution of the two steps is illustrated in Figure 5.1. At the beginning of the iteration the last four rows and columns have already a semiseparable structure. In (1), a similarity transformation (either an Householder matrix or a product of Givens transformations) is considered in order to annihilate the entries indicated by $ \otimes$. Due to the semiseparable structure, all the entries in the first four rows (colums) with column (row) indexes greater than 5 are annihilated, too. In steps (2), (3), (4) and (5), the Givens rotations are only applied to the left, transforming the $ (5 \times 5) $ principal submatrix in the right-bottom corner to lower triangular form. To retrieve the semiseparable structure in the last 5 rows and columns, the transpose of the latter Givens rotation must be applied to the right.

Figure 5.1: Description of $ QL$ step without shift applied in each iteration of the semiseparable reduction.
\begin{figure}
% latex2html id marker 13037
\begin{tiny}\begin{displaymath}{%%\t...
...\right)
\end{array}\end{array}}\end{displaymath}\end{tiny}\noindent
\end{figure}

This means that from step $ i, \;i=1,\ldots,n,$ all the consecutive steps perform subspace iterations on the subspace of dimension $ i$. From [187], we know that these consecutive iterations on subspaces tend to create block lower triangular matrices. Hence, for a symmetric matrix these are block diagonal. Furthermore, the process works for all the nested subspaces at the same time, and so the semiseparable part of the matrices $ A^{(k)}$ generated by the proposed algorithm, becomes more and more block diagonal, and the blocks contain eigenvalues of the original matrix. This explains why in the numerical examples (see Chapter 6), the lower right block already gives a good estimate of the largest eigenvalues, since they are connected to a subspace on which the subspace iteration is performed most.

This insight also opens several new perspectives. In many problems, only the few largest eigenvalues (in absolute value) need to be computed [22,114,181,182,192,193]. In such cases, the proposed algorithm gives the required information after only few steps, without running the algorithm to completion. Moreover, because the sequence of similar matrices generated at each step of the algorithm converges to a block diagonal matrix, the original problem can be divided into smaller independent subproblems. A direct application of this behavior can be found in the field of optical flow. In this field one wants to reduce the complexity of storing movies by not considering a movie as a sequence of images, but as pictures in which the pixels are moving. In this way one can associate a vector field to each pixel. Pixels which are moving have the largest component in the vector field. This means that moving objects in the image correspond to separate subspaces [112].

We know that at each step of the algorithm a step of $ QL$ is performed on the semiseparable part of the matrix. This insight opens the perspective to a new research topic. Is it possible to incorporate a shift in this reduction, thereby performing a $ QL$-step with shift on the already semiseparable part. This will however destroy the semiseparable structure, but it will reduce the matrix into a similar semiseparable plus diagonal matrix. Moreover it is possible to extract a complete diagonal matrix, not only a shift, from the semiseparable part and then perform the $ QL$-step. In the following part we will prove that a $ QL$-step applied on a semiseparable plus diagonal matrix is again a semiseparable plus diagonal matrix, for which the diagonal did not change. This last statement justifies our previous statements.

We finish this section with a theorem from [187] concerning the speed of convergence of subspace iterations.

Definition 56 (From [][p. 8)   n890] Denote with $ \mathsf{S}$ and $ \mathsf{T}$ two subspaces, then the distance $ \dist(\mathsf{S},\mathsf{T})$ between these two subspaces is defined in the following way:

$\displaystyle \dist(\mathsf{S},\mathsf{T})=\sup_{s\in\mathsf{S},\Vert s\Vert _2=1}\;\;\;\;
\inf_{t\in\mathsf{T}} \Vert s-t\Vert _2.$

Using this definition, we can state the following convergence theorem:

Theorem 57 (From [][Theorem 5.4)   n890] Let $ A\in \mathbb{C}^{n\times n}$, and let $ p$ be a polynomial of degree $ \leq n$. Let $ \lambda_1, \ldots, \lambda_n$ denote the eigenvalues of $ A$, ordered so that $ \vert p(\lambda_1)\vert \geq \vert p(\lambda_2)\vert \geq
\cdots \geq \vert p(\lambda_n)\vert$. Suppose $ k$ is a positive integer less than $ n$ for which $ \vert p(\lambda_k)\vert > \vert p(\lambda_{k+1})\vert$. Let $ (p_i)$ be a sequence of polynomials of degree $ \leq n$ such that $ p_i \rightarrow p$ as $ i \rightarrow \infty $ and $ p_i(\lambda_j)
\neq 0$ for $ j=1,\ldots, k$ and all $ i$. Let $ \rho=\vert p(\lambda_{k+1})\vert/\vert p(\lambda_{k})\vert$. Let $ \mathsf{T}$ and $ \mathsf{U}$ be the invariant subspaces of $ A$ associated with $ \lambda_1,\ldots, \lambda_k$ and $ \lambda_{k+1},\ldots,\lambda_n$ respectively. Consider the nonstationary subspace iteration

$\displaystyle \mathsf{S}^{(i)} = p_i(A) \mathsf{S}^{(i-1)}$

where $ \mathsf{S}^{(0)}=\mathsf{S}$ is a $ k$-dimensional subspace of $ \mathbb{C}^n$ satisfying $ \mathsf{S}\cap \mathsf{U}=\{0\}$. Then for every $ \hat{\rho}$ satisfying $ \rho<\hat{\rho}<1$ there exists a constant $ \hat{C}$ such that

$\displaystyle \dist\left( \mathsf{S}^{(i)},\mathsf{T} \right)\leq
\hat{C} \hat{\rho}^i,\;\;\; i=1,2,3,\ldots$

In our case Theorem 57 can be applied with the polynomials $ p_i(z)=p(z)=z$.

We translate the previous theorem to the matrix case, for which we apply the $ QR$-algorithm without shift to the matrix $ A^{(0)}$. With $ A^{(k)}=Q_kR_k$ and $ A^{({k+1})}=R_kQ_k$. We have that for

% latex2html id marker 34844
$\displaystyle A^{(k)}=\left( \begin{array}{cc}
A_{...
...k)} & A_{12}^{(k)} \\  [0.1cm]
A_{21}^{(k)} & A_{22}^{(k)}
\end{array} \right),$

there exists a $ \hat{\rho}$, from Theorem 57, such that:

$\displaystyle \Vert A_{21}^{(k)}\Vert _2 < C \hat{\rho}^k.$


next up previous contents index
Next: subspace iteration and the Up: Convergence properties of the Previous: Lanczos-Ritz values in a   Contents   Index
Raf Vandebril 2004-05-03