Math 1600A Lecture 31, Section 2, 22 Nov 2013

$ \newcommand{\bdmat}[1]{\left|\begin{array}{#1}} \newcommand{\edmat}{\end{array}\right|} \newcommand{\bmat}[1]{\left[\begin{array}{#1}} \newcommand{\emat}{\end{array}\right]} \newcommand{\coll}[2]{\bmat{r} #1 \\ #2 \emat} \newcommand{\ccoll}[2]{\bmat{c} #1 \\ #2 \emat} \newcommand{\colll}[3]{\bmat{r} #1 \\ #2 \\ #3 \emat} \newcommand{\ccolll}[3]{\bmat{c} #1 \\ #2 \\ #3 \emat} \newcommand{\collll}[4]{\bmat{r} #1 \\ #2 \\ #3 \\ #4 \emat} \newcommand{\ccollll}[4]{\bmat{c} #1 \\ #2 \\ #3 \\ #4 \emat} \newcommand{\colllll}[5]{\bmat{r} #1 \\ #2 \\ #3 \\ #4 \\ #5 \emat} \newcommand{\ccolllll}[5]{\bmat{c} #1 \\ #2 \\ #3 \\ #4 \\ #5 \emat} \newcommand{\red}[1]{{\color{red}#1}} \newcommand{\lra}[1]{\mbox{$\xrightarrow{#1}$}} \newcommand{\rank}{\textrm{rank}} \newcommand{\row}{\textrm{row}} \newcommand{\col}{\textrm{col}} \newcommand{\null}{\textrm{null}} \newcommand{\nullity}{\textrm{nullity}} \renewcommand{\Re}{\operatorname{Re}} \renewcommand{\Im}{\operatorname{Im}} \renewcommand{\Arg}{\operatorname{Arg}} \renewcommand{\arg}{\operatorname{arg}} \newcommand{\adj}{\textrm{adj}} \newcommand{\mystack}[2]{\genfrac{}{}{0}{0}{#1}{#2}} \newcommand{\mystackthree}[3]{\mystack{\mystack{#1}{#2}}{#3}} \newcommand{\qimplies}{\quad\implies\quad} \newcommand{\qtext}[1]{\quad\text{#1}\quad} \newcommand{\qqtext}[1]{\qquad\text{#1}\qquad} \newcommand{\svec}[1]{\,\vec{#1}} \newcommand{\query}[1]{\toggle{\text{?}\vphantom{#1}}{#1}\endtoggle} \newcommand{\smallquery}[1]{\toggle{\text{?}}{#1}\endtoggle} $


Today we finish Section 4.4 and cover the Markov chains part of Section 4.6. We are not covering Section 4.5 or the rest of 4.6 (which contains many interesting applications!). Read Section 5.1 for Monday. Work through the recommended homework questions; some have been updated!

Tutorials: Quiz 6 is next week. It covers Section 4.4 and the Markov chains part of 4.6.
Office hour: Monday, 1:30-2:30, MC103B.
Help Centers: Monday-Friday 2:30-6:30 in MC 106.

Review of Section 4.4

Definition: Let $A$ and $B$ be $n \times n$ matrices. We say that $A$ is similar to $B$ if there is an invertible matrix $P$ such that $P^{-1} A P = B$. When this is the case, we write $A \sim B$.

Theorem 4.22: Let $A$ and $B$ be similar matrices. Then:
a. $\det A = \det B$
b. $A$ is invertible iff $B$ is invertible.
c. $A$ and $B$ have the same rank.
d. $A$ and $B$ have the same characteristic polynomial.
e. $A$ and $B$ have the same eigenvalues.

Definition: $A$ is diagonalizable if it is similar to some diagonal matrix.

If $A$ is similar to a diagonal matrix $D$, then $D$ must have the eigenvalues of $A$ on the diagonal. But how to find $P$?

Theorem 4.23: Let $A$ be an $n \times n$ matrix. Then $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

More precisely, there exist an invertible matrix $P$ and a diagonal matrix $D$ with $P^{-1}AP = D$ if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues in the same order.

This theorem is one of the main reasons we want to be able to find eigenvectors of a matrix. Moreover, the more eigenvectors the better, so this motivates allowing complex eigenvectors.
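As a quick numerical sketch of Theorem 4.23 (using Python with NumPy, which is not part of the course; the matrix and its eigendata are from the toothpaste example later in this lecture):

```python
import numpy as np

# Matrix from the toothpaste example later in the lecture:
# eigenvalues 1 and 0.5, with eigenvectors (2, 3) and (1, -1).
A = np.array([[0.7, 0.2],
              [0.3, 0.8]])

# Columns of P: n = 2 linearly independent eigenvectors of A.
P = np.array([[2.0,  1.0],
              [3.0, -1.0]])

# Diagonal entries of D: the corresponding eigenvalues, in the same order.
D = np.diag([1.0, 0.5])

# Theorem 4.23: P^{-1} A P = D, equivalently A P = P D.
assert np.allclose(np.linalg.inv(P) @ A @ P, D)
```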

Theorem 4.24: If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$ and, for each $i$, $\cB_i$ is a basis for the eigenspace $E_{\lambda_i}$, then the union of the $\cB_i$'s is a linearly independent set.

Combining Theorems 4.23 and 4.24 gives the following important consequence:

Theorem: An $n \times n$ matrix is diagonalizable if and only if the sum of the geometric multiplicities of the eigenvalues is $n$.

In particular:

Theorem 4.25: If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues, then $A$ is diagonalizable.

So it is important to understand the geometric multiplicities better. Here is a helpful result:

Lemma 4.26: If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then $$ \text{geometric multiplicity of } \lambda_1 \leqslant \text{algebraic multiplicity of } \lambda_1 $$

We'll prove this in a minute. First, let's look at what it implies:

Let $A$ be an $n \times n$ matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$ and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$. We know $$ g_i \leqslant a_i \qtext{for each $i$} $$ and so $$ g_1 + \cdots + g_k \leqslant a_1 + \cdots + a_k \leqslant n $$ So the only way to have $g_1 + \cdots + g_k = n$ is to have $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.
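For contrast, here is a quick NumPy check (a sketch, not from the text) on a matrix where the inequality is strict, so the matrix is not diagonalizable:

```python
import numpy as np

# Hypothetical defective matrix: det(A - lambda I) = (1 - lambda)^2,
# so lambda = 1 has algebraic multiplicity a_1 = 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Geometric multiplicity g_1 = dim E_1 = n - rank(A - I).
g1 = 2 - np.linalg.matrix_rank(A - np.eye(2))
print(g1)  # prints 1, strictly less than a_1 = 2, so A is not diagonalizable
```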

This gives the main theorem of the section:

Theorem 4.27 (The Diagonalization Theorem): Let $A$ be an $n \times n$ matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$ and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$. Then the following are equivalent:
a. $A$ is diagonalizable.
b. $g_1 + \cdots + g_k = n$.
c. $g_i = a_i$ for each $i$ and $\red{a_1 + \cdots + a_k = n}$.

Note: This is stated incorrectly in the text. The red part must be added unless you are working over $\C$, in which case it is automatic that $a_1 + \cdots + a_k = n$. With the way I have stated it, it is correct over $\R$ or over $\C$.

New material

We still need to prove:

Lemma 4.26: If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then $$ \text{geometric multiplicity of } \lambda_1 \leqslant \text{algebraic multiplicity of } \lambda_1 $$

Proof (more direct than in text): Suppose that $\lambda_1$ is an eigenvalue of $A$ with geometric multiplicity $g$, and let $\vv_1, \ldots, \vv_g$ be a basis for $E_{\lambda_1}$, so $$A \vv_i = \lambda_1 \vv_i \qtext{for each $i$}$$ Extend $\vv_1, \ldots, \vv_g$ to a basis of $\R^n$, and let $Q$ be the invertible matrix whose columns are these basis vectors, with $\vv_1, \ldots, \vv_g$ first: $$ Q = [ \, \vv_1 \, \cdots \, \vv_g \qtext{other vectors} ] $$ Since $Q^{-1} Q = I$, we know that $Q^{-1} \vv_i = \ve_i$ for $1 \leqslant i \leqslant g$. Also, the first $g$ columns of $A Q$ are $\lambda_1 \vv_1, \ldots, \lambda_1 \vv_g$. So the first $g$ columns of $Q^{-1} A Q$ are $\lambda_1 \ve_1, \ldots, \lambda_1 \ve_g$. Expanding $\det(Q^{-1} A Q - \lambda I)$ along the first $g$ columns shows that it has $(\lambda_1 - \lambda)^g$ as a factor, so the matrix $Q^{-1} A Q$ has $\lambda_1$ as an eigenvalue with algebraic multiplicity at least $g$. But $Q^{-1} A Q$ has the same characteristic polynomial as $A$, so $\lambda_1$ must also have algebraic multiplicity at least $g$ for $A$.$\quad\Box$

Summary of diagonalization: Given an $n \times n$ matrix $A$, we would like to determine whether $A$ is diagonalizable, and if it is, find the invertible matrix $P$ and the diagonal matrix $D$ such that $P^{-1} A P = D$. The result may depend upon whether you are working over $\R$ or $\C$.


1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.
2. Find the roots of the characteristic polynomial and their algebraic multiplicities by factoring.
3. If the algebraic multiplicities don't add up to $n$, then $A$ is not diagonalizable, and you can stop. (If you are working over $\C$, this can't happen.)
4. For each eigenvalue $\lambda$, compute the dimension of the eigenspace $E_\lambda$. This is the geometric multiplicity of $\lambda$, and if it is less than the algebraic multiplicity, then $A$ is not diagonalizable, and you can stop.
5. Compute a basis for the eigenspace $E_\lambda$.
6. If for each eigenvalue the geometric multiplicity equals the algebraic multiplicity, then you take the $n$ eigenvectors you found and put them in the columns of a matrix $P$. Put the eigenvalues in the same order on the diagonal of a matrix $D$.
7. Check that $AP=PD$.

Note that step 4 only requires you to find the row echelon form of $A - \lambda I$, as the number of free variables here is the geometric multiplicity. In step 5, you solve the system.
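The steps above can be sketched in NumPy code. The helper name `diagonalize` is made up, and `np.linalg.eig` stands in for steps 1-5: it works over $\C$, so step 3 never fails, and it returns one candidate eigenvector per eigenvalue (with multiplicity), so checking independence of the candidates replaces the multiplicity comparison in step 4.

```python
import numpy as np

def diagonalize(A, tol=1e-9):
    """Hypothetical helper: return (P, D) with P^{-1} A P = D if A is
    diagonalizable over C, and None otherwise."""
    n = A.shape[0]
    # Steps 1-5: eigenvalues (listed with multiplicity) and one
    # candidate eigenvector for each, all over the complex numbers.
    eigvals, eigvecs = np.linalg.eig(A)
    # Step 6: A is diagonalizable iff these n eigenvectors are linearly
    # independent, i.e. the candidate matrix P is invertible.
    if np.linalg.matrix_rank(eigvecs, tol=tol) < n:
        return None
    P, D = eigvecs, np.diag(eigvals)
    assert np.allclose(A @ P, P @ D)  # step 7: check A P = P D
    return P, D

print(diagonalize(np.array([[0.7, 0.2], [0.3, 0.8]])) is not None)  # True
print(diagonalize(np.array([[1.0, 1.0], [0.0, 1.0]])))              # None
```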


Suppose $P^{-1} A P = D$, where $D$ is diagonal. Then $A = P D P^{-1}$. We can use this to compute powers of $A$. For example, $$ \begin{aligned} A^5 &= (P D P^{-1}) (P D P^{-1}) (P D P^{-1}) (P D P^{-1}) (P D P^{-1}) \\ &= P D^5 P^{-1} \end{aligned} $$ and $D^5$ is easy to compute since $D$ is diagonal: you just raise the diagonal entries to the fifth power.

More generally, $A^k = P D^k P^{-1}$. This is clearly an efficient way to compute powers! Note that we need to know $P$, not just $D$, to do this.
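As a small numerical check (a NumPy sketch with a made-up matrix, not from the text):

```python
import numpy as np

# Hypothetical example: A = [[4, 1], [2, 3]] has eigenvalues 5 and 2,
# with eigenvectors (1, 1) and (1, -2), so A = P D P^{-1}.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
P = np.array([[1.0, 1.0], [1.0, -2.0]])
D = np.diag([5.0, 2.0])

k = 5
Dk = np.diag(np.diag(D) ** k)      # just raise the diagonal entries to the k-th power
Ak = P @ Dk @ np.linalg.inv(P)     # A^k = P D^k P^{-1}

# agrees with multiplying A by itself k times:
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```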

See Example 4.29 for a sample calculation. We'll illustrate this result with an example from Markov Chains.

Review of Markov chains

A Markov chain has a finite set of states $1, 2, \ldots, n$ and there is an $n \times n$ matrix $P$ (called the transition matrix) with the property that the $ij$ entry $P_{ij}$ is the probability that you transition from state $j$ to state $i$ in one time step.

Since you must transition to some state, $P_{1j} + \cdots + P_{nj} = 1$. That is, the entries in each column sum to 1. Moreover, each entry $P_{ij} \geqslant 0$. Such a $P$ is called a stochastic matrix.

We can represent the current state of the system with a state vector $\vx \in \R^n$. The $i$th entry of $\vx$ may denote the number of people/objects in state $i$. Or we may divide by the total number, so the $i$th entry of $\vx$ gives the fraction of people/objects in state $i$. In this case, $\vx$ has non-negative entries that sum to 1 and is called a probability vector.

If $\vx_k$ denotes the state after $k$ time steps, then the state after one more time step is given by $$ \vx_{k+1} = P \vx_k . $$ It follows that $\vx_k = P^k \vx_0$. Therefore:

The $ij$ entry $(P^k)_{ij}$ of $P^k$ is the probability of going from state $\red{j}$ to state $\red{i}$ in $k$ steps.
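A quick NumPy check of this interpretation, using the toothpaste transition matrix from the example below (a sketch, not from the text):

```python
import numpy as np

P = np.array([[0.7, 0.2],
              [0.3, 0.8]])

# Two-step transition probabilities.
P2 = np.linalg.matrix_power(P, 2)

# Going from state 1 to state 1 in two steps: either stay both times
# (0.7 * 0.7) or go to state 2 and come back (0.3 then 0.2).
assert np.isclose(P2[0, 0], 0.7 * 0.7 + 0.2 * 0.3)   # = 0.55
```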

A state $\vx$ such that $P \vx = \vx$ is called a steady state vector. This is the same as an eigenvector with eigenvalue 1. In Lecture 22, we proved:

Theorem 4.30: Every stochastic matrix has a steady state vector, i.e. it has $\lambda = 1$ as an eigenvalue.

We proved this using the fact that $P$ and $P^T$ have the same eigenvalues, and then noticing that the vector with all $1$'s is an eigenvector of $P^T$ with eigenvalue 1.

Example: We studied toothpaste usage, and had transition matrix $$ P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat . $$ We noticed experimentally that a given starting state tends to the state $\coll {0.4} {0.6}$ and that $$ \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat \coll {0.4} {0.6} = \coll {0.4} {0.6} . $$ We then found this steady state vector algebraically by solving $(I-P)\vx = \vec 0$. [It is equivalent to solve $(P-I)\vx = \vec 0$.]
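In NumPy, one way to carry out this computation (a sketch; here we take an eigenvector for $\lambda = 1$ and scale it to be a probability vector, which is equivalent to solving $(I-P)\vx = \vec 0$ and normalizing):

```python
import numpy as np

P = np.array([[0.7, 0.2],
              [0.3, 0.8]])

# Eigenvector of P for the eigenvalue closest to 1.
eigvals, eigvecs = np.linalg.eig(P)
v = eigvecs[:, np.argmin(np.abs(eigvals - 1))]

# Scale so the entries sum to 1, giving a steady state probability vector.
x = v / v.sum()

assert np.allclose(P @ x, x)           # P x = x
assert np.allclose(x, [0.4, 0.6])      # the steady state found in lecture
```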

With our new tools, we can go further now.

Section 4.6: Markov chains

Let's compute powers of the matrix $P$ above. One can show that $P$ has characteristic polynomial $$ \det(P - \lambda I) = \lambda^2 - 1.5 \lambda + 0.5 = (\lambda - 1)(\lambda - 0.5) $$ and so has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 0.5$. The eigenspaces are $$ E_1 = \span(\coll 2 3) \qtext{and} E_{0.5} = \span(\coll 1 {-1}) $$ So if we write $Q = \bmat{rr} 2 & 1 \\ 3 & -1 \emat$, we have that $Q^{-1} P Q = \bmat{ll} 1 & 0 \\ 0 & 0.5 \emat = D$. Therefore, $$ \kern-6ex P^k = Q D^k Q^{-1} = \bmat{rr} 2 & 1 \\ 3 & -1 \emat \bmat{ll} 1^k & 0 \\ 0 & 0.5^k \emat \bmat{rr} 2 & 1 \\ 3 & -1 \emat ^{-1} $$ As $k \to \infty$, $0.5^k \to 0$, so $$ \kern-6ex P^k \to \bmat{rr} 2 & 1 \\ 3 & -1 \emat \bmat{ll} 1 & 0 \\ 0 & 0 \emat \bmat{rr} 2 & 1 \\ 3 & -1 \emat ^{-1} = \bmat{cc} 0.4 & 0.4 \\ 0.6 & 0.6 \emat $$ It follows that if we start with any state $\vx_0 = \coll a b$ with $a+b=1$, we'll find that $$ \kern-8ex \vx_k = P^k \vx_0 \to \bmat{cc} 0.4 & 0.4 \\ 0.6 & 0.6 \emat \coll a b = \coll {0.4a+0.4b} {0.6a+0.6b} = \coll {0.4} {0.6} $$ This explains why every state tends to the steady state! (It also gives a fast way to compute $\vx_k$ for large $k$.)
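The limit just computed is easy to verify numerically (a NumPy sketch):

```python
import numpy as np

P = np.array([[0.7, 0.2],
              [0.3, 0.8]])
L = np.array([[0.4, 0.4],
              [0.6, 0.6]])

# 0.5^50 is about 1e-15, so P^50 is already essentially the limit L.
P50 = np.linalg.matrix_power(P, 50)
assert np.allclose(P50, L)

# Any probability vector is sent (essentially) to the steady state:
x0 = np.array([0.9, 0.1])              # hypothetical starting state, a + b = 1
assert np.allclose(P50 @ x0, [0.4, 0.6])
```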

This is a very general phenomenon:

Theorem 4.31: Let $P$ be an $n \times n$ stochastic matrix. Then every eigenvalue $\lambda$ has $|\lambda| \leqslant 1$.

If in addition the entries of $P$ are all positive, then all eigenvalues besides $\lambda = 1$ have $|\lambda| < 1$.

The general proof just involves some inequalities, but the notation is confusing. Let's see how the argument goes in the special case of $$ P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat . $$ The key idea is to study the eigenvalues of $P^T$, which are the same as those of $P$. Suppose $\coll a b$ is an eigenvector of $P^T$ with $0 \leqslant a \leqslant b$ (note $b > 0$, since an eigenvector is nonzero). Then $P^T \coll a b = \lambda \coll a b$, which means that $$ \coll {0.7a + 0.3b} {0.2a+0.8b} = \lambda \coll a b $$ The second component gives $$ \lambda b = 0.2a + 0.8b \leqslant 0.2b + 0.8b = b $$ and so $\lambda \leqslant 1$. If we allow $a$ and $b$ to be negative or complex, we need to use absolute values, and we can conclude that $|\lambda| \leqslant 1$.

The other part of the Theorem is similar.

This theorem helps us understand the long-term behaviour:

Theorem 4.33: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive. Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal to the same vector $\vx$, which is a steady state probability vector for $P$.

Proof: We'll assume $P$ is diagonalizable: $Q^{-1} P Q = D$. So $P^k = Q D^k Q^{-1}$. As $k \to \infty$, $D^k$ approaches a matrix $D^*$ with $1$'s and $0$'s on the diagonal (by Theorem 4.31), which means that $P^k$ approaches $L = Q D^* Q^{-1}$.

Now that we know that $P^k$ has some limit $L$, we can deduce something about it. Since $\lim_{k \to \infty} P^k = L$, we have $$ PL = P\lim_{k \to \infty} P^k = \lim_{k \to \infty} P^{k+1} = L $$ This means that the columns of $L$ must be steady-state vectors for $P$. Since the columns of $P^k$ are probability vectors, the same must be true of the columns of $L$. It's not hard to show that $P$ has a unique steady-state probability vector $\vx$, so $L = [\, \vx \, \vx \, \cdots \, \vx \, ]$, as required.$\quad\Box$

Finally, we can deduce that Markov chains tend to their steady states:

Theorem 4.34: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive, and let $\vx_0$ be any initial probability vector. Then as $k \to \infty$, $\vx_k \to \vx$, where $\vx$ is the steady state probability vector for $P$.

Proof: Suppose that $\vx_0$ has components $x_1, x_2, \ldots, x_n$. Then $$ \begin{aligned} \lim_{k \to \infty} \vx_k &= \lim_{k \to \infty} P^k \vx_0 = L \vx_0 \\ &= [\, \vx \, \vx \, \cdots \, \vx \, ] \vx_0 \\ &= x_1 \vx + x_2 \vx + \cdots + x_n \vx \\ &= (x_1 + x_2 + \cdots + x_n ) \vx = \vx \quad\Box \end{aligned} $$

This result works both ways: if you compute the eigenvector with eigenvalue 1, that tells you the steady-state vector that other states go to as $k \to \infty$. But it also means that if you don't know the steady-state vector, you can approximate it by starting with any vector $\vx_0$ and computing $P^k \vx_0$ for large $k$!

The latter is what Google does to compute the PageRank eigenvector.