Today we finish Section 4.4 and cover the Markov chains part of Section 4.6.
We are not covering Section 4.5 or the rest of 4.6 (which contain many
interesting applications!).
**Read** Section 5.1 for Monday.
Work through the recommended homework questions; some have been updated.

**Tutorials:** Quiz 6 is next week. It covers Sections 4.4 and
the Markov Chains part of 4.6.

**Office hour:** Monday, 1:30-2:30, MC103B.

**Help Centers:** Monday-Friday 2:30-6:30 in MC 106.

**Definition:** Let $A$ and $B$ be $n \times n$ matrices. We say that
$A$ is **similar** to $B$ if there is an invertible matrix $P$ such that $P^{-1} A P = B$.
When this is the case, we write $A \sim B$.

**Theorem 4.22:**
Let $A$ and $B$ be similar matrices. Then:

a. $\det A = \det B$

b. $A$ is invertible iff $B$ is invertible.

c. $A$ and $B$ have the same rank.

d. $A$ and $B$ have the same characteristic polynomial.

e. $A$ and $B$ have the same eigenvalues.
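As a quick sanity check, here is a small numerical illustration of parts (a), (c) and (e) (a sketch in Python/NumPy; the matrices $A$ and $P$ are made-up examples, not from the text):

```python
import numpy as np

# A made-up matrix A and an invertible P, so that B = P^{-1} A P is similar to A.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])
B = np.linalg.inv(P) @ A @ P

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))        # (a) same determinant
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))  # (c) same rank
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))     # (e) same eigenvalues
```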

**Definition:** $A$ is **diagonalizable** if it is similar to some diagonal matrix.

If $A$ is similar to a diagonal matrix $D$, then $D$ must have the eigenvalues of $A$ on the diagonal. But how to find $P$?

More precisely (Theorem 4.23): there exist an invertible matrix $P$ and a diagonal matrix $D$ with $P^{-1}AP = D$ if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues in the same order.

This theorem is one of the main reasons we want to be able to find eigenvectors of a matrix. Moreover, the more eigenvectors the better, so this motivates allowing complex eigenvectors.

**Theorem 4.24:** If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$
and, for each $i$, $\cB_i$ is a basis for the eigenspace $E_{\lambda_i}$,
then the union of the $\cB_i$'s is a linearly independent set.

Combining Theorems 4.23 and 4.24 gives the following important consequence:

**Theorem:** An $n \times n$ matrix is diagonalizable if and only if
the sum of the geometric multiplicities of the eigenvalues is $n$.

In particular:

**Theorem 4.25:** If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues,
then $A$ is diagonalizable.

So it is important to understand the geometric multiplicities better. Here is a helpful result:

**Lemma 4.26:** If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then
$$
\text{geometric multiplicity of } \lambda_1 \leqslant
\text{algebraic multiplicity of } \lambda_1
$$

We'll prove this in a minute. First, let's look at what it implies:

Let $A$ be an $n \times n$ matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$ and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$. We know $$ g_i \leqslant a_i \qtext{for each $i$} $$ and so $$ g_1 + \cdots + g_k \leqslant a_1 + \cdots + a_k \leqslant n $$ So the only way to have $g_1 + \cdots + g_k = n$ is to have $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.

This gives the **main theorem** of the section:

**Theorem 4.27 (The Diagonalization Theorem):**
Let $A$ be an $n \times n$ matrix with distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$.
Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$
and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$.
Then the following are equivalent:

a. $A$ is diagonalizable.

b. $g_1 + \cdots + g_k = n$.

c. $g_i = a_i$ for each $i$ and $\red{a_1 + \cdots + a_k = n}$.

**Note:** This is stated incorrectly in the text.
The red part must be added unless you are working over $\C$,
in which case it is automatic that $a_1 + \cdots + a_k = n$.
With the way I have stated it, it is correct over $\R$ or over $\C$.
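For example (not from the text, but a standard illustration): the matrix $A = \bmat{ll} 1 & 1 \\ 0 & 1 \emat$ has characteristic polynomial $(1-\lambda)^2$, so $\lambda_1 = 1$ is the only eigenvalue, with $a_1 = 2$. But $A - I = \bmat{ll} 0 & 1 \\ 0 & 0 \emat$ has rank 1, so $E_1$ is one-dimensional and $g_1 = 1 < a_1$. By the theorem, $A$ is not diagonalizable.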

We still need to prove:

**Lemma 4.26:** If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then
$$
\text{geometric multiplicity of } \lambda_1 \leqslant
\text{algebraic multiplicity of } \lambda_1
$$

**Proof (more direct than in text):**
Suppose that $\lambda_1$ is an eigenvalue of $A$ with geometric multiplicity $g$,
and let $\vv_1, \ldots, \vv_g$ be a basis for $E_{\lambda_1}$,
so $$A \vv_i = \lambda_1 \vv_i \qtext{for each $i$}$$
Let $Q$ be an invertible matrix whose first $g$ columns are $\vv_1, \ldots, \vv_g$:
$$
Q = [ \, \vv_1 \, \cdots \, \vv_g \qtext{other vectors} ]
$$
Since $Q^{-1} Q = I$, we know that $Q^{-1} \vv_i = \ve_i$ for $1 \leqslant i \leqslant g$.
Also, the first $g$ columns of $A Q$ are $\lambda_1 \vv_1, \ldots, \lambda_1 \vv_g$.
So the first $g$ columns of $Q^{-1} A Q$ are $\lambda_1 \ve_1, \ldots, \lambda_1 \ve_g$;
that is, $Q^{-1} A Q$ is block upper triangular with $\lambda_1 I_g$ as its top-left block.
Expanding $\det(Q^{-1} A Q - \lambda I)$ repeatedly along the first column shows that
$(\lambda_1 - \lambda)^g$ divides the characteristic polynomial,
so $Q^{-1} A Q$ has $\lambda_1$ as an eigenvalue with algebraic multiplicity at least $g$.
But $Q^{-1} A Q$ has the same characteristic polynomial as $A$ (Theorem 4.22),
so $\lambda_1$ must also have algebraic multiplicity at least $g$ for $A$.$\quad\Box$

**Steps for diagonalizing an $n \times n$ matrix $A$** (a numerical sketch follows the list):

1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.

2. Find the roots of the characteristic polynomial and their algebraic
multiplicities by factoring.

3. If the algebraic multiplicities don't add up to $n$, then $A$ is
not diagonalizable, and you can stop. (If you are working over $\C$,
this can't happen.)

4. For each eigenvalue $\lambda$, compute the dimension of the
eigenspace $E_\lambda$. This is the geometric multiplicity of
$\lambda$, and if it is less than the algebraic multiplicity,
then $A$ is not diagonalizable, and you can stop.

5. Compute a basis for the eigenspace $E_\lambda$.

6. If for each eigenvalue the geometric multiplicity equals the
algebraic multiplicity, then you take the $n$ eigenvectors you
found and put them in the columns of a matrix $P$. Put the
eigenvalues in the same order on the diagonal of a matrix $D$.

7. **Check** that $AP=PD$.

Note that step 4 only requires you to find the row echelon form of $A - \lambda I$, as the number of free variables here is the geometric multiplicity. In step 5, you solve the system.
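Here is the promised numerical sketch of the procedure in Python/NumPy (the matrix $A$ is a made-up example, and we let `numpy.linalg.eig` do the work of steps 1–5):

```python
import numpy as np

# A made-up 2x2 matrix with distinct eigenvalues 5 and 2, so it is diagonalizable.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# numpy.linalg.eig returns the eigenvalues and a matrix whose columns are
# corresponding eigenvectors: these are the candidates for D and P (step 6).
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

# Step 7: check that AP = PD, i.e. P^{-1} A P = D.
print(np.allclose(A @ P, P @ D))                  # True
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True
```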

If $P^{-1}AP = D$, then $A = PDP^{-1}$, and more generally $A^k = P D^k P^{-1}$ for every $k \geqslant 1$. Since powers of a diagonal matrix are easy to compute, this is clearly an efficient way to compute powers of $A$! Note that we need to know $P$, not just $D$, to do this.
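For instance, continuing with the same made-up matrix as above, we can check the power formula numerically (`matrix_power` is used only for comparison):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, P = np.linalg.eig(A)

# A^10 via the diagonalization: only the diagonal entries get raised to the power.
k = 10
A_k = P @ np.diag(eigenvalues ** k) @ np.linalg.inv(P)
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))   # True
```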

See Example 4.29 for a sample calculation. We'll illustrate this result with an example from Markov Chains.

Recall the setup: a Markov chain has $n$ possible states, and the $ij$ entry $P_{ij}$ of the transition matrix $P$ is the probability of moving from state $j$ to state $i$ in one time step.
Since you must transition to some state, $P_{1j} + \cdots + P_{nj} = 1$.
That is, the entries in each column sum to 1.
Moreover, each entry $P_{ij} \geqslant 0$.
Such a $P$ is called a **stochastic matrix**.

We can represent the current state of the system with a **state vector**
$\vx \in \R^n$.
The $i$th entry of $\vx$ may denote the number of people/objects in state $i$.
Or we may divide by the total number, so the $i$th entry of $\vx$ gives the fraction
of people/objects in state $i$.
In this case, $\vx$ has non-negative entries that sum to 1 and is called a
**probability vector**.

If $\vx_k$ denotes the state after $k$ time steps, then the state after one more time step is given by $$ \vx_{k+1} = P \vx_k . $$ It follows that $\vx_k = P^k \vx_0$. Therefore:

The $ij$ entry $(P^k)_{ij}$ of $P^k$ is the probability of going from state $\red{j}$ to state $\red{i}$ in $k$ steps.
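As a small sketch (Python/NumPy, with a made-up $2 \times 2$ stochastic matrix and starting vector): iterating $\vx_{k+1} = P\vx_k$ agrees with computing $P^k \vx_0$ directly, and the columns of $P^k$ remain probability vectors.

```python
import numpy as np

# A made-up stochastic matrix (columns sum to 1) and an initial probability vector.
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])
x0 = np.array([0.3, 0.7])

k = 20
x = x0.copy()
for _ in range(k):          # iterate x_{j+1} = P x_j
    x = P @ x

Pk = np.linalg.matrix_power(P, k)
print(np.allclose(x, Pk @ x0))   # True: x_k = P^k x_0
print(Pk.sum(axis=0))            # each column of P^k still sums to 1
```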

A state $\vx$ such that $P \vx = \vx$ is called a **steady state vector**.
This is the same as an eigenvector with eigenvalue 1. In Lecture 22, we
proved:

**Theorem 4.30:** Every stochastic matrix has a steady state vector,
i.e. it has $\lambda = 1$ as an eigenvalue.

We proved this using the fact that $P$ and $P^T$ have the same eigenvalues, and then noticing that the vector with all $1$'s is an eigenvector of $P^T$ with eigenvalue 1.

**Example:** We studied toothpaste usage, and had transition matrix
$$
P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat .
$$
We noticed experimentally that a given starting state tends
to the state $\coll {0.4} {0.6}$ and that
$$
\bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat \coll {0.4} {0.6}
= \coll {0.4} {0.6} .
$$
We then found this steady state vector algebraically by solving $(I-P)\vx = \vec 0$.
[It is equivalent to solve $(P-I)\vx = \vec 0$.]
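As a numerical sketch of this computation (Python/NumPy), we can find the eigenvector of the toothpaste matrix with eigenvalue 1, i.e. a nonzero solution of $(P - I)\vx = \vec 0$, and scale it so that its entries sum to 1:

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])

# Find the eigenvector with eigenvalue 1, i.e. a nonzero solution of (P - I)x = 0.
eigenvalues, eigenvectors = np.linalg.eig(P)
i = np.argmin(np.abs(eigenvalues - 1.0))   # index of the eigenvalue closest to 1
v = eigenvectors[:, i]

steady_state = v / v.sum()                 # scale so the entries sum to 1
print(steady_state)                        # approximately [0.4, 0.6]
```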

With our new tools, we can now go further.

The behaviour we saw in the example is a very general phenomenon (Theorem 4.31): every eigenvalue $\lambda$ of a stochastic matrix satisfies $|\lambda| \leqslant 1$, and if in addition the entries of $P$ are all positive, then all eigenvalues besides $\lambda = 1$ have $|\lambda| < 1$.

This helps us understand the long-term behaviour:

**Theorem 4.33:**
Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive.
Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal
to the same vector $\vx$ which is a steady state probability vector for $P$.

**Proof:**
We'll assume $P$ is diagonalizable: $Q^{-1} P Q = D$.
So $P^k = Q D^k Q^{-1}$. As $k \to \infty$, $D^k$ approaches a matrix $D^*$
with $1$'s and $0$'s on the diagonal (by Theorem 4.31), which means that $P^k$ approaches
$L = Q D^* Q^{-1}$.

Now that we know that $P^k$ has *some* limit $L$, we can deduce
something about it.
Since $\lim_{k \to \infty} P^k = L$, we have
$$
PL = P\lim_{k \to \infty} P^k = \lim_{k \to \infty} P^{k+1} = L
$$
This means that the columns of $L$ must be steady-state vectors for $P$.
Since the columns of $P^k$ are probability vectors, the same must be true
of the columns of $L$.
It's not hard to show that $P$ has a unique steady-state probability vector $\vx$,
so $L = [\, \vx \, \vx \, \cdots \, \vx \, ]$, as required.$\quad\Box$
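For the toothpaste matrix above, this convergence is easy to see numerically (a quick sketch):

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])

# P^k for large k: both columns approach the steady state vector (0.4, 0.6).
print(np.linalg.matrix_power(P, 50))   # approximately [[0.4, 0.4], [0.6, 0.6]]
```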

Finally, we can deduce that Markov chains tend to their steady states:

**Theorem 4.34:**
Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive,
and let $\vx_0$ be any initial probability vector.
Then as $k \to \infty$, $\vx_k \to \vx$, where
$\vx$ is the steady state probability vector for $P$.

**Proof:**
Suppose that $\vx_0$ has components $x_1, x_2, \ldots, x_n$. Then
$$
\begin{aligned}
\lim_{k \to \infty} \vx_k &= \lim_{k \to \infty} P^k \vx_0 = L \vx_0 \\
&= [\, \vx \, \vx \, \cdots \, \vx \, ] \vx_0 \\
&= x_1 \vx + x_2 \vx + \cdots + x_n \vx \\
&= (x_1 + x_2 + \cdots + x_n ) \vx = \vx ,
\end{aligned}
$$
where the last equality holds because the entries of the probability vector $\vx_0$ sum to 1.$\quad\Box$

This result works both ways: if you compute the eigenvector with eigenvalue 1, that tells you the steady-state vector that other states go to as $k \to \infty$. But it also means that if you don't know the steady-state vector, you can approximate it by starting with any vector $\vx_0$ and computing $P^k \vx_0$ for large $k$!

The latter is what Google does to compute the PageRank eigenvector.
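Here is a sketch of that approach (Python/NumPy, with a made-up $3 \times 3$ positive stochastic matrix): repeatedly multiplying any starting probability vector by $P$ approximates the steady state, without ever solving an eigenvalue problem directly.

```python
import numpy as np

# A made-up positive stochastic matrix (columns sum to 1).
P = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])

x = np.array([1.0, 0.0, 0.0])   # any initial probability vector works
for _ in range(100):            # power iteration: x_{k+1} = P x_k
    x = P @ x

print(x)                         # approximates the steady state vector
print(np.allclose(P @ x, x))     # True: x is (numerically) a steady state
```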