Course evaluations at start.

Today we finish 4.4 and cover the Markov chains part of Section 4.6.
Not covering Section 4.5, or rest of 4.6 (which contains many
interesting applications!)
**Read Section 5.1** for next class.

**Tutorials:** Quiz 9 covers Section 4.4 only.

**Office hour:** Mon 1:30-2:30 and Wed 10:30-11:15, MC103B.

**Help Centers:** Monday-Friday 2:30-6:30 in MC 106.

**Questions to discuss with your neighbour:** What does it mean for $A$
and $B$ to be similar? What properties do similar matrices have in common?
What does it mean for $A$ to be diagonalizable? How do we tell if it is,
and how do we diagonalize $A$?

**Definition:** Let $A$ and $B$ be $n \times n$ matrices. We say that
$A$ is **similar** to $B$ ($A \sim B$) if there is an invertible matrix $P$ such that $P^{-1} A P = B$.

**Theorem 4.22:**
Let $A$ and $B$ be similar matrices. Then $A$ and $B$ have the same
determinant, rank, characteristic polynomial and eigenvalues.

**Definition:** $A$ is **diagonalizable** if it is similar to some diagonal matrix.

If $A$ is similar to a diagonal matrix $D$, then $D$ must have the eigenvalues of $A$ on the diagonal. But how to find $P$?

On the other hand, if $P$ is any invertible matrix such that $P^{-1} A P$ is diagonal, then the columns of $P$ are linearly independent eigenvectors of $A$.

It follows that $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

This theorem is one of the main reasons we want to be able to find eigenvectors of a matrix. Moreover, the more eigenvectors the better, so this motivates allowing complex eigenvectors.

**Theorem 4.24:** If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$
and, for each $i$, $\cB_i$ is a basis for the eigenspace $E_{\lambda_i}$,
then the union of the $\cB_i$'s is a linearly independent set.

Combining Theorems 4.23 and 4.24 gives the following important consequence:

**Theorem:** An $n \times n$ matrix is diagonalizable if and only if
the sum of the geometric multiplicities of the eigenvalues is $n$.

In particular:

**Theorem 4.25:** If $A$ in an $n \times n$ matrix with $n$ distinct eigenvalues,
then $A$ is diagonalizable.

So it is important to understand the geometric multiplicities better. Here is a helpful result:

**Lemma 4.26:** If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then
$$
\kern-6ex
\text{geometric multiplicity of } \lambda_1 \leqslant
\text{algebraic multiplicity of } \lambda_1
$$

It follows that the only way for the geometric multiplicities to add to $n$ is if they are equal to the algebraic multiplicities and the algebraic multiplicities add to $n$:

**Theorem 4.27 (The Diagonalization Theorem):**
Let $A$ be an $n \times n$ matrix with distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$.
Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$
and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$.
Then the following are equivalent:

a. $A$ is diagonalizable.

b. $g_1 + \cdots + g_k = n$.

c. $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.

**Note:** This is stated incorrectly in the text.
The red part must be added unless you are working over $\C$,
in which case it is automatic that $a_1 + \cdots + a_k = n$.
With the way I have stated it, it is correct over $\R$ or over $\C$.

**Steps**:

1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.

2. Find the roots of the characteristic polynomial and their algebraic
multiplicities by factoring.

3. If the algebraic multiplicities don't add up to $n$, then $A$ is
not diagonalizable, and you can stop. (If you are working over $\C$,
this can't happen.)

4. For each eigenvalue $\lambda$, compute the dimension of the
eigenspace $E_\lambda$. This is the geometric multiplicity of
$\lambda$, and if it is less than the algebraic multiplicity,
then $A$ is not diagonalizable, and you can stop.

5. Compute a basis for the eigenspace $E_\lambda$.

6. If for each eigenvalue the geometric multiplicity equals the
algebraic multiplicity, then you take the $n$ eigenvectors you
found and put them in the columns of a matrix $P$. Put the
eigenvalues in the same order on the diagonal of a matrix $D$.

7. **Check** that $AP=PD$.

Note that step 4 only requires you to find the row echelon form of $A - \lambda I$, as the number of free variables here is the geometric multiplicity. In step 5, you solve the system.

More generally, $A^k = P D^k P^{-1}$. This is clearly an efficient way to compute powers! Note that we need to know $P$, not just $D$, to do this. Also note that even if $A$ is real, it would work to diagonalize $A$ over $\C$. The answer would be real, but the intermediate calculations would be complex.

See Example 4.29 for a sample calculation. We'll illustrate this result with an example from Markov Chains.

Since you must transition to some state, $P_{1j} + \cdots + P_{nj} = 1$.
That is, the entries in each column sum to 1.
Moreover, each entry $P_{ij} \geqslant 0$.
Such a $P$ is called a **stochastic matrix**.

We can represent the current state of the system with a **state vector**
$\vx \in \R^n$.
The $i$th entry of $\vx$ may denote the number of people/objects in state $i$.
Or we may divide by the total number, so the $i$th entry of $\vx$ gives the fraction
of people/objects in state $i$.
In this case, $\vx$ has non-negative entries that sum to 1 and is called a
**probability vector**.

If $\vx_k$ denotes the state after $k$ time steps, then the state after one more time step is given by $$ \vx_{k+1} = P \vx_k . $$ It follows that $\vx_k = P^k \vx_0$. Therefore:

The $ij$ entry $(P^k)_{ij}$ of $P^k$ is the probability of going from state $\red{j}$ to state $\red{i}$ in $k$ steps.

A state $\vx$ such that $P \vx = \vx$ is called a **steady state vector**.
This is the same as an eigenvector with eigenvalue 1. In Lecture 22, we
proved:

**Theorem 4.30:** Every stochastic matrix has a steady state vector,
i.e. it has $\lambda = 1$ as an eigenvalue.

We proved this using the fact that $P$ and $P^T$ have the same eigenvalues, and then noticing that the vector with all $1$'s is an eigenvector of $P^T$ with eigenvalue 1.

**Example:** We studied toothpaste usage, and had transition matrix
$$
P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat .
$$
We noticed experimentally that a given starting state tends
to the state $\coll {0.4} {0.6}$ and that
$$
\bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat \coll {0.4} {0.6}
= \coll {0.4} {0.6} .
$$
We then found this steady state vector algebraically by solving $(I-P)\vx = \vec 0$.
[It is equivalent to solve $(P-I)\vx = \vec 0$.]

With our new tools, we can go further now.

This is a very general phenomenon, which we'll spend the rest of the lecture understanding.

If in addition the entries of $P$ are all positive, then all eigenvalues besides $\lambda = 1$ have $|\lambda| < 1$.

The other part of the Theorem is similar.

The next theorem helps us understand the long-term behaviour:

**Theorem 4.33:**
Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive.
Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal
to the same vector $\vx$ which is a steady state probability vector for $P$.

**Proof:**
We'll assume $P$ is diagonalizable: $Q^{-1} P Q = D$.
So $P^k = Q D^k Q^{-1}$. As $k \to \infty$, $D^k$ approaches a matrix $D^*$
with $1$'s and $0$'s on the diagonal (by Theorem 4.31), which means that $P^k$ approaches
$L = Q D^* Q^{-1}$.

Now that we know that $P^k$ has *some* limit $L$, we can deduce
something about it.
Since $\lim_{k \to \infty} P^k = L$, we have
$$
PL = P\lim_{k \to \infty} P^k = \lim_{k \to \infty} P^{k+1} = L
$$
This means that the columns of $L$ must be steady-state vectors for $P$.
Since the columns of $P^k$ are probability vectors, the same must be true
of the columns of $L$.
It's not hard to show that $P$ has a unique steady-state probability vector $\vx$,
so $L = [\, \vx \, \vx \, \cdots \, \vx \, ]$, as required.$\quad\Box$

Finally, we can deduce that Markov chains tend to their steady states:

**Theorem 4.34:**
Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive,
and let $\vx_0$ be any initial probability vector.
Then as $k \to \infty$, $\vx_k \to \vx$, where
$\vx$ is the steady state probability vector for $P$.

**Proof:**
Suppose that $\vx_0$ has components $x_1, x_2, \ldots, x_n$. Then
$$
\begin{aligned}
\lim_{k \to \infty} \vx_k &= \lim_{k \to \infty} P^k \vx_0 = L \vx_0 \\
&= [\, \vx \, \vx \, \cdots \, \vx \, ] \vx_0 \\
&= x_1 \vx + x_2 \vx + \cdots + x_n \vx \\
&= (x_1 + x_2 + \cdots + x_n ) \vx = \vx \quad\Box
\end{aligned}
$$

This result works both ways: if you compute the eigenvector with eigenvalue 1, that tells you the steady-state vector that other states go to as $k \to \infty$. But it also means that if you don't know the steady-state vector, you can approximate it by starting with any vector $\vx_0$ and computing $P^k \vx_0$ for large $k$!

The latter is what Google does to compute the page rank eigenvector.

**Note:** A transition matrix $P$ is **regular** if some
power $P^k$ of it has positive entries. This means that there
is a nonzero probably to get from any starting state to any
ending state after some number of steps.
The text proves the above results for regular matrices, which
is not hard once you know them for matrices with positive entries.