Today we finish 4.4 and cover the Markov chains part of Section 4.6.
Not covering Section 4.5, or rest of 4.6 (which contains many
interesting applications!)
**Read Section 5.1** for next class.

**Tutorials:** Quiz 8 will cover Section 4.3,
the part of Appendix C we covered,
and what we finish today in Section 4.4.
There is also a quiz next week.

**Office hour:** Prof. Christensen's office hour
for Wednesday is cancelled.

**Help Centers:** Monday-Friday 2:30-6:30 in MC 106.

**Questions to discuss with your neighbour:** What does it mean for $A$
and $B$ to be similar? What properties do similar matrices have in common?
What does it mean for $A$ to be diagonalizable? How do we tell if it is,
and how do we diagonalize $A$?

**Definition:** Let $A$ and $B$ be $n \times n$ matrices. We say that
$A$ is **similar** to $B$ ($A \sim B$) if there is an invertible matrix $P$ such that $P^{-1} A P = B$.

**Theorem 4.22:**
Let $A$ and $B$ be similar matrices. Then $A$ and $B$ have the same
determinant, rank, characteristic polynomial and eigenvalues.

**Definition:** $A$ is **diagonalizable** if it is similar to some diagonal matrix.

If $A$ is similar to a diagonal matrix $D$, then $D$ must have the eigenvalues of $A$ on the diagonal. But how to find $P$?

On the other hand, if $P$ is any invertible matrix such that $P^{-1} A P$ is diagonal, then the columns of $P$ are linearly independent eigenvectors of $A$.

It follows that $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

This theorem is one of the main reasons we want to be able to find eigenvectors of a matrix. Moreover, the more eigenvectors the better, so this motivates allowing complex eigenvectors.

**Theorem 4.24:** If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$
and, for each $i$, $\cB_i$ is a basis for the eigenspace $E_{\lambda_i}$,
then the union of the $\cB_i$'s is a linearly independent set.

Combining Theorems 4.23 and 4.24 gives the following important consequence:

**Theorem:** An $n \times n$ matrix is diagonalizable if and only if
the sum of the geometric multiplicities of the eigenvalues is $n$.

In particular:

**Theorem 4.25:** If $A$ in an $n \times n$ matrix with $n$ distinct eigenvalues,
then $A$ is diagonalizable.

So it is important to understand the geometric multiplicities better. Here is a helpful result:

**Lemma 4.26:** If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then
$$
\kern-6ex
\text{geometric multiplicity of } \lambda_1 \leqslant
\text{algebraic multiplicity of } \lambda_1
$$

It follows that the only way for the geometric multiplicities to add to $n$ is if they are equal to the algebraic multiplicities and the algebraic multiplicities add to $n$:

**Theorem 4.27 (The Diagonalization Theorem):**
Let $A$ be an $n \times n$ matrix with distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$.
Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$
and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$.
Then the following are equivalent:

a. $A$ is diagonalizable.

b. $g_1 + \cdots + g_k = n$.

c. $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.

**Note:** This is stated incorrectly in the text.
The red part must be added unless you are working over $\C$,
in which case it is automatic that $a_1 + \cdots + a_k = n$.
With the way I have stated it, it is correct over $\R$ or over $\C$.

**Steps**:

1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.

2. Find the roots of the characteristic polynomial and their algebraic
multiplicities by factoring.

3. If the algebraic multiplicities don't add up to $n$, then $A$ is
not diagonalizable, and you can stop. (If you are working over $\C$,
this can't happen.)

4. For each eigenvalue $\lambda$, compute the dimension of the
eigenspace $E_\lambda$. This is the geometric multiplicity of
$\lambda$, and if it is less than the algebraic multiplicity,
then $A$ is not diagonalizable, and you can stop.

5. Compute a basis for the eigenspace $E_\lambda$.

6. If for each eigenvalue the geometric multiplicity equals the
algebraic multiplicity, then you take the $n$ eigenvectors you
found and put them in the columns of a matrix $P$. Put the
eigenvalues in the same order on the diagonal of a matrix $D$.

7. **Check** that $AP=PD$.

Note that step 4 only requires you to find the row echelon form of $A - \lambda I$, as the number of free variables here is the geometric multiplicity. In step 5, you solve the system.

I still owe you a proof of:

**Lemma 4.26:** If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then
$$
\text{geometric multiplicity of } \lambda_1 \leqslant
\text{algebraic multiplicity of } \lambda_1
$$

**Proof (more direct than in text):**
Suppose that $\lambda_1$ is an eigenvalue of $A$ with geometric multiplicity $g$,
and let $\vv_1, \ldots, \vv_g$ be a basis for $E_{\lambda_1}$,
so $$A \vv_i = \lambda_1 \vv_i \qtext{for each $i$.}$$
Let $Q$ be an invertible matrix whose first $g$ columns are $\vv_1, \ldots, \vv_g$:
$$
Q = [ \, \vv_1 \, \cdots \, \vv_g \qtext{other vectors} ]
$$
Since $Q^{-1} Q = I$, we know that $Q^{-1} \vv_i = \ve_i$ for $1 \leqslant i \leqslant g$.
Also, the first $g$ columns of $A Q$ are $\lambda_1 \vv_1, \ldots, \lambda_1 \vv_g$.
So the first $g$ columns of $Q^{-1} A Q$ are $\lambda_1 \ve_1, \ldots, \lambda_1 \ve_g$.
Therefore the matrix $Q^{-1} A Q$ has $\lambda_1$ as an eigenvalue with
algebraic multiplicity at least $g$.
But $Q^{-1} A Q$ has the same characteristic polynomial as $A$,
so $\lambda_1$ must also have algebraic multiplicity at least $g$ for $A$.$\quad\Box$.

More generally, $A^k = P D^k P^{-1}$. This is clearly an efficient way to compute powers! Note that we need to know $P$, not just $D$, to do this. Also note that even if $A$ is real, it would work to diagonalize $A$ over $\C$. The answer would be real, but the intermediate calculations would be complex.

**Example:** Let $A = \bmat{rr} 1 & 3 \\ 2 & 2 \emat$.
In Example 4.24 we found that $P^{-1} A P = D$,
where $P = \bmat{rr} 1 & 3 \\ 1 & -2 \emat$
and $D = \bmat{rr} 4 & 0 \\ 0 & -1 \emat$.
Therefore
$$
\kern-8ex
\begin{aligned}
A^5\ &= P D^5 P^{-1}
= \frac{1}{\det P} \bmat{rr} 1 & 3 \\ 1 & -2 \emat
\bmat{rr} 4^5 & 0 \\ 0 & (-1)^5 \emat \bmat{rr} -2 & -3 \\ -1 & 1 \emat \\
&= \frac{1}{-5} \bmat{rr} 1 & 3 \\ 1 & -2 \emat
\bmat{rr} 1024 & 0 \\ 0 & -1 \emat \bmat{rr} -2 & -3 \\ -1 & 1 \emat
= \cdots = \bmat{rr} 409 & 615 \\ 410 & 614 \emat
\end{aligned}
$$

See also Example 4.29. We'll find this result useful for Markov Chains.

Since you must transition to some state, $P_{1j} + \cdots + P_{nj} = 1$.
That is, the entries in each column sum to 1.
Moreover, each entry $P_{ij} \geqslant 0$.
Such a $P$ is called a **stochastic matrix**.

We can represent the current state of the system with a **state vector**
$\vx \in \R^n$.
The $i$th entry of $\vx$ may denote the number of people/objects in state $i$.
Or we may divide by the total number, so the $i$th entry of $\vx$ gives the fraction
of people/objects in state $i$.
In this case, $\vx$ has non-negative entries that sum to 1 and is called a
**probability vector**.

If $\vx_k$ denotes the state after $k$ time steps, then the state after one more time step is given by $$ \vx_{k+1} = P \vx_k . $$ It follows that $\vx_k = P^k \vx_0$. Therefore:

The $ij$ entry $(P^k)_{ij}$ of $P^k$ is the probability of going from state $\red{j}$ to state $\red{i}$ in $k$ steps.

A state $\vx$ such that $P \vx = \vx$ is called a **steady state vector**.
This is the same as an eigenvector with eigenvalue 1. In Lecture 22, we
proved:

**Theorem 4.30:** Every stochastic matrix has a steady state vector,
i.e. it has $\lambda = 1$ as an eigenvalue.

We proved this using the fact that $P$ and $P^T$ have the same eigenvalues, and then noticing that the vector with all $1$'s is an eigenvector of $P^T$ with eigenvalue 1.

**Example:** We studied toothpaste usage, and had transition matrix
$$
P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat .
$$
We noticed experimentally that a given starting state tends
to the state $\coll {0.4} {0.6}$ and that
$$
\bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat \coll {0.4} {0.6}
= \coll {0.4} {0.6} .
$$
We then found this steady state vector algebraically by solving $(I-P)\vx = \vec 0$.
[It is equivalent to solve $(P-I)\vx = \vec 0$.]

With our new tools, we can go further now.

This is a very general phenomenon, which we'll spend the rest of the lecture understanding.

If in addition the entries of $P$ are all positive, then all eigenvalues besides $\lambda = 1$ have $|\lambda| < 1$.

The other part of the Theorem is similar.

The next theorem helps us understand the long-term behaviour:

**Theorem 4.33:**
Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive.
Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal
to the same vector $\vx$ which is a steady state probability vector for $P$.