## Announcements:

Course evaluations at start.

Today we finish 4.4 and cover the Markov chains part of Section 4.6. Not covering Section 4.5, or rest of 4.6 (which contains many interesting applications!) Read Section 5.1 for next class.

Tutorials: Quiz 9 covers Section 4.4 only.
Office hour: Mon 1:30-2:30 and Wed 10:30-11:15, MC103B.
Help Centers: Monday-Friday 2:30-6:30 in MC 106.

Questions to discuss with your neighbour: What does it mean for $A$ and $B$ to be similar? What properties do similar matrices have in common? What does it mean for $A$ to be diagonalizable? How do we tell if it is, and how do we diagonalize $A$?

### Review of Section 4.4:

Definition: Let $A$ and $B$ be $n \times n$ matrices. We say that $A$ is similar to $B$ ($A \sim B$) if there is an invertible matrix $P$ such that $P^{-1} A P = B$.

Theorem 4.22: Let $A$ and $B$ be similar matrices. Then $A$ and $B$ have the same determinant, rank, characteristic polynomial and eigenvalues.

Definition: $A$ is diagonalizable if it is similar to some diagonal matrix.

If $A$ is similar to a diagonal matrix $D$, then $D$ must have the eigenvalues of $A$ on the diagonal. But how to find $P$?

Theorem 4.23: Let $A$ be an $n \times n$ matrix. If $P$ is an $n \times n$ matrix whose columns are linearly independent eigenvectors of $A$, then $P^{-1} A P$ is a diagonal matrix $D$ with the corresponding eigenvalues of $A$ on the diagonal.

On the other hand, if $P$ is any invertible matrix such that $P^{-1} A P$ is diagonal, then the columns of $P$ are linearly independent eigenvectors of $A$.

It follows that $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

This theorem is one of the main reasons we want to be able to find eigenvectors of a matrix. Moreover, the more eigenvectors the better, so this motivates allowing complex eigenvectors.

Theorem 4.24: If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$ and, for each $i$, $\cB_i$ is a basis for the eigenspace $E_{\lambda_i}$, then the union of the $\cB_i$'s is a linearly independent set.

Combining Theorems 4.23 and 4.24 gives the following important consequence:

Theorem: An $n \times n$ matrix is diagonalizable if and only if the sum of the geometric multiplicities of the eigenvalues is $n$.

In particular:

Theorem 4.25: If $A$ in an $n \times n$ matrix with $n$ distinct eigenvalues, then $A$ is diagonalizable.

So it is important to understand the geometric multiplicities better. Here is a helpful result:

Lemma 4.26: If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then $$\kern-6ex \text{geometric multiplicity of } \lambda_1 \leqslant \text{algebraic multiplicity of } \lambda_1$$

It follows that the only way for the geometric multiplicities to add to $n$ is if they are equal to the algebraic multiplicities and the algebraic multiplicities add to $n$:

Theorem 4.27 (The Diagonalization Theorem): Let $A$ be an $n \times n$ matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$ and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$. Then the following are equivalent:
a. $A$ is diagonalizable.
b. $g_1 + \cdots + g_k = n$.
c. $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.

Note: This is stated incorrectly in the text. The red part must be added unless you are working over $\C$, in which case it is automatic that $a_1 + \cdots + a_k = n$. With the way I have stated it, it is correct over $\R$ or over $\C$.

Summary of diagonalization: Given an $n \times n$ matrix $A$, we would like to determine whether $A$ is diagonalizable, and if it is, find the invertible matrix $P$ and the diagonal matrix $D$ such that $P^{-1} A P = D$. The result may depend upon whether you are working over $\R$ or $\C$.

Steps:

1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.
2. Find the roots of the characteristic polynomial and their algebraic multiplicities by factoring.
3. If the algebraic multiplicities don't add up to $n$, then $A$ is not diagonalizable, and you can stop. (If you are working over $\C$, this can't happen.)
4. For each eigenvalue $\lambda$, compute the dimension of the eigenspace $E_\lambda$. This is the geometric multiplicity of $\lambda$, and if it is less than the algebraic multiplicity, then $A$ is not diagonalizable, and you can stop.
5. Compute a basis for the eigenspace $E_\lambda$.
6. If for each eigenvalue the geometric multiplicity equals the algebraic multiplicity, then you take the $n$ eigenvectors you found and put them in the columns of a matrix $P$. Put the eigenvalues in the same order on the diagonal of a matrix $D$.
7. Check that $AP=PD$.

Note that step 4 only requires you to find the row echelon form of $A - \lambda I$, as the number of free variables here is the geometric multiplicity. In step 5, you solve the system.

### Powers:

Suppose $P^{-1} A P = D$, where $D$ is diagonal. Then $A = P D P^{-1}$. We can use this to compute powers of $A$. For example, \kern-8ex \begin{aligned} A^5 &= (P D P^{-1}) (P D P^{-1}) (P D P^{-1}) (P D P^{-1}) (P D P^{-1}) \\ &= P D^5 P^{-1} \end{aligned} and $D^5$ is easy to compute since $D$ is diagonal: you just raise the diagonal entries to the fifth power.

More generally, $A^k = P D^k P^{-1}$. This is clearly an efficient way to compute powers! Note that we need to know $P$, not just $D$, to do this. Also note that even if $A$ is real, it would work to diagonalize $A$ over $\C$. The answer would be real, but the intermediate calculations would be complex.

See Example 4.29 for a sample calculation. We'll illustrate this result with an example from Markov Chains.

### Review of Markov chains:

A Markov chain has a finite set of states $1, 2, \ldots, n$ and there is an $n \times n$ matrix $P$ (called the transition matrix) with the property that the $ij$ entry $P_{ij}$ is the probability that you transition from state $j$ to state $i$ in one time step.

Since you must transition to some state, $P_{1j} + \cdots + P_{nj} = 1$. That is, the entries in each column sum to 1. Moreover, each entry $P_{ij} \geqslant 0$. Such a $P$ is called a stochastic matrix.

We can represent the current state of the system with a state vector $\vx \in \R^n$. The $i$th entry of $\vx$ may denote the number of people/objects in state $i$. Or we may divide by the total number, so the $i$th entry of $\vx$ gives the fraction of people/objects in state $i$. In this case, $\vx$ has non-negative entries that sum to 1 and is called a probability vector.

If $\vx_k$ denotes the state after $k$ time steps, then the state after one more time step is given by $$\vx_{k+1} = P \vx_k .$$ It follows that $\vx_k = P^k \vx_0$. Therefore:

The $ij$ entry $(P^k)_{ij}$ of $P^k$ is the probability of going from state $\red{j}$ to state $\red{i}$ in $k$ steps.

A state $\vx$ such that $P \vx = \vx$ is called a steady state vector. This is the same as an eigenvector with eigenvalue 1. In Lecture 22, we proved:

Theorem 4.30: Every stochastic matrix has a steady state vector, i.e. it has $\lambda = 1$ as an eigenvalue.

We proved this using the fact that $P$ and $P^T$ have the same eigenvalues, and then noticing that the vector with all $1$'s is an eigenvector of $P^T$ with eigenvalue 1.

Example: We studied toothpaste usage, and had transition matrix $$P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat .$$ We noticed experimentally that a given starting state tends to the state $\coll {0.4} {0.6}$ and that $$\bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat \coll {0.4} {0.6} = \coll {0.4} {0.6} .$$ We then found this steady state vector algebraically by solving $(I-P)\vx = \vec 0$. [It is equivalent to solve $(P-I)\vx = \vec 0$.]

With our new tools, we can go further now.

### Section 4.6: Markov chains:

Let's compute powers of the matrix $P$ above. One can show that $P$ has characteristic polynomial $$\kern-6ex \det(P - \lambda I) = \lambda^2 - 1.5 \lambda + 0.5 = (\lambda - 1)(\lambda - 0.5)$$ and so has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 0.5$. The eigenspaces are $$\kern-6ex E_1 = \span(\coll 2 3) \qtext{and} E_{0.5} = \span(\coll 1 {-1})$$ So if we write $Q = \bmat{rr} 2 & 1 \\ 3 & -1 \emat$, we have that $Q^{-1} P Q = \bmat{ll} 1 & 0 \\ 0 & 0.5 \emat = D$. Therefore, $$\kern-6ex P^k = Q D^k Q^{-1} = \bmat{rr} 2 & 1 \\ 3 & -1 \emat \bmat{ll} 1^k & 0 \\ 0 & 0.5^k \emat \bmat{rr} 2 & 1 \\ 3 & -1 \emat ^{-1}$$ As $k \to \infty$, $0.5^k \to 0$, so $$\kern-6ex P^k \to \bmat{rr} 2 & 1 \\ 3 & -1 \emat \bmat{ll} 1 & 0 \\ 0 & 0 \emat \bmat{rr} 2 & 1 \\ 3 & -1 \emat ^{-1} = \bmat{cc} 0.4 & 0.4 \\ 0.6 & 0.6 \emat$$ It follows that if we start with any state $\vx_0 = \coll a b$ with $a+b=1$, we'll find that $$\kern-8ex \vx_k = P^k \vx_0 \to \bmat{cc} 0.4 & 0.4 \\ 0.6 & 0.6 \emat \coll a b = \coll {0.4a+0.4b} {0.6a+0.6b} = \coll {0.4} {0.6}$$ This explains why every state tends to the steady state! (It also gives a fast way to compute $\vx_k$ for large $k$.)

This is a very general phenomenon, which we'll spend the rest of the lecture understanding.

Theorem 4.31: Let $P$ be an $n \times n$ stochastic matrix. Then every eigenvalue $\lambda$ has $|\lambda| \leqslant 1$.

If in addition the entries of $P$ are all positive, then all eigenvalues besides $\lambda = 1$ have $|\lambda| < 1$.

The general proof just involves some inequalities, but the notation is confusing. Let's see how the argument goes in the special case of $$P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat .$$ The key idea is to study the eigenvalues of $P^T$, which are the same as those of $P$. Suppose $\coll a b$ is an eigenvector of $P^T$ with $0 \leqslant a \leqslant b$. Then $P^T \coll a b = \lambda \coll a b$ which means that $$\coll {0.7a + 0.3b} {0.2a+0.8b} = \lambda \coll a b$$ The second component gives $$\lambda b = 0.2a + 0.8b \leqslant 0.2b + 0.8b = b$$ and so $\lambda \leq 1$. If we allow $a$ and $b$ to be negative or complex, we need to use absolute values, and we can conclude that $|\lambda| \leq 1$.

The other part of the Theorem is similar.

The next theorem helps us understand the long-term behaviour:

Theorem 4.33: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive. Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal to the same vector $\vx$ which is a steady state probability vector for $P$.

Proof: We'll assume $P$ is diagonalizable: $Q^{-1} P Q = D$. So $P^k = Q D^k Q^{-1}$. As $k \to \infty$, $D^k$ approaches a matrix $D^*$ with $1$'s and $0$'s on the diagonal (by Theorem 4.31), which means that $P^k$ approaches $L = Q D^* Q^{-1}$.

Now that we know that $P^k$ has some limit $L$, we can deduce something about it. Since $\lim_{k \to \infty} P^k = L$, we have $$PL = P\lim_{k \to \infty} P^k = \lim_{k \to \infty} P^{k+1} = L$$ This means that the columns of $L$ must be steady-state vectors for $P$. Since the columns of $P^k$ are probability vectors, the same must be true of the columns of $L$. It's not hard to show that $P$ has a unique steady-state probability vector $\vx$, so $L = [\, \vx \, \vx \, \cdots \, \vx \, ]$, as required.$\quad\Box$

Finally, we can deduce that Markov chains tend to their steady states:

Theorem 4.34: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive, and let $\vx_0$ be any initial probability vector. Then as $k \to \infty$, $\vx_k \to \vx$, where $\vx$ is the steady state probability vector for $P$.

Proof: Suppose that $\vx_0$ has components $x_1, x_2, \ldots, x_n$. Then \begin{aligned} \lim_{k \to \infty} \vx_k &= \lim_{k \to \infty} P^k \vx_0 = L \vx_0 \\ &= [\, \vx \, \vx \, \cdots \, \vx \, ] \vx_0 \\ &= x_1 \vx + x_2 \vx + \cdots + x_n \vx \\ &= (x_1 + x_2 + \cdots + x_n ) \vx = \vx \quad\Box \end{aligned}

This result works both ways: if you compute the eigenvector with eigenvalue 1, that tells you the steady-state vector that other states go to as $k \to \infty$. But it also means that if you don't know the steady-state vector, you can approximate it by starting with any vector $\vx_0$ and computing $P^k \vx_0$ for large $k$!

The latter is what Google does to compute the page rank eigenvector.

Note: A transition matrix $P$ is regular if some power $P^k$ of it has positive entries. This means that there is a nonzero probably to get from any starting state to any ending state after some number of steps. The text proves the above results for regular matrices, which is not hard once you know them for matrices with positive entries.