Math 1600 Lecture 32

$ \newcommand{\bdmat}[1]{\left|\begin{array}{#1}} \newcommand{\edmat}{\end{array}\right|} \newcommand{\bmat}[1]{\left[\begin{array}{#1}} \newcommand{\emat}{\end{array}\right]} \newcommand{\coll}[2]{\bmat{r} #1 \\ #2 \emat} \newcommand{\ccoll}[2]{\bmat{c} #1 \\ #2 \emat} \newcommand{\colll}[3]{\bmat{r} #1 \\ #2 \\ #3 \emat} %\def\colll #1 #2 #3{\bmat{r} #1 \\ #2 \\ #3 \emat} \newcommand{\ccolll}[3]{\bmat{c} #1 \\ #2 \\ #3 \emat} \newcommand{\collll}[4]{\bmat{r} #1 \\ #2 \\ #3 \\ #4 \emat} \newcommand{\ccollll}[4]{\bmat{c} #1 \\ #2 \\ #3 \\ #4 \emat} \newcommand{\colllll}[5]{\bmat{r} #1 \\ #2 \\ #3 \\ #4 \\ #5 \emat} \newcommand{\ccolllll}[5]{\bmat{c} #1 \\ #2 \\ #3 \\ #4 \\ #5 \emat} \newcommand{\red}[1]{{\color{red}#1}} \newcommand{\lra}[1]{\mbox{$\xrightarrow{#1}$}} \newcommand{\rank}{\textrm{rank}} \newcommand{\row}{\textrm{row}} \newcommand{\col}{\textrm{col}} \newcommand{\null}{\textrm{null}} \newcommand{\nullity}{\textrm{nullity}} \renewcommand{\Re}{\operatorname{Re}} \renewcommand{\Im}{\operatorname{Im}} \renewcommand{\Arg}{\operatorname{Arg}} \renewcommand{\arg}{\operatorname{arg}} \newcommand{\adj}{\textrm{adj}} \newcommand{\mystack}[2]{\genfrac{}{}{0}{0}{#1}{#2}} \newcommand{\mystackthree}[3]{\mystack{\mystack{#1}{#2}}{#3}} \newcommand{\qimplies}{\quad\implies\quad} \newcommand{\qtext}[1]{\quad\text{#1}\quad} \newcommand{\qqtext}[1]{\qquad\text{#1}\qquad} \newcommand{\smalltext}[1]{{\small\text{#1}}} \newcommand{\svec}[1]{\,\vec{#1}} \newcommand{\querytext}[1]{\toggle{\text{?}\vphantom{\text{#1}}}{\text{#1}}\endtoggle} \newcommand{\query}[1]{\toggle{\text{?}\vphantom{#1}}{#1}\endtoggle} \newcommand{\smallquery}[1]{\toggle{\text{?}}{#1}\endtoggle} \newcommand{\bv}{\mathbf{v}} %\require{AMScd} $

Announcements:

Today we finish 4.6 and start Section 5.1. Continue reading Section 5.1 for next class, and start reading Section 5.2. Work through recommended homework questions.

Review: Section 4.6: Markov chains:

Theorem 4.30: Every stochastic matrix has a steady state vector, i.e. it has $\lambda = 1$ as an eigenvalue.

Theorem 4.33: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive. Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal to the same vector $\vx$ which is a steady state probability vector for $P$.

New material:

Proof of 4.33: We'll assume $P$ is diagonalizable: $Q^{-1} P Q = D$. So $P^k = Q D^k Q^{-1}$. As $k \to \infty$, $D^k$ approaches a matrix $D^*$ with $1$'s and $0$'s on the diagonal (by Theorem 4.31), which means that $P^k$ approaches $L = Q D^* Q^{-1}$.

Now that we know that $P^k$ has some limit $L$, we can deduce something about it. Since $\lim_{k \to \infty} P^k = L$, we have $$ PL = P\lim_{k \to \infty} P^k = \lim_{k \to \infty} P^{k+1} = L $$ This means that the columns of $L$ must be steady-state vectors for $P$. Since the columns of $P^k$ are probability vectors, the same must be true of the columns of $L$. It's not hard to show that $P$ has a unique steady-state probability vector $\vx$, so $L = [\, \vx \, \vx \, \cdots \, \vx \, ]$, as required.$\quad\Box$

Theorem 4.34: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive, and let $\vx_0$ be any initial probability vector. Then as $k \to \infty$, $\vx_k \to \vx$, where $\vx$ is the steady state probability vector for $P$.

Proof: Suppose that $\vx_0$ has components $x_1, x_2, \ldots, x_n$. Then $$ \begin{aligned} \lim_{k \to \infty} \vx_k &= \lim_{k \to \infty} P^k \vx_0 = L \vx_0 \\ &= [\, \vx \, \vx \, \cdots \, \vx \, ] \vx_0 \\ &= x_1 \vx + x_2 \vx + \cdots + x_n \vx \\ &= (x_1 + x_2 + \cdots + x_n ) \vx = \vx \quad\Box \end{aligned} $$

This result works both ways: if you compute the eigenvector with eigenvalue 1, that tells you the steady-state vector that other states go to as $k \to \infty$. But it also means that if you don't know the steady-state vector, you can approximate it by starting with any vector $\vx_0$ and computing $P^k \vx_0$ for large $k$!

Note: A transition matrix $P$ is regular if some power $P^k$ of it has positive entries. This means that there is a nonzero probably to get from any starting state to any ending state after some number of steps. The text proves the above results for regular matrices, which is not hard once you know them for matrices with positive entries.

Question: Let $P = \bmat{rr} 1/2 & 1/3 \\ 1/2 & 2/3 \emat$. (a) Find a clever way to figure out the eigenvalues of $P$ and determine whether it is diagonalizable.

Solution to (a): Since $P$ is stochastic, we know that $1$ is one of its eigenvalues. That means that the characteristic polynomial factors as $(\lambda - 1)(\lambda - a)$ for some number $a$. Now $\det(P) = 1/3 - 1/6 = 1/6$. On the other hand, $\det(P)$ is the value of the characteristic polynomial when $\lambda = 0$, so we must have $a = 1/6$. Therefore the eigenvalues of $P$ are $1$ and $1/6$. Since there are two distinct eigenvalues, $P$ is diagonalizable.

One can show that eigenvectors of $P$ are $\coll 2 3$ and $\coll {-1} 1$ for $\lambda = 1$ and $1/6$ respectively. Therefore $$ \kern-4ex \bmat{rr} 2 & -1 \\ 3 & 1 \emat^{-1} P \bmat{rr} 2 & -1 \\ 3 & 1 \emat = \bmat{cc} 1 & 0 \\ 0 & 1/6 \emat $$ So $$ \kern-4ex P = \bmat{rr} 2 & -1 \\ 3 & 1 \emat \bmat{cc} 1 & 0 \\ 0 & 1/6 \emat \bmat{rr} 2 & -1 \\ 3 & 1 \emat^{-1} $$ and therefore $$ \kern-8ex \begin{aligned} P^{10} \vx_0\ &= \bmat{rr} 2 & -1 \\ 3 & 1 \emat \bmat{cc} 1^{10} & 0 \\ 0 & (1/6)^{10} \emat \bmat{rr} 2 & -1 \\ 3 & 1 \emat^{-1} \coll {1/2} {1/2} \\ &= \bmat{rr} 2 & -1 \\ 3 & 1 \emat \bmat{cc} 1 & 0 \\ 0 & 1/6^{10} \emat \frac{1}{5} \bmat{rr} 1 & 1 \\ -3 & 2 \emat \coll {1/2} {1/2} \\ % ** Observe that D roughly has 1's and 0's on diagonal. &= \frac{1}{5} \bmat{rr} 2+3/6^{10} & 2-2/6^{10} \\ 3-3/6^{10} & 3+2/6^{10} \emat \coll {1/2} {1/2} \\ % ** Observe that P^k is stochastic, and columns are almost equal. % -- Alternate route: easier arithmetic, but doesn't illustrate the theorem: % &= \frac{1}{5} \bmat{rr} 2 & -1/6^{10} \\ 3 & 1/6^{10} \emat % \ccoll 1 {-1/2} \\ &= \frac{1}{5} \coll {2 + \frac{1}{2(6^{10})}} {3 - \frac{1}{2(6^{10})}} % ** Observe that result is a probability vector. \end{aligned} $$

Section 5.1: Orthogonality in $\R^n$:

Definition: A set $\{ \vv_1, \vv_2, \ldots, \vv_k \}$ of vectors in $\R^n$ is an orthogonal set if all pairs of distinct vectors in the set are orthogonal, i.e. $\vv_i \cdot \vv_j = 0$ whenever $i \neq j$.

Example 5.1: The following vectors form an orthogonal set: $$ \vv_1 = \colll 2 1 {-1} ,\ \vv_2 = \colll 0 1 1,\ \vv_3 = \colll 1 {-1} 1 $$

Theorem 5.1: If $\{ \vv_1, \vv_2, \ldots, \vv_k \}$ is an orthogonal set of nonzero vectors in $\R^n$, then these vectors are linearly independent.

Suppose that $$ c_1 \vv_1 + c_2 \vv_2 + \cdots + c_k \vv_k = \vec 0 $$ If we dot both sides with $\vv_1$, we get that $$ (c_1 \vv_1 + c_2 \vv_2 + \cdots + c_k \vv_k) \cdot \vv_1 = \vec 0 \cdot \vv_1 = 0 $$ But the left-hand-side is equal to $$ c_1 (\vv_1 \cdot \vv_1) + c_2 (\vv_2 \cdot \vv_1) + \cdots + c_k (\vv_k \cdot \vv_1) $$ and all terms are zero except the first. This shows that $$ c_1 (\vv_1 \cdot \vv_1) = 0 $$ Since $\vv_1 \neq 0$, we must have $c_1 = 0$. By dotting with other vectors, we see that all $c_i$s are zero.$\quad\Box$

Definition: An orthogonal basis for a subspace $W$ of $\R^n$ is a basis that is also orthogonal.

For example, the vectors in Example 5.1 automatically form a basis for $\R^3$. Checking orthogonality is much easier than checking linear independence!

Fact: We'll show in Section 5.3 that every subspace has an orthogonal basis.

It is very easy to figure out coordinates with respect to an orthogonal basis. Suppose $\{ \vv_1, \vv_2, \ldots, \vv_k \}$ is an orthogonal basis and $$ \vw = c_1 \vv_1 + \cdots + c_k \vv_k $$ Then if we dot both sides with $\vv_1$, we find $$ \vw \cdot \vv_1 = c_1 (\vv_1 \cdot \vv_1) $$ and so $$ c_1 = \frac{\vw \cdot \vv_1}{\vv_1 \cdot \vv_1} $$

Theorem 5.2: Let $\{ \vv_1, \vv_2, \ldots, \vv_k \}$ be an orthogonal basis for a subspace $W$ of $\R^n$, and let $\vw$ be any vector in $W$. Then the unique scalars $c_1, c_2, \ldots, c_k$ such that $$ \vw = c_1 \vv_1 + \cdots + c_k \vv_k $$ are given by the formula $$ c_i = \frac{\vw \cdot \vv_i}{\vv_i \cdot \vv_i} $$

Example: Find the coordinates of $\vw = \colll 1 2 3$ with respect to the basis $$ \vv_1 = \colll 2 1 {-1} ,\ \vv_2 = \colll 0 1 1,\ \vv_3 = \colll 1 {-1} 1 $$ Solution: The coordinates are $$ \kern-9ex c_1 = \frac{\vw \cdot \vv_1}{\vv_1 \cdot \vv_1} = \frac{1}{6}, \quad c_2 = \frac{\vw \cdot \vv_2}{\vv_2 \cdot \vv_2} = \frac{5}{2}, \quad c_3 = \frac{\vw \cdot \vv_3}{\vv_3 \cdot \vv_3} = \frac{2}{3} $$ So $$ \vw = \frac{1}{6} \vv_1 + \frac{5}{2} \vv_2 + \frac{2}{3} \vv_3 $$ Normally you have to solve a system to figure out the coordinates!

Definition: An orthonormal set is an orthogonal set of unit vectors. An orthonormal basis for a subspace $W$ is a basis for $W$ that is an orthonormal set.

The condition of being orthonormal can be expressed as $$ \vv_i \cdot \vv_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} $$

Example: The standard basis $\{ \ve_1, \ldots, \ve_n \}$ is an orthonormal basis for $\R^n$.

Theorem 5.3: If $\{ \vq_1, \vq_2, \ldots, \vq_k \}$ is an orthonormal basis of a subspace $W$, and $\vw$ is in $W$, then $$ \vw = (\vw \cdot \vq_1) \vq_1 + \cdots + (\vw \cdot \vq_k) \vq_k $$

Note that any orthogonal basis can be converted to an orthonormal basis by dividing each vector by its length. E.g. $$ \kern-6ex \vq_1 = \frac{1}{\sqrt{6}} \colll 2 1 {-1},\quad \vq_2 = \frac{1}{\sqrt{2}} \colll 0 1 1,\quad \vq_3 = \frac{1}{\sqrt{3}} \colll 1 {-1} 1 $$ is an orthonormal basis for $\R^3$.

Orthogonal Matrices

Definition: A square matrix $Q$ with real entries whose columns form an orthonormal set is called an orthogonal matrix. (Strange name!)

Examples: $A = \bmat{rrr} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \emat$ and $B = \bmat{rr} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \emat$ and $C = [ \, \vq_1 \, \vq_2 \, \vq_3 \, ] = \bmat{rrr} \frac{2}{\!\sqrt{6}} & 0 \,\, & \frac{1}{\!\sqrt{3}} \\ \frac{1}{\!\sqrt{6}} & \ph\frac{1}{\!\sqrt{2}} & -\frac{1}{\!\sqrt{3}} \\ -\frac{1}{\!\sqrt{6}} & \frac{1}{\!\sqrt{2}} & \frac{1}{\!\sqrt{3}} \\ \emat^\strut$.

Note: In $\R^2$ and $\R^3$, orthogonal matrices correspond exactly to rotations and reflections. This is an important geometric reason to study them. Another reason is that we will see in Section 5.4 that they are related to diagonalization of symmetric matrices.

Theorems 5.4 and 5.5: $Q$ is orthogonal if and only if $Q^T Q = I$, i.e. if and only if $Q$ is invertible and $Q^{-1} = Q^T$.

The $ij$ entry of $Q^T Q$ is $\row_i(Q^T) \cdot \col_j(Q)$. That is, it equals $\col_i(Q) \cdot \col_j(Q)$. So the columns are orthonormal if and only if the $ij$ entry of $Q^T Q$ is $1$ when $i = j$ and $0$ otherwise, which means that $Q^T Q = I$.

Since $Q$ is square, the last equation is the same as saying the $Q$ is invertible and $Q^{-1} = Q^T$.$\quad\Box$

Theorem 5.7: If $Q$ is orthogonal, then its rows form an orthonormal set too.

Proof: Since $Q^T Q = I$, we must also have $Q \, Q^T = I$. But the last equation says exactly that the rows of $Q$ are orthonormal.$\quad\Box$

Another way to put it is that $Q^T$ is also an orthogonal matrix. Look at the examples again.

Theorem 5.6: Let $Q$ be an $n \times n$ matrix. Then the following statements are equivalent:
a. $Q$ is orthogonal.
b. $\|Q \vx\| = \|\vx\|$ for every $\vx$ in $\R^n$.
c. $Q\vx \cdot Q\vy = \vx \cdot \vy$ for every $\vx$ and $\vy$ in $\R^n$.

(a) $\implies$ (c): Suppose $Q$ is orthogonal, so $Q^T Q = I$. Then $$ \kern-6ex Q\vx \cdot Q\vy = (Q \vx)^T (Q \vy) = \vx^T Q^T Q \vy = \vx^T \vy = \vx \cdot \vy $$ (c) $\implies$ (a): Suppose (c) holds. Take $\vx = \ve_i$ and $\vy = \ve_j$. Then $Q\vx = Q\ve_i$ is the $i$th column of $Q$ and $Q\vy = Q\ve_j$ is the $j$th column of $Q$. (c) says that the dot product of these columns is $$ Q\ve_i \cdot Q\ve_j = \ve_i \cdot \ve_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} $$ So the columns of $Q$ are orthonomal, which means $Q$ is orthogonal.

(b) $\implies$ (c): see text. $\quad\Box$