SWILA NOTES BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Contents

Schedule of topics
Preface
Acknowledgements
0. Notation
1. Vector spaces: C’mon, it’s just algebra (except not quite)
  1.1. Fields
  1.2. Vector spaces
  1.3. Linear independence
  1.4. Bases and coordinates
2. Linear transformations: More than meets the I
  2.1. Linear transformations
  2.2. Linear transformations T : Fm → Fl as matrices
  2.3. The matrix representation of a linear transformation
  2.4. Change of coordinate matrices
  2.5. The trace of a matrix
  2.6. Dimension and isomorphism
  2.7. Row reduction and the general linear group: Shearing is caring
3. Vector spaces: But wait, there’s more!
  3.1. Subspaces
  3.2. Direct sums
  3.3. Linear maps and subspaces
  3.4. Quotient spaces
4. A minor determinant of your success
  4.1. Determinant of a matrix
  4.2. Computing the determinant of small matrices
  4.3. Geometric interpretation and the special linear group
5. Eigenstuff: of maximum importance
  5.1. Facts about polynomials
  5.2. Eigenvalues
  5.3. The minimal polynomial
  5.4. Diagonalizability
  5.5. Cyclic subspaces
  5.6. Rational canonical form
  5.7. The Jordan canonical form
  5.8. Computing the Jordan canonical form
6. Inner product spaces? ... More like winner product spaces!
  6.1. Inner products
  6.2. Angles and orthogonality in inner product spaces
  6.3. Orthonormal bases for finite-dimensional inner product spaces
  6.4. Orthogonal complements and projections
  6.5. The adjoint of a linear transformation
7. Linear operators on IPS: The love story of a couple linear transformations who respected their own personal (inner product) space
  7.1. Self-adjoint maps
  7.2. Isometries
  7.3. The orthogonal and unitary groups
  7.4. The spectral theorem for self-adjoint operators
  7.5. Normal operators and the spectral theorem for normal operators
  7.6. Schur’s theorem
  7.7. Singular value decomposition
8. A bird? A plane?... No, it’s a chapter about multilinear algebra.
  8.1. Dual spaces
  8.2. Tensor products
  8.3. Quadratic forms
References


Schedule of topics

(1) Monday, June 6th: 1. Vector spaces: C’mon, it’s just algebra (except not quite); 1.1. Fields; 1.2. Vector spaces; 1.3. Linear independence; 1.4. Bases and coordinates; 2. Linear transformations: More than meets the I; 2.1. Linear transformations; 2.2. Linear transformations T : Fm → Fl as matrices.
(2) Thursday, June 9th: 2.3. The matrix representation of a linear transformation; 2.4. Change of coordinate matrices; 2.5. The trace of a matrix; 2.6. Dimension and isomorphism.
(3) Monday, June 13th: 2.7. Row reduction and the general linear group; 3. Vector spaces: But wait, there’s more!; 3.1. Subspaces; 3.2. Direct sums; 3.3. Linear maps and subspaces.
(4) Thursday, June 16th: 3.4. Quotient spaces; 4. A minor determinant of your success; 4.1. Determinant of a matrix; 4.2. Computing the determinant of small matrices; 4.3. Geometric interpretation and the special linear group; 5. Eigenstuff: of maximum importance; 5.1. Facts about polynomials.
(5) Monday, June 20th: 5.2. Eigenvalues; 5.3. The minimal polynomial; 5.4. Diagonalizability.
(6) Thursday, June 23rd: 5.5. Cyclic subspaces.
(7) Monday, June 27th: 5.6. Rational canonical form.
(8) Thursday, June 30th: 5.7. The Jordan canonical form; 5.8. Computing the Jordan canonical form.
(9) Monday, July 11th: 6. Inner product spaces? ... More like winner product spaces!; 6.1. Inner products; 6.2. Angles and orthogonality in inner product spaces; 6.3. Orthonormal bases for finite-dimensional inner product spaces.
(10) Thursday, July 14th: 6.4. Orthogonal complements and projections; 6.5. The adjoint of a linear transformation.
(11) Monday, July 18th: 7. Linear operators on inner product spaces; 7.1. Self-adjoint maps; 7.2. Isometries; 7.3. The orthogonal and unitary groups.
(12) Thursday, July 21st: 7.4. The spectral theorem for self-adjoint operators; 7.5. Normal operators and the spectral theorem for normal operators.
(13) Monday, July 25th: 7.6. Schur’s theorem; 7.7. Singular value decomposition.
(14) Thursday, July 28th: 8. Multilinear algebra; 8.1. Dual spaces; 8.2. Tensor products.


Preface

Perhaps an accurate title for these notes would be ATLANTIC. I know what you’re thinking: the Atlantic Ocean and the subject of linear algebra seem so different from one another. Yet, I see many similarities between them. The Atlantic Ocean seems so far away from us and is one of the top five least favorite oceans for most. Similarly, there isn’t a graduate course in linear algebra, and the subject is often dismissed by many professors as assumed knowledge. They both play vital roles in connecting us with Europe, with the Atlantic housing vital trade routes and linear algebra having its roots there. Linear algebra may seem like a monstrous body to trek, but so, once, did the Atlantic. The Atlantic used to take weeks to raft across, but it can now be flown over in a few hours. Similarly, linear algebra is often taught over a semester, but these notes are intended to teach it in 21 hours. The goal of this workshop is not to breeze over the material as perhaps past courses and flights have before it.

I have two main goals for this workshop. First, I hope that these notes provide a solid reference for beginning students as they take comps. And second, I hope that participants learn All The Linear Algebra Needed Today In Comps. In early January, I wrote a survey that asked “Please check those topics you would have liked to have learned better prior to entering grad school” and listed many different topics in linear algebra. 16 Illinois grad students responded to this survey, checking off what they thought were the most important topics in linear algebra. The content of my notes and the length of time I spend discussing each topic are based on the results of this survey.

These notes are mainly adapted from Petersen’s notes [6]. I wrote Chapters 4 and 5 over spring break and I wrote the rest of the chapters between 9 pm and 2 am, so there will assuredly be typos, for which I apologize in advance. I would like to emphasize I’m not personally getting paid for running this workshop, so please don’t sue me, Professor Petersen. Also, I am responsible for any typos and mathematical errors in these notes.

The main purpose of Chapters 1-5 is to prove the Jordan canonical form. Chapters 6-7 set up the proof of the spectral theorem for normal operators and the singular value decomposition of linear operators on finite-dimensional vector spaces. Chapter 8 is written to give an introduction to tensor products. These are the main results of these notes. There are a few notable omissions in these notes, mostly analytic. The reader will notice a disappointing lack of discussion of infinite-dimensional vector spaces, Hilbert spaces, metric spaces, and normed spaces. These cuts were necessary, in my opinion, in order to get to the Jordan canonical form and the spectral theorem for normal operators. I will try to include some of these topics as exercises in problem sets.

In the future, I would like to edit these notes in a few ways:
• Include an index.
• Include more computational examples.
• Include more real-world examples.

I suppose I have several goals for this workshop. At the very least, I hope that enough people participate so that I can run it. As I will not be able to TA for the next two years, I hope that this workshop will help me decide whether I want to be a teaching-oriented professor after I graduate. It would be great if this workshop went well enough that I could run it in future years at Illinois. I hope (and expect) that the materials I create for this workshop can be easily reused by others.
My loftiest goal is that this workshop becomes so popular that I can instruct other universities how to run it for their graduate students.


Acknowledgements

I thank Professor Peter Petersen for writing his notes on Linear Algebra [6]; my notes are mainly based on his and it would have been very difficult to write my notes without them. I thank Fuzhen Zhang; his book Linear algebra: challenging problems for students [8] was the primary source of problems for the problem sets. I would like to thank the U.S. Department of Education for supporting me through the GAANN Fellowship (#P200A150319) the spring semester and summer of 2016; it would have been very difficult to write these notes and run this workshop without it, and both certainly would have been done at a lower quality. I would like to thank Rick Laugesen, Karen Mortenson and the Illinois math department for their tremendous support in setting up the workshop and for finding a way to provide the workshop with money for refreshments. I would like to thank my advisor Professor Jeremy Tyson for his wonderful LaTeX template and his great support over the past couple years.

There are many of my peers I would also like to thank. I would like to thank the 16 anonymous graduate students who volunteered to fill out my survey; the surveys were important in deciding which topics to include in my notes and how long to spend on them in the workshop. Thanks to Anthony Sanchez for suggesting I type my notes if I really want to do them well; my handwriting isn’t great and it would have been impossible to edit and distribute my notes otherwise. I thank Chris Gartland for suggesting I include a section about Schur’s Lemma. I thank Ruth Luo for helping edit the workshop’s web page, driving me to buy refreshments for the workshop, helping organize dinners, and helping me write workshop emails and surveys. I thank (in alphabetical order) Elizabeth Field, Emily Heath, Alyssa Loving, Ruth Luo, Hadrian Quan, and Simone Sisneros-Thiery for having many conversations with me about how to organize the workshop. I thank Alyssa Loving for helping me organize every aspect of the workshop and always being willing to discuss things with me. I thank Justin Burner, Daniel Carmody, Alyssa Loving, and Ruth Luo for finding typos and giving suggestions on the notes. In addition, I thank for their valuable support: Hannah Burson, Cassie Christenson, Raemeon Cowan, Ravi Donepudi, Martino Fasina, Ian Ford, Elliot Kaplan, Katherine Koch, Christopher Linden, Sarah Loeb, Marissa Loving, Sarah Mousley, Itziar Ochoa de Alaiza, Tsutomu Okano, Matej Penciak, Nigel Pynn-Coates, Vanessa Rivera-Quiñones, Matthew Romney, Elizabeth Tatum, Lan Wang, Joshua Wen, and Dara Zirlin.

I would like to thank my parents for always supporting me over the years in everything I pursued. Thanks to Stephen Curry, two-time MVP of the National Basketball Association, and the Golden State Warriors for just being awesome.

Finally, I would like to thank the workshop’s participants, of which I feel blessed there are so many and without which the workshop could not run: Roger Burt, Ravi Donepudi, Alexi Block Gorman, Elliot Kaplan, Derek Kielty, Katherine Koch, Jen Li, Alyssa Kealohi Loving, Ruth Luo, Daniyar Omarov, Hadrian Quan, Colleen Robichaux, Khoa Tran, Dennis Wu, and Laila Zhexembay. Thank you all!


0. Notation

We will use the following notation for the entirety of the workshop:
(1) F will always denote a field, typically R or C.
(2) F[t] denotes the space of polynomials with coefficients in F.
(3) I sometimes use interchangeably ⊂ and ⊆ for set containment.
(4) Given k ∈ N, Pk (F) denotes the collection of polynomials with coefficients in F and degree at most k.
(5) N = {1, 2, 3, . . .} will denote the set of natural numbers.
(6) Ml×m (F) denotes the set of l × m matrices with entries in F.
(7) Mn (F) := Mn×n (F) denotes the set of n × n matrices with entries in F.
(8) (aij)1≤i≤l, 1≤j≤m ∈ Ml×m (F) will denote an l × m matrix for which aij ∈ F is the entry in the ith row and the jth column:
\[
(a_{ij})_{1\le i\le l,\,1\le j\le m} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{l1} & a_{l2} & \cdots & a_{lm}
\end{pmatrix}.
\]
I will typically omit the subscript and superscript of the matrix when they are implied by the context and simply write (aij).
(9) For a linear transformation T : V → W, we write 0 ≠ T or T ≠ 0 to state that T is not the zero linear transformation, i.e., there exists x ∈ V such that T(x) ≠ 0.
(10) Given vector spaces V, W over a field F, L(V, W) denotes the set of linear transformations from V to W.
(11) I will use different notations for identity transformations and identity matrices. A few of these include: 1, I, and Id.
(12) For a polynomial p(t) ∈ F[t], we write 0 ≠ p(t) or p(t) ≠ 0 to state that p(t) is not the zero polynomial.
(13) Vector spaces will be assumed to be possibly infinite-dimensional unless noted otherwise.
(14) For A = (aij) ∈ Ml×m (R), the transpose At ∈ Mm×l (R) is defined by At := (aji).
(15) For A = (aij) ∈ Ml×m (C), the adjoint (conjugate transpose) A∗ ∈ Mm×l (C) is defined by transposing and conjugating: the (i, j) entry of A∗ is the complex conjugate of aji.
(16) The adjoint of a linear transformation T : V → W between finite-dimensional vector spaces will be denoted by T∗.
(17) Given a linear operator T : V → V and x ∈ V, the cyclic subspace generated by x will be denoted by Cx.
(18) Ap will denote the companion matrix of a monic polynomial p(t).


1. Vector spaces: C’mon, it’s just algebra (except not quite)

You can go on all sorts of adventures with Legos. You can sail the seas as a pirate, go back millions of years to walk with the dinosaurs, and blast into outer space with a rocket ship. And there’s all sorts of new pieces now, too. There’s the little one-by-one nub block, a plastic staff, and wigs for your little Lego people. But there’s one main piece that everything is built upon: the traditional two-by-four block. In the first two chapters, we will study vector spaces and linear transformations. These are the building blocks of everything we will study in this workshop, the two-by-four block of linear algebra. Just remember: Everything is awesome, everything is cool when you work as a team.

1.1. Fields.

Definition 1.1.1. A field F is a set with two binary operations +, · : F × F → F and distinct elements 0 ≠ 1 satisfying: For all a, b, c ∈ F,
(1) Associativity of +: a + (b + c) = (a + b) + c.
(2) Commutativity of +: a + b = b + a.
(3) Existence of 0: a + 0 = a.
(4) Additive inverses: There exists (−a) ∈ F such that a + (−a) = 0.
(5) Associativity of ·: a · (b · c) = (a · b) · c.
(6) Commutativity of ·: a · b = b · a.
(7) Existence of 1: a · 1 = a.
(8) Multiplicative inverses of nonzero elements: If a ≠ 0, there exists a−1 ∈ F such that a · a−1 = 1.
(9) The Distributive Law: a(b + c) = ab + ac.
More abstractly,
• (F, +, 0) forms an abelian group, i.e., + satisfies associativity and commutativity, and each element has an inverse;
• (F \ {0}, ·, 1) forms an abelian group; and
• the distributive law holds.
We typically omit · when writing multiplication.

Example 1.1.2. Examples of fields include R, C, Q, and Q[i] := {a + b√−1 : a, b ∈ Q}. Here, R denotes the reals, C denotes the complex numbers, and Q denotes the rationals.


1.2. Vector spaces. A wonderful property about some sets is that we have a notion of addition and scalar multiplication of elements. For example, we can talk about the convergence of a sequence of functions to a limiting function since we can talk about the difference of two functions. Sets for which we have nicely behaving binary operations of addition and scalar multiplication are called vector spaces.

Definition 1.2.1. A vector space over a field F is a nonempty set of vectors V with an element 0 ∈ V and operations + : V × V → V and scalar multiplication of vectors satisfying: for all x, y, z ∈ V and α, β ∈ F,
(1) Associativity of +: (x + y) + z = x + (y + z).
(2) Commutativity of +: x + y = y + x.
(3) Existence of 0: x + 0 = x.
(4) Existence of additive inverses: There exists −x ∈ V such that x + (−x) = 0.
(5) Associativity of scalar multiplication: α(βx) = (αβ)x;
(6) Commutativity of scalar multiplication: αx = xα;
(7) Multiplication by the unit scalar: 1x = x;
(8) The distributive laws: α(x + y) = αx + αy and (α + β)x = αx + βx.
Note conditions (1)-(4) state that (V, +, 0) forms an abelian group.

Random Thought 1.2.2. Kid complaining to parent...
Kid: I hate adding!
Parent: Hey, have a better additude!

Remark 1.2.3. Technically, {0} forms a vector space over any field F. (Check the conditions.) However, throughout these notes, all vector spaces are implicitly assumed to be nonzero, i.e., V ≠ {0}.

Example 1.2.4. The most important example of a vector space is Fn, where Fn := {(a1 , . . . , an ) : ai ∈ F}. Fn is endowed with the natural vector addition and scalar multiplication.


Example 1.2.5. The space of polynomials with F-valued coefficients F[x] forms a vector space over F with the operations:
(an xn + · · · + a0 ) + (bn xn + · · · + b0 ) := (an + bn )xn + · · · + (a0 + b0 ),
c(an xn + · · · + a0 ) := (can )xn + · · · + ca0 .
A useful tool when doing computations with multiple polynomials is assuming they all have the same number of terms by defining some higher coefficients to be zero, if necessary.

Example 1.2.6. We say that a function f : R → R is smooth if its kth derivative f (k) (x) exists for all k ∈ N and x ∈ R. The space of real-valued smooth functions C ∞ (R) forms a vector space over R with addition and scalar multiplication defined by: for f, g ∈ C ∞ (R), α ∈ R,
(f + g)(x) := f (x) + g(x),
(αf )(x) := αf (x).
Similarly, the space of real-valued continuous functions C(R) forms a vector space over R with similarly defined addition and scalar multiplication.

Example 1.2.7. Given two vector spaces V and W , we can form the product
V × W := {(v, w) : v ∈ V and w ∈ W }.
This becomes a vector space when endowed with the operations of coordinate-wise addition and scalar multiplication:
(v1 , w1 ) + (v2 , w2 ) := (v1 + v2 , w1 + w2 ),
α(v, w) := (αv, αw).
Note that this is a slight abuse of notation as we are using + for addition in three different spaces, and similarly with scalar multiplication. This abuse of notation will be typical throughout these notes to save ink, and it is expected that the reader uses the context to sift through the abuse.

Vector spaces share a number of intuitive properties involving their additive and multiplicative identity elements:

Proposition 1.2.8. For all x ∈ V and α ∈ F,
(1) 0x = 0.
(2) α0 = 0.
(3) (−1)x = −x.
(4) If αx = 0, then α = 0 or x = 0.

Proof. We only prove (4); (1)-(3) are left as exercises to the reader. If α ≠ 0, then α has a multiplicative inverse α−1. We have 0 = α−1 (αx) = 1x = x. □

As mentioned before, a fundamental property of vector spaces is that we can add scalar multiples of vectors together.

Remark 1.2.9. A vector space in which we can also multiply vectors is called an algebra.


Definition 1.2.10. A linear combination of vectors in a vector space V is an element x ∈ V of the form
x = α1 x1 + α2 x2 + · · · + αn xn ,
for some finite collection of vectors x1 , . . . , xn ∈ V and scalars α1 , . . . , αn ∈ F.

Remark 1.2.11. Note that we can only consider finite sums of vectors in V . In general vector spaces, we have no concept of limits, and hence, infinite series.

Definition 1.2.12. Fix a subset S ⊆ V . We define span(S) to be the collection of all linear combinations of vectors of S. This forms a subspace of V (closed under vector addition and scalar multiplication), and we say S spans V if span(S) = V .

1.3. Linear independence.

Definition 1.3.1. A collection of vectors {vα }α is linearly independent if:
a1 vα1 + · · · + an vαn = 0 for ai ∈ F   =⇒   ai = 0 for all i.

A collection of vectors {vα }α is linearly dependent if it is not linearly independent.

It isn’t hard to prove the following results.

Proposition 1.3.2. (Characterization of linear dependence) A collection of vectors {x1 , . . . , xn } ⊂ V is linearly dependent if and only if either x1 = 0, or we can find a smallest k ≥ 2 such that xk is a linear combination of x1 , . . . , xk−1 .

Corollary 1.3.3. If V = span{x1 , . . . , xn }, then there exists a subset {xi1 , . . . , xik } that forms a basis for V .

Proposition 1.3.4. (Steinitz Replacement) Let y1 , . . . , ym ∈ V be linearly independent and V = span{x1 , . . . , xn }. Then m ≤ n and V has a basis of the form y1 , . . . , ym , xi1 , . . . , xil , where l ≤ n − m. In particular, we may complete any linearly independent subset of a finite-dimensional vector space to a basis. Here, we say that a vector space is finite-dimensional if it has a finite basis.

1.4. Bases and coordinates.

Definition 1.4.1. Let V be a vector space over a field F. An F-basis for V is a linearly independent collection of vectors {vα }α∈A that spans V . We will say {vα }α∈A is a finite basis if A is finite. We simply write basis when the underlying field of scalars is clear from context. Equivalently, each vector in V can be uniquely written as a linear combination of vectors in {vα }α∈A , i.e., for each v ∈ V , there exist unique vectors vα1 , . . . , vαn and scalars a1 , . . . , an ∈ F (depending on v) such that v = a1 vα1 + · · · + an vαn .

Remark 1.4.2. The zero vector space is typically defined to have dimension 0.

Remark 1.4.3. In this workshop, we will typically only work with vector spaces which have finite bases.
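For readers who like to see these statements in action on a computer, here is a small sketch in Python with NumPy. It is not part of the notes proper, and the three vectors are an arbitrary illustration; it checks linear independence by a rank computation and extracts a basis from a spanning list in the spirit of Corollary 1.3.3 and Proposition 1.3.2.

    import numpy as np

    def is_linearly_independent(vectors, tol=1e-10):
        # A finite list of vectors in R^n is linearly independent exactly when
        # the matrix having them as columns has rank equal to the number of vectors.
        A = np.column_stack(vectors)
        return np.linalg.matrix_rank(A, tol=tol) == len(vectors)

    def extract_basis(vectors, tol=1e-10):
        # Keep a vector only if it increases the rank, i.e., only if it is not
        # a linear combination of the previously kept vectors.
        kept = []
        for v in vectors:
            candidate = kept + [v]
            if np.linalg.matrix_rank(np.column_stack(candidate), tol=tol) == len(candidate):
                kept.append(v)
        return kept

    # Example: x3 = x1 + x2, so only two of the three vectors survive.
    x1, x2, x3 = np.array([1., 0., 2.]), np.array([0., 1., 1.]), np.array([1., 1., 3.])
    print(is_linearly_independent([x1, x2, x3]))   # False
    print(len(extract_basis([x1, x2, x3])))        # 2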


Example 1.4.4. Define the standard basis vectors e1 , e2 , . . . , en in Fn by
e1 := (1, 0, 0, . . . , 0),
e2 := (0, 1, 0, . . . , 0),
. . .
en := (0, . . . , 0, 1).
In general, ei = (a1 , . . . , an ), where aj = 0 if i ≠ j and ai = 1. It is easy to see that {ei }1≤i≤n forms a finite F-basis for Fn . We will define S := {e1 , . . . , en }.

Example 1.4.5. The monomials {xn }n∈N ∪ {1} form an F-basis for F[x]. However, these monomials do not form a basis for the vector space of formal power series
\[
F[[x]] := \Big\{ \sum_{n=0}^{\infty} a_n x^n : a_n \in F \Big\}.
\]
(Why not?)

Example 1.4.6. For each 1 ≤ i ≤ l and 1 ≤ j ≤ m, define Eij to be the (l × m)-matrix for which the (i, j)th entry is 1 and every other entry is 0. The vector space Ml×m (F) of (l × m)-matrices has {Eij }1≤i≤l, 1≤j≤m as a basis.

Remark 1.4.7. One can show using Zorn’s Lemma that every nonzero vector space has a basis. However, no nonzero vector space over R, Q, or C has a unique basis. (Why?)

Remark 1.4.8. Suppose {v1 , . . . , vn } is a basis for a vector space V . If a1 v1 + · · · + an vn = b1 v1 + · · · + bn vn , we see ai = bi for each i. This means we have a correspondence V ↔ Fn via the identification a1 v1 + · · · + an vn ←→ (a1 , . . . , an ).

Definition 1.4.9. Suppose V is a vector space over F with an ordered basis B = (v1 , . . . , vn ). Given x = a1 v1 + · · · + an vn ∈ V , we define the coordinates of x with respect to B to be [x]B := (a1 , . . . , an ).

Remark 1.4.10. An ordered basis is one in which we ascribe a specific order to the elements of the basis. Note the basis in Definition 1.4.9 needs to be ordered to ensure the coordinates defined with respect to it are well-defined. We will typically assume our bases are ordered.

Example 1.4.11. Let S be the standard basis for R2 and let B = {(2, 0), (1, 2)}. Note B is a basis for R2 . As 3e1 + 2e2 = (3, 2) = (2, 0) + (1, 2), we have
[(3, 2)]S = (3, 2),
[(3, 2)]B = (1, 1).


Example 1.4.12. Let P3 (F) be the collection of polynomials of degree at most 3 with coefficients in F. Note that P3 (F) forms a vector space over F, but not an algebra. (Why?) We define the bases B := {1, x, x2 , x3 } and C := {1, 1 + x, 1 + x + x2 , 1 + x + x2 + x3 } for P3 (F). As x2 + x3 = (1 + x + x2 + x3 ) − (1 + x), we have
[x2 + x3 ]B = (0, 0, 1, 1),
[x2 + x3 ]C = (0, −1, 0, 1).
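Coordinates with respect to a basis of Fn can be computed by solving a linear system: if the basis vectors are the columns of a matrix S, then [x]B solves S[x]B = x. Here is a quick sketch in Python with NumPy, using the basis from Example 1.4.11; the code is only an illustration and not part of the notes proper.

    import numpy as np

    # Basis B = {(2, 0), (1, 2)} of R^2, written as the columns of S.
    S = np.array([[2., 1.],
                  [0., 2.]])
    x = np.array([3., 2.])

    # [x]_B solves S @ [x]_B = x, since x = a*(2,0) + b*(1,2) means x = S @ (a, b).
    coords = np.linalg.solve(S, x)
    print(coords)                      # [1. 1.], matching [(3, 2)]_B = (1, 1)
    print(np.allclose(S @ coords, x))  # True: we recover x from its coordinates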


2. Linear transformations: More than meets the I

2.1. Linear transformations. I love to go to math conferences. Get to meet new people, receive a nice per diem, and travel to a new place. One thing about traveling though, I’m rarely familiar with the town. So, I have to use a map to help navigate the area to avoid being lost. While some people may just see intersecting lines on maps that make up the roads, I see directions to help keep me on the street and narrow. What’s the moral of this aside? Linear maps enable us to explore the spaces in which we study. And that’s what this chapter is about.

Throughout these notes, if we mention two vector spaces V and W , we will assume both vector spaces are over a common field F. For a linear transformation T : V → W , we write 0 ≠ T or T ≠ 0 to mean that there exists x ∈ V such that T (x) ≠ 0.

Definition 2.1.1. Let V , W be vector spaces over a field F. A map T : V → W is said to be a linear transformation if for all x, y ∈ V and α ∈ F,
T (αx) = αT x,
T (x + y) = T x + T y.
Equivalently, for all x, y ∈ V and α ∈ F, T (αx + y) = αT x + T y.

Definition 2.1.2. A linear transformation T : V → V is called a linear operator. A linear transformation T : V → F is called a linear functional.

Example 2.1.3. The identity operator idV : V → V given by v ↦ v, and the zero transformation 0 : V → W given by v ↦ 0W are the typical examples of linear transformations.

Example 2.1.4. Define l1 (R) to be the vector space of R-valued sequences that are absolutely summable, i.e.,
\[
l^1(\mathbb{R}) := \Big\{ (a_i)_{i \in \mathbb{N}} : \sum_{i \in \mathbb{N}} |a_i| < \infty,\ a_i \in \mathbb{R} \Big\}.
\]

The linear transformation S : l1 (R) → l1 (R) induced by ei ↦ ei+1 , i ∈ N, is an example of a linear transformation that is not invertible. One can similarly define a noninvertible linear transformation S̃ : Σc (R) → Σc (R), where Σc (R) := {(ai )i∈N : ai ∈ R, ∃N so that ai = 0 for all i > N } is the vector space of real sequences with compact support. The interesting thing about these transformations is that they are norm-preserving when equipping l1 (R), Σc (R) with the l1 , sup norm, respectively.

Example 2.1.5. Define L2 (R) to be the set of all functions f : R → R satisfying ∫R |f (x)|2 dx < ∞. L2 (R) forms an R-vector space with the natural operations of function addition and scalar multiplication. Given f ∈ L2 (R), the map Tf : L2 (R) → R defined by g ↦ ∫R f (x) · g(x) dx defines a linear functional. (Note one needs to check the latter integral is finite for each g ∈ L2 (R).)


2.2. Linear transformations T : Fm → Fl as matrices.

Lemma 2.2.1. Any linear map T : Fm → F is of the form T = ( x1 · · · xm ), where xi = T (ei ) ∈ F. Hence, T (a1 e1 + · · · + am em ) = a1 x1 + · · · + am xm .

Proof. This follows from the linearity of T . □

Remark 2.2.2. (Matrix multiplication as a linear transformation) Fix a matrix A ∈ Ml×m (F). Then T : Fm → Fl defined by T (x) = Ax defines a linear transformation. Observe for each i = 1, . . . , m,
\[
T(e_i) =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{l1} & a_{l2} & \cdots & a_{lm}
\end{pmatrix}
e_i =
\begin{pmatrix}
a_{1i} \\ a_{2i} \\ \vdots \\ a_{li}
\end{pmatrix}.
\]
(We always apply matrices on the left to column vectors. I typically will not differentiate between row vectors and column vectors:
\[
(b_1, b_2, \ldots, b_n) \longleftrightarrow
\begin{pmatrix}
b_1 \\ b_2 \\ \vdots \\ b_n
\end{pmatrix}
\]
throughout these notes.)

Remark 2.2.2 leads us to the following proposition:

Proposition 2.2.3. We have a correspondence between
Ml×m (F)   ←→   {linear transformations T : Fm → Fl }
given by
A   ←→   (x ↦ Ax).

Proof. We noted in Remark 2.2.2 that a matrix in Ml×m (F) naturally defines a linear transformation T : Fm → Fl . On the other hand, fix a linear transformation T : Fm → Fl . Define the matrix
\[
A = \begin{pmatrix}
| & | & & | \\
T(e_1) & T(e_2) & \cdots & T(e_m) \\
| & | & & |
\end{pmatrix}.
\]
So, for each i = 1, . . . , m, the ith column of A is the vector T (ei ) ∈ Fl . Given (a1 , . . . , am ) ∈ Fm ,
\[
A \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}
= a_1 T(e_1) + \cdots + a_m T(e_m) = T(a_1 e_1 + \cdots + a_m e_m).
\]
Thus, T (x) = Ax for all x ∈ Fm . □
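The proof above is completely constructive: to get the matrix of a linear map T : Fm → Fl, list the vectors T(e1), . . . , T(em) as columns. Here is a small sketch of this construction in Python with NumPy; the particular map T below is a made-up example used only to illustrate the recipe, not something from the notes.

    import numpy as np

    def matrix_of(T, m):
        # Columns of the matrix are T(e_1), ..., T(e_m), exactly as in the proof above.
        basis = np.eye(m)
        return np.column_stack([T(basis[:, i]) for i in range(m)])

    # A hypothetical linear map T : R^3 -> R^2, used only for illustration.
    def T(v):
        x, y, z = v
        return np.array([x + 2*z, -y + 3*z])

    A = matrix_of(T, 3)
    print(A)                            # [[ 1.  0.  2.]  [ 0. -1.  3.]]
    v = np.array([1., 2., 3.])
    print(np.allclose(A @ v, T(v)))     # True: T(x) = Ax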




2.3. The matrix representation of a linear transformation. In Proposition 2.2.3, we showed that we can identify linear transformations Fm → Fl with l × m matrices. Given a linear transformation T : Fm → Fl , we constructed a matrix A ∈ Ml×m such that T (x) = Ax for all x ∈ Fm . In particular, if Sm , Sl are the standard bases for Fm , Fl , respectively,
[T (x)]Sl = A[x]Sm   for all x ∈ Fm .
The question is whether we can define a similar correspondence with more general linear transformations V → W . More specifically, given a linear transformation T : V → W , we wish to find a matrix that transforms coordinates of a vector x ∈ V into coordinates of T (x) ∈ W . In this section, we will show that we can do this if V and W have finite bases.

Fix an ordered basis B = (v1 , . . . , vn ) for a vector space V and a vector x ∈ V . If we write x = a1 v1 + · · · + an vn , recall that we define the coordinates of x with respect to B to be
\[
[x]_B = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.
\]
This defines an identification of Fn with V . (See Remark 1.4.8 and Definition 1.4.9.)

Recall S = {e1 , . . . , em } is the standard basis for Fm . Given a matrix A ∈ Ml×m (F), note Aei equals the ith column of A. This motivates the following definition:

Definition 2.3.1. Let V and W be vector spaces over F. Let B = {e1 , . . . , em } be a basis for V , and C = {f1 , . . . , fn } a basis for W . Fix a linear transformation T : V → W . We define the matrix representation of T with respect to B and C to be the n × m matrix
\[
[T]_{C,B} = \begin{pmatrix}
| & & | \\
[T(e_1)]_C & \cdots & [T(e_m)]_C \\
| & & |
\end{pmatrix}.
\]
Here, for each i = 1, . . . , m, [T (ei )]C is the column vector of the coordinates of T (ei ) with respect to C. In the case V = W and B = C, we simply write [T ]B for [T ]B,B . We will typically be interested in the latter case. More specifically, given a vector space V and a linear operator T : V → V , we will be looking for an ordered basis B for which the matrix representation [T ]B of T is “nice”.

Proposition 2.3.2. Fix vector spaces V , W with bases B = {v1 , . . . , vm }, C = {f1 , . . . , fn }, respectively, and a linear transformation T : V → W . Then
[T (x)]C = [T ]C,B [x]B   for all x ∈ V.
In other words, the n × m matrix [T ]C,B transforms coordinates of a vector x ∈ V with respect to B into the coordinates of T (x) ∈ W with respect to C.

Proof. The idea of this proof is to prove the identity first when x equals one of the vi , and then the identity holds for all x by linearity. This proof strongly uses that the coordinate maps [ ]B : V → Fm , [ ]C : W → Fn are linear.


For each i = 1, . . . , m, [T ]C,B ei equals the ith column of [T ]C,B , which by construction is [T (vi )]C . But note ei = [vi ]B . This implies [T (vi )]C = [T ]C,B [vi ]B . It then follows from linearity of T and the coordinate maps, and the definition of basis, that
[T (x)]C = [T ]C,B [x]B   for all x ∈ V. □

Example 2.3.3. Let S = {e1 , . . . , en } be the standard ordered basis for Rn . Define the linear transformation T : Rn → Rn by
\[
T(a_1, \ldots, a_n) = \sum_{k=1}^{n} k\, a_k\, e_k .
\]
As T (ek ) = kek for each k = 1, . . . , n, the matrix representation of T with respect to S is given by the diagonal n × n matrix
\[
[T]_S = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & n
\end{pmatrix}.
\]

We can use Proposition 2.3.2 to prove the following theorem:

Theorem 2.3.4. Fix vector spaces V, W, X with finite bases A, B, C, respectively. If S : V → W , T : W → X are linear transformations, then [T ◦ S]C,A = [T ]C,B [S]B,A .

Proof. (Sketch) Assume A has m elements and C has n elements. Then [T ◦ S]C,A is an n × m matrix, so [T ◦ S]C,A may be viewed as a linear transformation from Fm to Fn by Proposition 2.2.3. On the other hand, [T ]C,B [S]B,A may also be viewed as a linear transformation from Fm to Fn . It follows from linearity that if [T ◦ S]C,A (ei ) = [T ]C,B [S]B,A (ei ) for all 1 ≤ i ≤ m, then they agree on all of Fm . This fact follows from the formula written in Proposition 2.3.2. □

2.4. Change of coordinate matrices. In this section, we will introduce change of coordinate matrices and obtain equivalent matrix representations of linear transformations.

Definition 2.4.1. Let B = {e1 , . . . , en }, C = {f1 , . . . , fm } be two ordered bases for a vector space V . We define the change of coordinate matrix from B to C to be
\[
[\mathrm{Id} : V \to V]_{C,B} = \begin{pmatrix}
| & | & & | \\
[e_1]_C & [e_2]_C & \cdots & [e_n]_C \\
| & | & & |
\end{pmatrix}.
\]
For all v ∈ V , we have [v]C = [Id]C,B [v]B . In other words, [Id]C,B transforms (by left multiplication) the coordinates of a vector v in B to the coordinates of v in C.
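Change of coordinate matrices are easy to experiment with numerically. The sketch below (Python with NumPy; the two bases of R2 are the ones from Example 1.4.11, chosen only for illustration and not part of the notes proper) builds [Id]C,B column by column and checks the identity [v]C = [Id]C,B [v]B.

    import numpy as np

    # Two ordered bases of R^2: B is the standard basis, C = {(2, 0), (1, 2)}.
    B = np.eye(2)
    C = np.array([[2., 1.],
                  [0., 2.]])   # basis vectors of C as columns

    def coords(basis, v):
        # [v]_basis solves basis @ [v]_basis = v.
        return np.linalg.solve(basis, v)

    # Change of coordinate matrix from B to C: columns are [b_1]_C, [b_2]_C.
    Id_CB = np.column_stack([coords(C, B[:, i]) for i in range(2)])

    v = np.array([3., 2.])
    print(np.allclose(coords(C, v), Id_CB @ coords(B, v)))   # True: [v]_C = [Id]_{C,B} [v]_B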


Remark 2.4.2. We will show later in Theorem 2.6.16 that if V has a finite basis B, then any other basis has the same number of vectors as B.

Applying Theorem 2.3.4 to the identity T = Id ◦ T ◦ Id, we have the following theorem:

Theorem 2.4.3. Let B, C be finite bases of a vector space V . If T : V → V is a linear operator, then [T ]C = [Id]C,B [T ]B [Id]B,C .

We will use this theorem and the following corollary throughout. Let S = {e1 , . . . , en } be the standard basis for Fn . From Proposition 2.2.3, recall that a linear operator T : Fn → Fn is of the form T (x) = AT x, where the ith column of AT is T (ei ).

Corollary 2.4.4. Fix a basis B = {f1 , . . . , fn } for Fn . If T : Fn → Fn , T (x) = Ax, is a linear operator, then S −1 = [Id]B,S and A = S[T ]B S −1 , where S is the n × n matrix with its ith column being fi .

Proof. Observe
\[
S = [\mathrm{Id}]_{S,B} = \begin{pmatrix}
| & & | \\
f_1 & \cdots & f_n \\
| & & |
\end{pmatrix}.
\]
As
\[
\mathrm{Id}_{M_n(F)} = [\mathrm{Id}]_{S,S} = [\mathrm{Id}]_{S,B}\,[\mathrm{Id}]_{B,S},
\]
we have S −1 = [Id]B,S . The corollary then follows from the previous theorem. □

Example 2.4.5. Let
\[
A = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
0 & 1 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} \in M_n(F).
\]
We can use the previous corollary to easily find the inverse of A. Define
\[
v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad
v_n = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}.
\]
Then B := {v1 , . . . , vn } can be seen by induction to be a basis for Fn . Let S = {e1 , . . . , en } be the standard basis for Fn . Observe A = [Id]S,B . By the previous corollary,
\[
A^{-1} = [\mathrm{Id}]_{B,S} = \begin{pmatrix}
| & & | \\
[e_1]_B & \cdots & [e_n]_B \\
| & & |
\end{pmatrix}.
\]


Then note e1 = v1 and ei = vi − vi−1 for each i ≥ 2. We may conclude
\[
A^{-1} = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & 0 \\
0 & & & 1 & -1 \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}.
\]
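Example 2.4.5 is easy to sanity-check on a computer. The following sketch (Python with NumPy, with n = 5 chosen arbitrarily; not part of the notes proper) builds the all-ones upper triangular matrix A and the claimed inverse and verifies that they really are inverses.

    import numpy as np

    n = 5
    A = np.triu(np.ones((n, n)))                 # 1's on and above the diagonal

    # Claimed inverse: 1's on the diagonal, -1's on the superdiagonal.
    A_inv = np.eye(n) - np.diag(np.ones(n - 1), k=1)

    print(np.allclose(A @ A_inv, np.eye(n)))     # True
    print(np.allclose(A_inv, np.linalg.inv(A)))  # True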

We conclude this section with the following definition.

Definition 2.4.6. We say that a matrix A ∈ Mn (F) is similar to a matrix B ∈ Mn (F) if there exists an invertible n × n matrix S such that A = SBS −1 . From Corollary 2.4.4, we see that two square matrices are similar if and only if they represent the same linear operator on Fn , but with respect to possibly different bases.

2.5. The trace of a matrix.

Definition 2.5.1. We define the trace tr : Mn (F) → F by
tr(aij ) := a11 + a22 + · · · + ann .
Note that tr is a linear transformation. Note if F = R and A ∈ Mn (R), tr(At ) = tr(A). If F = C and A ∈ Mn (C), tr(A∗ ) is the complex conjugate of tr(A).

A tedious computation gives us the following lemma (without proof here):

Lemma 2.5.2. (Invariance of trace) If A ∈ Ml×m (F) and B ∈ Mm×l (F), then tr(BA) = tr(AB).

Remark 2.5.3. Observe
\[
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
This is an example of matrices A, B for which tr(AB) = tr(BA), even though AB ≠ BA. That’s what makes the previous lemma so remarkable: despite the fact that Mn (F) is noncommutative in general, the trace functional behaves as if it were commutative. For fun, consider what these matrices do as operators on R2 to figure out why exactly one product represents the zero transformation.

Remark 2.5.4. Lemma 2.5.2 states that tr(AB) = tr(BA) for matrices A, B of appropriate sizes. Equivalently, the trace linear functional is invariant under cyclic permutations of matrices. However, it would be false to state that the trace totally ignores the noncommutativity of Mn (F). For example, let A, B ∈ M2 (R) be given by
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad
B = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}.
\]


One can compute
\[
AABB = \begin{pmatrix} 392 & 517 \\ 856 & 1129 \end{pmatrix}
\quad \text{and} \quad
ABAB = \begin{pmatrix} 386 & 507 \\ 858 & 1127 \end{pmatrix}.
\]
Hence, tr(AABB) = 1521 ≠ 1513 = tr(ABAB).

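Both halves of this discussion are easy to reproduce numerically. A short sketch in Python with NumPy (not part of the notes proper; the random matrices at the end are only for illustration) checks tr(AB) = tr(BA) from Lemma 2.5.2 as well as the computation tr(AABB) ≠ tr(ABAB) above.

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[2., 3.], [4., 5.]])

    print(np.trace(A @ B), np.trace(B @ A))    # equal, as Lemma 2.5.2 predicts
    print(np.trace(A @ A @ B @ B))             # 1521.0
    print(np.trace(A @ B @ A @ B))             # 1513.0

    # The identity tr(AB) = tr(BA) also holds for non-square factors.
    rng = np.random.default_rng(0)
    C = rng.standard_normal((3, 5))
    D = rng.standard_normal((5, 3))
    print(np.isclose(np.trace(C @ D), np.trace(D @ C)))  # True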
We have considered the trace of square matrices. We will now show that we can define the trace of linear operators on vector spaces with finite bases.

Proposition 2.5.5. Let T : V → V be a linear operator. If B, C are two finite bases of V , then tr([T ]B ) = tr([T ]C ).

Proof. By Theorem 2.4.3, [T ]B = [Id]B,C [T ]C [Id]C,B . By Lemma 2.5.2,
tr([T ]B ) = tr (([Id]B,C [T ]C )[Id]C,B ) = tr ([Id]C,B ([Id]B,C [T ]C )) = tr ([T ]C ) .
We used that [Id]C,B [Id]B,C = [Id]C . □

This enables us to make the following definition:

Definition 2.5.6. Let T : V → V be a linear operator. If V has a finite basis B, we define tr(T ) := tr([T ]B ). Note that the previous proposition ensures that tr(T ) is well-defined. More precisely, it doesn’t depend on the choice of the finite basis since taking another matrix representation of T would result in the same trace.

Remark 2.5.7. In the previous proposition, note we have not shown that B necessarily has the same number of elements as C. This is the content of the next section. We will later show that the trace defines an inner product on Mn (C) and Mn (R) (see Example 6.1.5 and Remark 7.4.8).

2.6. Dimension and isomorphism.

Remark 2.6.1. Recall that a map f : V → W is surjective (or onto) if f (V ) = W , and is injective (or one-to-one) if
f (v1 ) = f (v2 )   =⇒   v1 = v2 .

We say that a map f : V → W is bijective if it is injective and surjective. Observe that a linear transformation T : V → W is injective if and only if ker(T ) := {v ∈ V : T (v) = 0} = {0}. We will use this observation throughout these notes.


Random Thought 2.6.2. Oh dear, John, by Nicholas Sparks. In the best seller Dear John, we journeyed with Savannah as she fell in love with a soldier John and experienced the death of her husband Tom. Now married to John with kids, Savannah must overcome yet another challenge: dealing with John’s lame jokes.
John: Hey Savannah Banana. Guess what?
Savannah: What John?
John: Why did the soldier no longer see one-to-one with his pompous, recently promoted general?
Savannah: Why John?...
John: Because he no longer had a trivial colonel.
Savannah: Oh dear, John.

Definition 2.6.3. A linear transformation T : V → W is an isomorphism if there exists a linear transformation S : W → V such that T S = 1W and ST = 1V . We say vector spaces V and W are isomorphic if there exists an isomorphism T : V → W . Here, 1V , 1W are the identity transformations on V, W , respectively.

Routine algebraic manipulations give us the following lemma (without proof):

Lemma 2.6.4. A linear transformation T : V → W is an isomorphism if and only if it is bijective.

Remark 2.6.5. This lemma tells us that a map between vector spaces being linear gives us possibly additional structure on the map. This theme comes up commonly in math. For example, all functions f : C → C that are once differentiable are actually infinitely differentiable and have locally convergent power series at each point. This is not true for functions g : R → R. This stems from the fact that one can approach a complex number from a myriad of directions in C while one can only approach a real number in essentially two ways in R. The fact that a linear transformation is additive and allows us to pull out scalars must be a rather strong condition on our functions.

Example 2.6.6. Fix a field F, and define Σc (F) to be the collection of all compactly supported F-valued sequences. In other words, for all (ai ) ∈ Σc (F), there exists N such that for all i ≥ N , ai = 0. Observe that Σc (F) has a basis {ei }i∈N , where ei is the sequence for which its ith term is 1 and every other term is 0. Then we have an isomorphism from Σc (F) to the space F[x] of polynomials with coefficients in F, induced by ei ↦ xi−1 and extended linearly (compare with Example 1.4.5).

Definition 2.6.7. (Characteristic of a field F) Fix a field F. If n · 1F is nonzero for all n ∈ N, we define the characteristic of F to be 0. Otherwise, we define the characteristic of F to be the smallest n ∈ N such that n · 1F = 0.

Remark 2.6.8. The characteristic of a field F is actually the nonnegative generator for the kernel of the map ι : Z → F defined by ι(n) := n · 1F . It’s an exercise left to the awesome reader to show that the characteristic of a field F is either 0 or a prime number. (Hint: Use the Fundamental Theorem of Arithmetic.)

Example 2.6.9. Fix a positive prime p. We can define an equivalence relation on Z by proudly declaring a ≡ b if b − a is divisible by p. Given a ∈ Z, the equivalence class of a is


[a]p := a + pZ. We define Z/pZ (sometimes written Zp ) to be the collection of equivalence classes of Z with respect to this relation. So, Z/pZ = {[0]p , [1]p , . . . , [p − 1]p }. We can define addition and multiplication on Z/pZ as:
[a]p + [b]p := [a + b]p ,
[a]p · [b]p := [ab]p .
These definitions can be shown to be well-defined. This makes Z/pZ into a field of characteristic p. It can be shown that all finite fields are of order pn for some n ∈ N and prime p > 0.

Random Thought 2.6.10. Best name for a math acapella group: InVerse. It could even specialize in oldies and its motto could be “Bringing it back!”. Someone please create this for me; it’s about time someone injected some fun into acapella groups.

Theorem 2.6.11. Suppose F is of characteristic 0. If Fm is isomorphic to Fn as vector spaces over F, then m = n. In particular, the result holds when F = Q, R, or C.

Proof. This theorem will follow by use of the trace. Suppose we have L : Fm → Fn and K : Fn → Fm such that L ◦ K = 1Fn and K ◦ L = 1Fm . Proposition 2.2.3 showed that we may identify L and K with matrices, and composition with matrix multiplication. Then by Lemma 2.5.2 (for the middle equality),
n = tr(1Fn ) = tr(LK) = tr(KL) = tr(1Fm ) = m. □

Remark 2.6.12. If F is of characteristic p for some prime p, we may have that m = n in Z/pZ while m ≠ n in Z.

Corollary 2.6.13. (Invariance of Dimension) Suppose F is of characteristic 0. Suppose B = {e1 , . . . , em } and C = {f1 , . . . , fn } are ordered bases for a vector space V . Then m = n. In particular, this corollary holds when F = Q, R, or C.

Proof. From Remark 1.4.8 and Definition 1.4.9, the coordinate map ιB : V → Fm given by v ↦ [v]B is an isomorphism. Similarly, ιC : V → Fn given by v ↦ [v]C is an isomorphism. Thus, ιB ◦ ιC−1 : Fn → Fm is an isomorphism. Theorem 2.6.11 gives that m = n. □

Remark 2.6.14. Note that Corollary 2.6.13 applies when F = R or F = C. An interesting question is whether Corollary 2.6.13 still holds if F is of positive characteristic. And the answer is yes, with a slightly different proof that also works in the case of characteristic 0. I found this proof online by Jack Huizenga, a professor at Penn State [2]. We first prove the following lemma:

Lemma 2.6.15. Fix a field F. If v1 , . . . , vm are m linearly independent vectors in Fn , then m ≤ n.

Proof. Suppose for contradiction m > n. Define the matrix
\[
A = \begin{pmatrix}
| & & | \\
v_1 & \cdots & v_m \\
| & & |
\end{pmatrix} \in M_{n \times m}(F).
\]


As m > n, there is a nonzero solution to the equation Ax = 0, call it (α1 , . . . , αm )t . Thus, we have α1 v1 + · · · + αm vm = 0, with at least one of the αi nonzero. This is a contradiction. □

We can now prove that any two finite bases of V have the same number of elements:

Theorem 2.6.16. Let v1 , . . . , vm and w1 , . . . , wn be two bases of a vector space V over F. Then m = n.

Proof. Without loss of generality, assume m > n. Since the wi ’s form a basis, we may write
vj = a1j w1 + · · · + anj wn ,   j = 1, 2, . . . , m.
The coefficient vectors (a1j , . . . , anj ) are m vectors in Fn , so since m > n they cannot be linearly independent by the previous lemma. Hence there is a nontrivial linear combination of the coefficient vectors equal to 0. It’s an easy calculation to check that the vectors v1 , . . . , vm must satisfy the same nontrivial linear relation, contradicting their linear independence. □

We can now end this section by defining the dimension of a vector space:

Definition 2.6.17. Let V be a vector space over a field F. If V has a finite basis, we define the dimension of V to be the number of elements in any basis of V and say that V is finite-dimensional. Otherwise, we say that V is infinite-dimensional.

2.7. Row reduction and the general linear group: Shearing is caring. A matrix J decides to change the names of its rows to pears. After some time, J is despondent over the name change and shears with her companion matrix R. R comforts J, “Don’t diss pear. For that which we call rows by any other name is just as sweet.”

Suppose that we have a system of equations:
\[
\begin{aligned}
a_{11} x_1 + \cdots + a_{1m} x_m &= b_1 \\
a_{21} x_1 + \cdots + a_{2m} x_m &= b_2 \\
&\;\;\vdots \\
a_{l1} x_1 + \cdots + a_{lm} x_m &= b_l
\end{aligned}
\]
We can rewrite this system as an equation of matrices:
\[
\begin{pmatrix}
a_{11} & \cdots & a_{1m} \\
\vdots & \ddots & \vdots \\
a_{l1} & \cdots & a_{lm}
\end{pmatrix}
\begin{pmatrix}
x_1 \\ \vdots \\ x_m
\end{pmatrix}
=
\begin{pmatrix}
b_1 \\ \vdots \\ b_l
\end{pmatrix}.
\]
For fixed (aij ) and (b1 , . . . , bl ), we wish to find a solution (x1 , . . . , xm ). It will be much easier to find a solution if (aij ) is in a simple form or we know its inverse. That’s the content of this section.

Definition 2.7.1. A matrix is in row echelon form if
• The first nonzero entry in each row (if any) is normalized to be 1. This is called the pivot (or leading 1) for the row.
• The leading 1s (if any) appear in echelon form, i.e., as we move down the rows the leading 1s appear farther to the right.
A matrix in row echelon form is said to be in row reduced echelon form if the entries above and below each leading 1 consist of zeros.


Example 2.7.2. The matrix
\[
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\]
is in row echelon form, but not row reduced echelon form. The matrix
\[
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \end{pmatrix}
\]
is in row reduced echelon form. The matrix
\[
\begin{pmatrix} 2 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}
\]
is not in row echelon form.

To transform a matrix into row reduced echelon form, we can perform the following three row operations:
(1) Interchanging rows
(2) Adding a multiple of a row to another row
(3) Multiplying a row by a nonzero constant
Working columns from left to right, one can prove the following proposition:

Proposition 2.7.3. Any matrix (possibly nonsquare) can be reduced to row reduced echelon form by a finite number of the above row operations, with the number of pivots equal to the matrix’s rank (the dimension of the span of its columns).

We now show that we can associate row operations with so-called elementary matrices. Recall that linear operators on Fm correspond to m × m matrices. In fact,
Mm (F) ∋ A   ←→   (x ↦ Ax) ∈ L(Fm , Fm ).
(See Proposition 2.2.3.) Also recall
\[
A \begin{pmatrix}
| & & | \\
v_1 & \cdots & v_n \\
| & & |
\end{pmatrix}
=
\begin{pmatrix}
| & & | \\
A(v_1) & \cdots & A(v_n) \\
| & & |
\end{pmatrix}
\]

for matrices of appropriate sizes. Finally, observe the marvelous fact that when we are performing a row operation, we are actually performing the same transformation on each of the columns!

Fix a matrix A with m rows. We have the following correspondence between row operations, linear operators Fm → Fm , and left multiplication by elementary matrices E ∈ Mm (F):

1. Interchange rows k and l   ←→   Switch xk and xl , keep everything else the same   ←→   Ikl

2. Add α × row l to row k   ←→   Replace xk with xk + αxl , keep everything else the same   ←→   Rkl (α)

3. Multiply row k by α ≠ 0   ←→   Replace xk with αxk , keep everything else the same   ←→   Mk (α)


For example, interchanging rows 1 and 2 corresponds to the linear operator Fm → Fm given by
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_m \end{pmatrix}
\mapsto
\begin{pmatrix} x_2 \\ x_1 \\ x_3 \\ \vdots \\ x_m \end{pmatrix},
\]
which corresponds to the identity matrix with its first and second rows switched:
\[
I_{12} = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & & 0 \\
\vdots & \vdots & & \ddots & \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}.
\]
The three types of matrices are written explicitly below. Recall that Ekl denotes the matrix for which the entry in the kth row and the lth column is 1 and all other entries are 0; here Ekl ∈ Mm (F). Fix A ∈ Mm×n (F). The row operations can be achieved by multiplying A on the left by the elementary matrices:
(1) Interchanging rows k and l: This can be achieved by the matrix multiplication Ikl A, where
\[
I_{kl} = E_{kl} + E_{lk} + \sum_{i \neq k,l} E_{ii} \in M_m(F).
\]
(2) Multiplying row l by α and adding it to row k ≠ l: This can be achieved by the matrix multiplication Rkl (α)A, where
\[
R_{kl}(\alpha) = \mathrm{Id}_m + \alpha E_{kl} \in M_m(F).
\]
(3) Multiplying row k by α ∈ F \ {0}: This can be achieved by Mk (α)A, where
\[
M_k(\alpha) = \alpha E_{kk} + \sum_{i \neq k} E_{ii} \in M_m(F).
\]
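It can be reassuring to see the correspondence between row operations and elementary matrices in action. The sketch below (Python with NumPy; the 3 × 4 matrix is an arbitrary illustration, not from the notes) builds Ikl, Rkl(α), and Mk(α) exactly as above and checks that left multiplication performs the corresponding row operation.

    import numpy as np

    def E(m, k, l):
        # The m x m matrix with a 1 in row k, column l (0-indexed) and 0's elsewhere.
        M = np.zeros((m, m))
        M[k, l] = 1.0
        return M

    def I_swap(m, k, l):          # interchange rows k and l
        return E(m, k, l) + E(m, l, k) + sum(E(m, i, i) for i in range(m) if i not in (k, l))

    def R_add(m, k, l, alpha):    # add alpha * (row l) to row k
        return np.eye(m) + alpha * E(m, k, l)

    def M_scale(m, k, alpha):     # multiply row k by alpha != 0
        return np.eye(m) + (alpha - 1.0) * E(m, k, k)

    A = np.arange(12.0).reshape(3, 4)
    B = I_swap(3, 0, 2) @ A              # rows 0 and 2 interchanged
    C = R_add(3, 1, 0, 5.0) @ A          # row 1 replaced by row 1 + 5 * row 0
    D = M_scale(3, 2, -2.0) @ A          # row 2 multiplied by -2
    print(np.allclose(B[0], A[2]) and np.allclose(C[1], A[1] + 5 * A[0]) and np.allclose(D[2], -2 * A[2]))  # True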

We can similarly define elementary matrices corresponding to column operations. To perform a column operation, one multiplies a matrix on the right by an associated elementary matrix.

Remark 2.7.4. When m = n, an easy way to remember the form of these elementary matrices is by applying the above operations to A = Id.

Definition 2.7.5. The kernel of a matrix A ∈ Ml×m (F) is defined to be the kernel of T : Fm → Fl , T (x) := Ax. Equivalently, ker(A) = {x ∈ Fm : Ax = 0}.

Example 2.7.6. Let
\[
A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & -1 & 3 \end{pmatrix} \in M_{2 \times 3}(\mathbb{R}).
\]
The kernel of A is the set of all (x, y, z) ∈ R3 such that
\[
\begin{pmatrix} 1 & 0 & 2 \\ 0 & -1 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0.
\]


As the rank of A is 2 and A has 3 columns, the Rank-Nullity Theorem implies the kernel of A must be 1-dimensional. We know that any (x, y, z) ∈ ker A satisfies
x + 2z = 0,   −y + 3z = 0.
This implies ker A = {(−2z, 3z, z) : z ∈ R}.

Definition 2.7.7. A matrix A ∈ Mn (F) is invertible if there is a matrix A−1 ∈ Mn (F) such that AA−1 = A−1 A = Idn .

An interesting fact about invertible matrices is that the row reduced echelon form of an invertible matrix is the identity matrix. This results from the fact that the rank of any matrix equals the number of leading 1’s of its row reduced echelon form. The following proposition is an interesting consequence of Corollary 3.3.4. It will not be used in these notes so as to avoid any circular reasoning.

Proposition 2.7.8. (One-sided invertibility implies invertibility) Let A ∈ Mn (F). Then A is invertible if there exists a matrix B ∈ Mn (F) satisfying one of the following conditions:
• AB = Idn .
• BA = Idn .
In either case, B = A−1 .

We now give a formal definition of a group as we will be interested in several matrix groups throughout these notes.

Definition 2.7.9. A set G with a binary operation · : G × G → G, (a, b) ↦ a · b, is called a group if it satisfies the following three conditions: For all g1 , g2 , g3 ∈ G,
• (Associativity) (g1 · g2 ) · g3 = g1 · (g2 · g3 )
• (Existence of a unit) There exists an element e ∈ G such that g1 · e = g1 .
• (Existence of inverses) There exists an element g1−1 ∈ G such that g1−1 · g1 = e = g1 · g1−1 .
We typically omit writing · for the product a · b and simply write ab.

Definition 2.7.10. For a group G and a (possibly infinite) subset S ⊂ G, we say that S generates G if each element of G can be written as a finite product of elements of S.

Remark 2.7.11. Much like how we only consider finite linear combinations of vectors, we only consider finite products of group elements.

The collection of all invertible n × n matrices is called the general linear group and is denoted by:
GLn (F) = {A ∈ Mn (F) : A is invertible}.
This collection forms a group with the operation of matrix multiplication.

Suppose a matrix A ∈ Mn (F) is invertible. Then, we can find the inverse of A by setting up the augmented matrix [A|Id] and applying row operations to reduce A to Id. The inverse of A will be to the right side of the bar. We get
[A|Id]   −→   [Id|A−1 ].
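The recipe [A|Id] → [Id|A−1] can be carried out symbolically. Here is a sketch using Python with SymPy; the particular 3 × 3 matrix is an arbitrary invertible example, not one appearing in the notes.

    import sympy as sp

    A = sp.Matrix([[2, 1, 0],
                   [1, 3, 1],
                   [0, 1, 1]])

    # Row reduce the augmented matrix [A | Id]; the right block becomes A^{-1}.
    augmented = A.row_join(sp.eye(3))
    rref, _ = augmented.rref()
    A_inv = rref[:, 3:]

    print(A_inv)
    print(A * A_inv == sp.eye(3))   # True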

This leads us to the following proposition:


Proposition 2.7.12. Each elementary matrix is invertible. On the other hand, any invertible matrix can be written as the product of elementary matrices. In particular, the elementary matrices Ikl , Rkl (α), and Mk (α), α ∈ F, generate GLn (F).

Proof. Since each row operation can be reversed and row operations correspond with elementary matrices, each elementary matrix is invertible. More explicitly, Ikl−1 = Ikl , Rkl (α)−1 = Rkl (−α), and Mk (α)−1 = Mk (α−1 ). For the other direction, fix an invertible matrix A. By Proposition 2.7.3, there exists a finite sequence of elementary matrices E1 , E2 , . . . , Er such that Er · · · E2 E1 A = Idn . Then A = E1−1 E2−1 · · · Er−1 . We are done since we showed earlier that the inverse of an elementary matrix is an elementary matrix. □

Remark 2.7.13. This proposition can be useful because a statement about invertible matrices can sometimes be reduced to proving it for the subset of elementary matrices, then applying an inductive argument. For example, one can use this proposition to prove one of the most important properties of the determinant (which we will talk about next chapter): For any linear operator T : Rn → Rn and any “nice” subset E of Rn ,
Volume(T (E)) = |det(T )| · Volume(E).
Hence, linear operators uniformly scale the volume of “nice” subsets of Rn by the absolute value of the determinant. Here, a “nice” set is a Lebesgue measurable set, of which open sets are included. In fact, the collection of Lebesgue measurable sets is so large that one needs to use the Axiom of Choice to prove that there exist non-Lebesgue measurable sets! You can learn more about the topic of measurable sets in Math 540: Real Analysis.

Recall that any matrix can be reduced to row echelon form by iteratively applying row operations. Through the correspondence of row operations and invertible matrices, we obtain the following proposition:

Proposition 2.7.14. For A ∈ Mn (F), there exists P ∈ GLn (F) such that P A is upper triangular:
\[
PA = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
0 & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & b_{nn}
\end{pmatrix}.
\]
Moreover, ker(A) = ker(P A), and ker(A) = {0} if and only if each of the diagonal elements of P A is nonzero.

Recall Proposition 3.3.6, which states that the column rank of a matrix equals its row rank, and the Rank-Nullity Theorem (Theorem 3.3.2), which states that for a linear transformation T : Fm → Fl , m = dim(im(T )) + dim(ker(T )). These two results give us the main theorem of this section:


Theorem 2.7.15. Let A ∈ Mn (F). Then dim(ker(A)) = dim(ker(At )), where At ∈ Mn (F) is the transpose of A. Recall (At )ij := Aji . This result will give us that the eigenvalues of a square matrix are the same as those of its transpose, occurring with the same geometric multiplicities. This theorem will not be used before the proof of the Rank-Nullity Theorem to avoid circular reasoning.
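Theorem 2.7.15 is another statement that is easy to test numerically. The following sketch (Python with NumPy; the rank-deficient matrix is an arbitrary illustration, not from the notes) compares the nullities of A and At.

    import numpy as np

    # A 4 x 4 matrix of rank 2: the last two rows are combinations of the first two.
    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 0.],
                  [1., 3., 1., 1.],
                  [2., 5., 1., 2.]])

    def nullity(M):
        return M.shape[1] - np.linalg.matrix_rank(M)

    print(np.linalg.matrix_rank(A))       # 2
    print(nullity(A), nullity(A.T))       # 2 2, as Theorem 2.7.15 predicts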


3. Vector spaces: But wait, there’s more!

I don’t know if the reader has ever been to Fat Sandwich Company, but it’s a delightful little sandwich spot on E. John St. It combines a variety of exquisite cooking methods and scrumptious ingredients to create delicate delicacies, fitting of the world’s finest restaurants. The thing is, the lucky consumer already knows what cheesesteak, chicken tenders, mozzarella sticks, waffle fries, mayonnaise, and an egg taste individually. But could you imagine them dumped in a hot fryer, then stuffed together in a hoagie roll? That sounds amazing. This is what this chapter is about. We already know the definition of a vector space and a linear transformation. We will concoct new vector spaces like subspaces, direct sums of vector spaces, and quotient spaces and consider how linear transformations fit in. How do I feel about this? Well, I’m hungry just thinking about it.

3.1. Subspaces.

Definition 3.1.1. Let V be a vector space. A nonempty subset S ⊂ V is called a subspace of V if it is closed under addition and scalar multiplication; i.e.,
• x, y ∈ S =⇒ x + y ∈ S
• x ∈ S and α ∈ F =⇒ αx ∈ S.

Example 3.1.2. Fix m ≤ n in N. Then {(x1 , . . . , xm , 0, . . . , 0) ∈ Rn } is a subspace of Rn . For example, we usually view the xy-plane as a subspace of R3 .

Example 3.1.3. Given n ∈ N, the collection Pn (F) of polynomials of degree at most n is a subspace of F[t].

Definition 3.1.4. (New vector spaces) Fix a vector space V . Given subspaces M, N ⊂ V , we can define the intersection of M and N :
M ∩ N := {x ∈ V : x ∈ M and x ∈ N },
and the sum of M and N :
M + N := {x + y ∈ V : x ∈ M and y ∈ N }.
These define subspaces of V .

Remark 3.1.5. We define the Grassmannian G(m, n) to be the collection of m-dimensional subspaces of Rn . This can actually be given a natural smooth structure and made into a compact manifold. By compactness, we mean that every sequence of m-dimensional subspaces of Rn has a subsequence that converges to an m-dimensional subspace of Rn . Here, given m-dimensional subspaces Vj , V , j ∈ N, we say that Vj converges to V if there exist bases Bj = {vj1 , . . . , vjm } of Vj and B = {v1 , . . . , vm } of V such that Bj converges to B coordinate-wise as points in Rn . Equivalently, if projVj is the orthogonal projection onto Vj and projV is the orthogonal projection onto V , the operator norm of projVj − projV : Rn → Rn converges to 0 (orthogonal projections onto subspaces will be defined in section 6.4). For readers not familiar with manifolds, a manifold is a set that locally looks like some Euclidean space. Examples of manifolds include the torus, the circle, and the sphere. For those interested in learning more about manifolds and differential geometry, I would highly recommend taking Math 518 next semester with Professor Albin.


3.2. Direct sums.
Definition 3.2.1. Let V be a vector space. Suppose M, N are subspaces of V. We say M and N are transversal if M + N = V. We say M and N are complementary if M + N = V and M ∩ N = {0}, in which case we define the direct sum of M and N as
M ⊕ N := M + N.
Equivalently, each element of V can be uniquely written as the sum of an element of M and an element of N.
Remark 3.2.2. The reader may have heard of subspaces being complementary in another context. Subspaces M and N of a vector space are called complimentary if they say really nice things about each other. That is a bit different from our sense of complementary and is written with an "i".
M: I really like your zero vector!
N: No way! I really like your zero vector!
M: Omg! I just realized! We're matching! We have the same zero vector!
N: Hooray! :)
Remark 3.2.3. Observe that if M and N are complementary, dim(M ⊕ N) = dim(M) + dim(N).
We can extend the previous definition to multiple subspaces of V. If M1, . . . , Mk ⊂ V are subspaces, we say that V is a direct sum of M1, . . . , Mk and write V = M1 ⊕ · · · ⊕ Mk if each x ∈ V can be uniquely written as
x = x1 + · · · + xk,   xi ∈ Mi.

It isn’t hard to show that this coincides for k = 2 with the original definition of the direct sum. Example 3.2.4. Define M0 := {a : a ∈ R}, M1 := {b + bt : b ∈ R}, and M2 := {c + ct + ct2 : c ∈ R}. Then P1 (R) = M0 ⊕ M1 and P2 (R) = M0 ⊕ M1 ⊕ M2 . Theorem 3.2.5. (Existence of complements) Let M ⊂ V be a subspace and assume that V = span{x1 , . . . , xn }. If M 6= V , then it is possible to choose xi1 , . . . , xik such that V = M ⊕ span{xi1 , . . . , xik }. Proof. (Sketch) Choose xij not in the span of M with the set of previously chosen xi until you span V .  Corollary 3.2.6. If M ⊂ V is a subspace and dim(V ) < ∞, then dim(M ) ≤ dim(V ).


We conclude this section by proving a correspondence between direct sums of vector spaces and projections.
Definition 3.2.7. A projection E : V → V is a linear operator satisfying E2 = E.
Thanks to Professor Lerman, who pointed out during a Math 519 lecture that this result should be proven in Linear Algebra courses but is often ignored.
Proposition 3.2.8. (Direct sums ↔ projections) Fix a vector space V. If E : V → V is a projection, then V = Im(E) ⊕ ker E. On the other hand, given a direct sum decomposition V = W1 ⊕ W2, there is a projection K : V → V satisfying Im(K) = W1 and ker K = W2.
Proof. First suppose E : V → V is a projection. For any x ∈ V, x = Ex + (x − Ex), and E(x − Ex) = Ex − E2x = 0, whence it follows V = E(V) + ker(E). It remains to show the intersection condition for direct sums. If x = Ey ∈ Im(E) ∩ ker E, then 0 = Ex = E2y = Ey = x. This proves Im(E) ∩ ker E = {0}.
Now suppose V = W1 ⊕ W2 for subspaces W1, W2 of V. For any x ∈ V, note x can be uniquely written as x = v1 + v2 for v1 ∈ W1, v2 ∈ W2. Define K : V → V by K(v1 + v2) = v1. It isn't hard to show that K is a projection that works. □
3.3. Linear maps and subspaces.
Definition 3.3.1. Let T : V → W be a linear transformation. We define the kernel (or nullspace) of T as
ker(T) := {x ∈ V : T(x) = 0}
and the image (or range) of T as
im(T) := {Tx ∈ W : x ∈ V}.
The kernel and image form subspaces of V and W, respectively. We define rank(T) = dim(im(T)) and nullity(T) = dim(ker(T)).
The main theorem of the section is the following:
Theorem 3.3.2. (The Rank-Nullity Theorem) Let V be finite-dimensional and T : V → W a linear transformation. Then im(T) is finite-dimensional and
dim(V) = rank(T) + nullity(T).
Proof. By Corollary 3.2.6, ker(T) is finite-dimensional, so by Theorem 3.2.5 we may choose a complement N for ker(T) with dim(N) = dim(V) − dim(ker(T)). One can check that T|N : N → im(T) is an isomorphism (cf. the First Isomorphism Theorem for groups). This implies dim(im(T)) = dim(N) = dim(V) − dim(ker(T)), which is the Rank-Nullity Theorem. □
The following two corollaries follow from the Rank-Nullity Theorem:


Corollary 3.3.3. If M is a subspace of V and dim(M ) = dim(V ) < ∞, then M = V . Proof. Apply the Rank-Nullity Theorem to the inclusion map ι : M → V .



The following is one of the most useful results about linear operators on finite-dimensional vector spaces:
Corollary 3.3.4. Suppose V is finite-dimensional. Then for a linear operator T : V → V, the following are equivalent:
• T is injective.
• T is surjective.
• T is an isomorphism.
We conclude this section with an important definition and result:
Definition 3.3.5. Let A ∈ Ml×m(F). We define the row rank (column rank) of A to be the dimension of the space spanned by the rows (columns) of A.
Proposition 3.3.6. For any matrix A, the column rank of A is equal to the row rank of A.
For a proof, see Theorem 7, page 57, of Petersen [6].
3.4. Quotient spaces. In Section 3.3, we learned that the kernel of a linear transformation is always a subspace of the domain vector space. In this section, we will consider the converse; namely, given a subspace M of a vector space V, is there a vector space W and a linear transformation T : V → W such that ker T = M? We will answer this question in the affirmative by considering quotient spaces. We first review equivalence relations.
Definition 3.4.1. Given a set S, a relation on S is a subset R ⊂ S × S. If (a, b) ∈ R, we write a ∼ b. We say that a relation on S is an equivalence relation if it satisfies the following three properties: for all a, b, c ∈ S,
(1) (Reflexivity) a ∼ a
(2) (Symmetry) a ∼ b ⇒ b ∼ a
(3) (Transitivity) a ∼ b and b ∼ c ⇒ a ∼ c.
For a ∈ S, we define the equivalence class of a to be [a]∼ := {b ∈ S : b ∼ a}.
Remark 3.4.2. Suppose ∼ is an equivalence relation on S. For any a, b ∈ S, [a]∼ = [b]∼ or [a]∼ ∩ [b]∼ = ∅. This means that any two equivalence classes are either equal or disjoint.
Definition 3.4.3. Fix subsets S, T ⊂ V and α ∈ F. We define
αS := {αs : s ∈ S},   S + T := {s + t : s ∈ S, t ∈ T}.
For x ∈ V, we define the translate of S by x to be
x + S := {x + s : s ∈ S}.


Definition 3.4.4. (Quotient Space) Let V be a vector space and M ⊆ V a subspace. We define the quotient space
V/M := {x + M : x ∈ V}.
We sometimes write [x]M for x + M. This becomes a vector space with the operations
α(x + M) := (αx) + M,
(x + M) + (y + M) := (x + y) + M.
One can show that these operations are well-defined, i.e., the operations are independent of the chosen representatives of the equivalence classes.
Remark 3.4.5. Let M ⊆ V be a subspace. We can define an equivalence relation on V by v ≡ w if v − w ∈ M. Then, V/M is the collection of equivalence classes.
Remark 3.4.6. We have an isomorphism V → V/{0}, identifying x ↔ [x]{0}.
Example 3.4.7. View C as a real vector space of dimension 2, and identify x + iy ∈ C with (x, y) ∈ R2. View R as a subspace of C by identifying R with {(t, 0) : t ∈ R}. For (x1, y1), (x2, y2) ∈ C, we have the following equivalent statements:
(x1, y1) ≡ (x2, y2) ⇐⇒ [(x1, y1)]R = [(x2, y2)]R
⇐⇒ (x1, y1) + R = (x2, y2) + R
⇐⇒ (x1, y1) − (x2, y2) ∈ R
⇐⇒ y1 = y2.

This implies C/{(t, 0) : t ∈ R} = {(0, y) + R : y ∈ R}.
Definition 3.4.8. Fix a subspace M ⊆ V. Define the linear transformation π : V → V/M by π(v) := v + M. It is easy to see that π is surjective, and ker(π) = M.
The Rank-Nullity Theorem (Theorem 3.3.2) applied to π gives us the following result:
Proposition 3.4.9. Fix a finite-dimensional vector space V and a subspace M ⊆ V. Then M and V/M are finite-dimensional with
dim(V) = dim(M) + dim(V/M).
Remark 3.4.10. Besides the last proposition, note that all of this discussion holds for infinite-dimensional vector spaces. In fact, we can go a bit further if V is a Banach space (a complete normed space) and M ⊂ V is a closed subspace. In this case, V/M becomes a Banach space when equipped with the norm
||x + M|| := inf{ ||x + m||V : m ∈ M }.

Moreover, the projection map π is an open mapping in this case, i.e., open sets are mapped by π to open sets.
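Proposition 3.4.9 (together with the Rank-Nullity Theorem it came from) is easy to sanity-check numerically: if M ⊆ F^n is the column span of a matrix, then dim(V/M) = n − dim(M). Here is a minimal sketch in Python with sympy; the subspace below is a made-up example, not one from the notes.

import sympy as sp

n = 5
# M = column span of B, a subspace of R^5 (the third column is the sum of
# the first two, so dim(M) = 2, not 3).
B = sp.Matrix([[1, 0, 1],
               [0, 1, 1],
               [1, 1, 2],
               [0, 0, 0],
               [2, 1, 3]])

dim_M = B.rank()
dim_quotient = n - dim_M          # dim(V/M) predicted by Proposition 3.4.9
print(dim_M, dim_quotient)        # prints 2 3
assert dim_M + dim_quotient == n

Of course this only checks the dimension count; the content of Proposition 3.4.9 is that the quotient construction realizes it via the surjection π with kernel M.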


4. A minor determinant of your success
While some may view the history of rock and roll as a tedious sum of parts, it is actually quite fascinating. Rock and roll experienced an expansion in interest in the United States in the 1940's and 1950's. One can use record sales to measure the growth in volume of rock and roll albums throughout the 20th century. While some believe that rock and roll grew in popularity among all ages, the expansion by minors really set this music genre apart. While Axl Rose and Guns N' Roses changed the landscape of rock and roll, I believe that the operations of Rose really made all the difference. Guns N' Roses's long-term ability to stay inverse was a major determinant in their eventual nontrivial legacy.
Other than Guns N' Roses, there were a number of other great rock and roll bands. There were the Rolling Stones, the Who, AC/DC, Pink Fl. . . wait a minute . . . AC/DC. . . that sounds a lot like ad − bc, the determinant of the (2 × 2)-matrix [ a b ; c d ]! And that happens to be what this chapter is about: determinants! We will interpret determinants as measuring how linear transformations distort volumes and show how to compute determinants somewhat easily. (I hope the reader has realized I have no knowledge of rock and roll.)
4.1. Determinant of a matrix. Recall the definition of a group from Definition 2.7.9.
Definition 4.1.1. Fix n ∈ N. We define the nth group of permutations Sn to be the collection of all bijections σ : {1, . . . , n} → {1, . . . , n}. This forms a group under composition: (τσ)(j) := τ(σ(j)). The elements of Sn are called permutations.
Definition 4.1.2. We call an element σ ∈ Sn an interchange if there exist 1 ≤ i < j ≤ n such that σ(i) = j, σ(j) = i, and σ(k) = k whenever k ≠ i and k ≠ j.
Lemma 4.1.3. Every element σ of Sn can be written as the product (i.e., the composition) of interchanges. Moreover, any two such products equal to σ have the same parity of the number of factors.
Proof. Imagine ordering a shuffled deck of cards by suit then number. We can first find the 2 of ♣ and switch it with the first card of the deck. Then, we can find the 3 of ♣ and switch it with the second card of the deck. Repeating this, we see the first part of the lemma holds. The second part of the lemma will be left as an exercise to the reader, i.e., I don't remember this proof and I'm just stating this lemma to set up the formal definition of the determinant. □
From the lemma, we can make the following definition and have it be well-defined.
Definition 4.1.4. Let σ ∈ Sn. If we can write σ as the product of an even number of interchanges, we say that σ is an even permutation and define (−1)^σ := 1. Otherwise, we say that σ is an odd permutation and define (−1)^σ := −1.
Definition 4.1.5. Fix a matrix A = (aij) ∈ Mn(F). We define the determinant of A to be
det A := Σ_{σ ∈ Sn} (−1)^σ ∏_{i=1}^{n} a_{i σ(i)}.
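Definition 4.1.5 can be turned directly into (very inefficient) code: sum over all n! permutations, with the sign computed from the parity of the permutation. The sketch below is plain Python; the 3 × 3 test matrix is a made-up example.

from itertools import permutations

def sign(perm):
    # Parity of a permutation, computed by counting inversions.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_definition(A):
    # Sum over all sigma in S_n of sign(sigma) * prod_i A[i][sigma(i)].
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det_by_definition(A))   # prints -3

This is an O(n · n!) algorithm, so it is only a definition-checker; the Laplace expansion of Section 4.2 and row reduction are what one actually uses in practice.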


The next proposition lists the main properties of the determinant. Statements (1)–(3) below can be deduced not so badly from the definition. I believe statement (4) is a bit more tedious, if I remember correctly. I will not prove these statements, as it is much more important that you remember them and can apply them.
Proposition 4.1.6. Fix A = (aij), B = (bij) ∈ Mn(F), and c ∈ F. The determinant det : Mn(F) → F, A 7→ det A, satisfies the following properties:
(1) det(Idn) = 1, where Idn is the n × n identity matrix.
(2) If cA := (caij), then det(cA) = c^n det(A).
(3) If the matrix Ã is obtained from A by switching two rows, then det Ã = − det A. The same statement holds for switching two columns.
(4) If the matrix A′ is obtained from A by adding a multiple of one row to another row, then det A′ = det A. Equality also holds when adding a multiple of one column to another column.
(5) det(AB) = det(BA).
(6) det(AB) = det(A) det(B).
(7) det(At) = det A, where At is the transpose of A.
We will use this proposition and Proposition 2.7.12 to prove the following lemma.
Lemma 4.1.7. A matrix A ∈ Mn(F) is invertible if and only if det A ≠ 0. Moreover, if A is invertible, then det(A^(-1)) = 1/det A.
Proof. If A is invertible, det(A) · det(A^(-1)) = det(AA^(-1)) = det(Idn) = 1, which implies det A ≠ 0 and det(A^(-1)) = 1/det A. On the other hand, if det A = 0, then det(AB) = det(A) det(B) = 0 ≠ 1 = det(Idn) for all B ∈ Mn(F); in particular, no B can be an inverse of A, so A is not invertible. □
We have the following equivalent statements for n × n matrices.
Proposition 4.1.8. For A ∈ Mn(F), the following statements are equivalent:
(1) A is invertible.
(2) A ∈ GLn(F).
(3) The rows of A are linearly independent.
(4) The columns of A are linearly independent.
(5) det A ≠ 0.
(6) A is the product of elementary matrices.
Proof. (1) ⇔ (2) is by definition. (1) ⇔ (5) is by Lemma 4.1.7. (1) ⇔ (6) is the content of Proposition 2.7.12. Observe (3) holds if and only if A has rank n, if and only if the row reduced echelon form of A is Idn, if and only if A is invertible. This implies (3) ⇔ (1), and (4) ⇔ (1) follows from a similar argument. □
Random Thought 4.1.9. Did you hear that downtown, there's a marketplace selling homemade crafts? I just think it's a little bazaar.
We end this section by showing that we can define the determinant of a linear operator. Fix a finite-dimensional vector space V with bases B, C, and let T : V → V be a linear operator. Define S = [Id]B,C.


From Theorem 2.4.3, [T]B = S[T]C S^(-1). Using (6) of Proposition 4.1.6,
det[T]B = det([Id]C,B [T]C [Id]B,C) = det[Id]C,B · det[T]C · det[Id]B,C = det[Id]C,B · det[Id]B,C · det[T]C = det(Idn) · det[T]C = det[T]C.
Thus, we may define det T = det([T]B), and this is well-defined.
We present one final result (without proof). This result will be used when we talk about cyclic subspaces and the Jordan canonical form. By ?, I refer to a (possibly nonzero) matrix, and by 0, I refer to a zero matrix of some size.
Proposition 4.1.10. Suppose A1, . . . , Ak are square matrices. Then
det [ A1 ? · · · ?  ]
    [ 0  A2 · · · ? ]
    [ .  .   . .  . ]
    [ 0  0 · · · Ak ]  = det(A1) det(A2) · · · det(Ak).
4.2. Computing the determinant of small matrices. This is the most important section of this chapter. We will review how to reasonably compute the determinant of matrices of small size, say of size at most 5 × 5. This will be very important for the next chapter, where we compute the eigenvalues of linear transformations.
Recall that a hyperplane of Rn is the linear span of n − 1 linearly independent vectors in Rn. In this vein of thought, I make up the following definition.
Definition 4.2.1. Let A ∈ Mn(F). For 1 ≤ i, j ≤ n, we define A^{ij} to be the (n − 1) × (n − 1) matrix obtained from A by removing the ith row and the jth column. We call A^{ij} a hypermatrix.
Example 4.2.2. Define
A =
  [ 11 12 13 ]
  [ 21 22 23 ]
  [ 31 32 33 ]  ∈ M3(R).
We compute some of the hypermatrices:
A^{11} = [ 22 23 ; 32 33 ],   A^{22} = [ 11 13 ; 31 33 ],   A^{32} = [ 11 13 ; 21 23 ].
On another note, by subtracting the first row of A from its second and third rows, we obtain the matrix
Ã =
  [ 11 12 13 ]
  [ 10 10 10 ]
  [ 20 20 20 ].
By (4) of Proposition 4.1.6, det Ã = det A. As the rows of Ã are linearly dependent, det Ã = 0 by Proposition 4.1.8. Applying Proposition 4.1.8 again, this implies A is not invertible.
Definition 4.2.3. Let A ∈ Mn(F). A minor of A is the determinant of a k × k submatrix obtained from A by removing n − k rows and n − k columns, k < n.
We will use Laplace expansion to express the determinant of a matrix as a linear combination of the determinants of its hypermatrices (which are particular minors):


Theorem 4.2.4. (Laplace expansion) Let A ∈ Mn(F). Fix 1 ≤ i, j ≤ n. We may compute det A by expanding A into hypermatrices along its ith row:
det A = Σ_{j=1}^{n} (−1)^{i+j} a_{ij} det A^{ij}.
We may also compute det A by expanding A into hypermatrices along its jth column:
det A = Σ_{i=1}^{n} (−1)^{i+j} a_{ij} det A^{ij}.
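Laplace expansion is also the easiest correct way to write a general determinant routine by hand. The following is a minimal sketch in plain Python (expanding along the first row for simplicity); as a cross-check it uses the matrix of Example 4.2.6 below.

def det_laplace(A):
    # Determinant by Laplace expansion along the first row (Theorem 4.2.4 with i = 1).
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Hypermatrix A^{1,j+1}: delete row 1 and column j+1.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(sub)
    return total

A = [[3, 0, -2],
     [-4, 5, -1],
     [2, 3, 1]]
print(det_laplace(A))   # prints 68, matching Example 4.2.6

Expanding along a row or column with many zeros (as in the examples below) is only an optimization; any choice of i or j gives the same answer.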

We see that it is wise to expand A along rows or columns with many zero entries to make computations easier. I will not prove this theorem, as the proof is a bit tedious using the definition of the determinant. Rather, I'll show how to use it.
Remark 4.2.5. (The determinant of 2 × 2 matrices) Using the definition of the determinant, we immediately obtain
det [ a b ; c d ] = ad − bc   for all a, b, c, d ∈ F.
Using this, we can compute the determinant of 3 × 3 matrices.
Example 4.2.6. Let
A =
  [  3 0 −2 ]
  [ −4 5 −1 ]
  [  2 3  1 ].
As the second column of A has a zero entry, it is smart to expand A along its second column. By Laplace expansion (see Theorem 4.2.4),
det A = (−1)^{1+2} · 0 · det [ −4 −1 ; 2 1 ] + (−1)^{2+2} · 5 · det [ 3 −2 ; 2 1 ] + (−1)^{3+2} · 3 · det [ 3 −2 ; −4 −1 ]
      = 5(3 · 1 − (−2) · 2) − 3(3 · (−1) − (−2)(−4)) = 5 · 7 − 3 · (−11) = 35 + 33 = 68.
We conclude this section with an example of using Laplace expansion to find the determinant of a matrix with several zero entries.
Example 4.2.7. Let
B =
  [  3  2  0 4 ]
  [  0 −4  2 3 ]
  [  5  0 −1 0 ]
  [ −3 −1  0 2 ].
As the third row of B has two zeros, we will expand B along this row. By Laplace expansion (Theorem 4.2.4),
(4.2.8)   det B = (−1)^{3+1} · 5 · det [ 2 0 4 ; −4 2 3 ; −1 0 2 ] + (−1)^{3+3} · (−1) · det [ 3 2 4 ; 0 −4 3 ; −3 −1 2 ].
By expanding [ 2 0 4 ; −4 2 3 ; −1 0 2 ] along its second row (the other two terms vanish),
det [ 2 0 4 ; −4 2 3 ; −1 0 2 ] = 2 · det [ 2 4 ; −1 2 ] = 2(4 + 4) = 16.
By expanding [ 3 2 4 ; 0 −4 3 ; −3 −1 2 ] along its first column,
det [ 3 2 4 ; 0 −4 3 ; −3 −1 2 ] = 3 · det [ −4 3 ; −1 2 ] − 3 · det [ 2 4 ; −4 3 ] = 3 · (−5) − 3 · 22 = −81.

Plugging back into (4.2.8), det B = 5 · 16 + (−1) · (−81) = 80 + 81 = 161.
4.3. Geometric interpretation and the special linear group. Recall that we have the following change of variables formula for R:
Proposition 4.3.1. (Change of variables formula) Let f : R → R be a continuous function and φ : [a, b] → [c, d] a differentiable, onto, increasing function. Then
∫_c^d f(x) dx = ∫_a^b f(φ(y)) φ′(y) dy.

We can generalize this to functions of multiple variables. The determinant of a matrix A gives us intuition on how A distorts the volume of regions. More specifically, we have the following change of variables formula:
Theorem 4.3.2. (Change of variables formula) Let T : Rn → Rn be an invertible linear transformation and f : Rn → R an integrable function (i.e., ∫_{Rn} |f(x)| dx < ∞). Then
∫_{Rn} f(T(x)) |det(T)| dx = ∫_{Rn} f(y) dy.

Using measure theory (Math 540), we can make sense of lengths, areas, and volumes of regions that aren't simple intervals and rectangular prisms. This more general notion of volume agrees with the usual one on simple elementary regions. Here, by elementary, I don't mean regions constructed without the use of complex analysis; rather, I'm talking about regions that were discussed in-depth in grades K-5. We can use Theorem 4.3.2 to obtain the following corollary:
Corollary 4.3.3. (Distortion of volumes) Let T : Rn → Rn be an invertible linear transformation and A ⊆ Rn a "nice" region. Then
Volume(T(A)) = |det(T)| · Volume(A),
where T(A) = {Tx : x ∈ A}.
We can now define the special linear groups. Recall the definition of a group in Definition 2.7.9.
Definition 4.3.4. We define the nth special linear group SLn(F) to be the collection/group of matrices A ∈ Mn(F) satisfying det(A) = 1.


Remark 4.3.5. SLn(F) forms a subgroup of the general linear group GLn(F). In light of Corollary 4.3.3, SLn(R) can be thought of as the collection of matrices that preserve volumes of regions in Rn (and, since det = 1 rather than merely |det| = 1, they preserve orientation as well).
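Corollary 4.3.3 is pleasant to check by hand in the plane: the image of the unit square under a 2 × 2 matrix is a parallelogram, and its area can be computed independently of the determinant with the shoelace formula. A minimal sketch in Python (the matrix is a made-up example):

# Image of the unit square [0,1]^2 under the linear map with matrix A.
A = [[2, 1],
     [1, 3]]
a1 = (A[0][0], A[1][0])          # image of e1 (first column of A)
a2 = (A[0][1], A[1][1])          # image of e2 (second column of A)

# Vertices of the parallelogram T([0,1]^2), in order around the boundary.
verts = [(0, 0), a1, (a1[0] + a2[0], a1[1] + a2[1]), a2]

# Shoelace formula for the area of a polygon with the given vertices.
area = 0
for k in range(len(verts)):
    x0, y0 = verts[k]
    x1, y1 = verts[(k + 1) % len(verts)]
    area += x0 * y1 - x1 * y0
area = abs(area) / 2

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(area, abs(det_A))   # prints 5.0 5

Since Volume([0,1]^2) = 1, the agreement area = |det A| is exactly Corollary 4.3.3 for this square.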


5. Eigenstuff: of maximum importance
No one really loves the IRS and taxes. To many, it seems like paying taxes involves blocks of tedious calculations with no real gratification at the end of it. But think about the enormous task put before the Internal Revenue Service. There are over 300 million people in the United States and many different occupations. Why, I can count 20 different jobs just using my fingers and toes! And it must be very difficult to construct a comprehensive system to determine how to collect taxes from citizens of varying statuses. That is the beauty of certain tax forms, like the 1040-EZ and the 1098-T. They provide the IRS with a relatively easy way to compare different people. In this chapter, we will similarly be interested in a couple simple forms for linear operators: the rational canonical form and the Jordan canonical form. While opponents of the IRS may obsess over characteristic root problems of the agency, we will be interested in finding roots of the characteristic polynomials of linear operators. While some may see the IRS as the sum of decomposing, never-changing parts, we will show that we can decompose finite-dimensional vector spaces as the direct sum of invariant subspaces. To conclude, some may see the IRS as evil and useless. But perhaps the agency's inefficiency is mainly due to its severe recent budget cuts, losing about 20% of its funding over the past 5 years. And while some students may hate the Jordan canonical form, perhaps they have never been taught it with enthusiasm and perspective. So much like how the IRS has been underfunded, studying of the Jordan canonical form has been underfun did. I recommend all to watch John Oliver's more formal and less taxing take on the IRS on Last Week Tonight [5].
5.1. Facts about polynomials. Just to make sure all terms are defined, we recall the following definitions:
Definition 5.1.1. Fix a polynomial p(t) = an t^n + an−1 t^(n−1) + · · · + a0 ∈ F[t].
• If an ≠ 0, we define n to be the degree of p(t). We write n = deg(p).
• q(t) ∈ F[t] divides p(t), and we write q(t)|p(t), if there exists d(t) ∈ F[t] so that p(t) = q(t)d(t).
• λ ∈ F is a root of p if p(λ) = 0.
• λ ∈ F is a root of degree d ∈ N if (t − λ)^d divides p(t) but (t − λ)^(d+1) does not divide p(t).
Note for each λ ∈ F, p(λ) is an element of F.
We recall a few results concerning polynomials (without proof):
Proposition 5.1.2. (The division algorithm) For all p(t), q(t) ∈ F[t] with q(t) ≠ 0, there exist d(t), r(t) ∈ F[t] such that
• p(t) = d(t)q(t) + r(t), and
• deg r(t) < deg q(t) or r(t) = 0.
Proposition 5.1.3. (The greatest common divisor) For given nonzero p, q ∈ F[t], there is a unique monic polynomial d = gcd(p, q) with the following properties:
• d divides p and q.
• If d′ divides p and q, then d′ divides d.
Moreover, there are r, s ∈ F[t] such that d = p · r + q · s.
Definition 5.1.4. We call d from Proposition 5.1.3 the greatest common divisor of p and q, and write d = gcd(p, q). We say p and q are relatively prime if gcd(p, q) = 1.
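Both the division algorithm and the Bezout-style identity d = p · r + q · s in Proposition 5.1.3 are implemented in sympy, which makes them easy to experiment with. A minimal sketch (the polynomials are made-up examples):

import sympy as sp

t = sp.symbols('t')
p = (t - 1)**2 * (t + 2)          # t^3 - 3t + 2
q = (t - 1) * (t + 3)             # t^2 + 2t - 3

# Division algorithm: p = d*q + r with deg r < deg q.
d, r = sp.div(p, q, t)
assert sp.expand(d * q + r - p) == 0

# gcd and the identity gcd = r1*p + r2*q (sp.gcdex returns r1, r2, gcd).
r1, r2, g = sp.gcdex(p, q, t)
print(sp.factor(g))                       # t - 1, the common root
assert sp.expand(r1 * p + r2 * q - g) == 0

Here gcd(p, q) = t − 1, reflecting the shared root at t = 1; p and q would be relatively prime exactly when this gcd is 1.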


Remark 5.1.5. The reader should not get this confused with when one has a bunch of a day old meat, and one has to determine which is the best, which is ... relatively prime. Using the division algorithm, we have the following lemma: Lemma 5.1.6. Given p(t) ∈ F[t] and λ ∈ F, λ is a root of p(t) if and only if (t − λ) divides p(t). This leads us to the statement that C is algebraically closed. Theorem 5.1.7. (Fundamental Theorem of Algebra) Every polynomial p(t) ∈ C[t] has a root. Equivalently, every polynomial p(t) ∈ C[t] splits, i.e., if n is the degree of p(t), there exist a, λ1 , . . . , λn ∈ C (not necessarily distinct) such that p(t) = a(t − λ1 )(t − λ2 ) · · · (t − λn ). Random Thought 5.1.8. What did 1 + x2 say when it left R[x] for C[x]? It’s been real, but I’ve gotta split. Definition 5.1.9. Fix a linear operator T : V → V . Define T 1 = T and inductively define T i+1 = T ◦ T i for each i ∈ N. Given a polynomial p(t) = a0 + a1 t + · · · + an tn , we define the linear operator p(T ) : V → V by p(T ) = a0 1V + a1 T + · · · + an T n . Given another polynomial q(t), we define the linear operator on V p(T )q(T ) := p(T ) ◦ q(T ). Remark 5.1.10. Fix a linear operator T : V → V . We will frequently use the fact that if p(t), q(t) ∈ F[t], then p(T )q(T ) = q(T )p(T ). Moreover, if a subspace W ⊆ V is T -invariant, i.e., T x ∈ W for all x ∈ W , then p(T )x ∈ W for all p(t) ∈ F[t] and x ∈ W . 5.2. Eigenvalues. The identity operator of a vector space is special in that it sends every vector to itself. This makes it very easy to study the behavior of this operator. We are interested in analyzing more complicated operators. To do this, we aim to find special vectors of a linear operator. The treatment of eigenstuff in this section will be very typical. Vector spaces are not necessarily finite-dimensional unless specifically noted. Definition 5.2.1. Let V be a vector space and T : V → V a linear operator. We say a scalar λ ∈ F is an eigenvalue of T if there exists a nonzero vector x ∈ V such that T (x) = λx. We call such a vector 0 6= x ∈ V an eigenvector of T corresponding to λ. If λ is an eigenvalue of T , we define the subspace Eλ := {x ∈ V : T x = λx} of V . We call Eλ the eigenspace for λ. Given a linear operator T : V → V , note that a scalar λ is an eigenvalue of T if and only if ker(T − λId) 6= {0}. If λ is an eigenvalue of T , note dim(Eλ ) > 0 and Eλ = ker(T − λId). If V is finite-dimensional, we can talk about matrix representations of linear operators with respect to an ordered basis. Definition 5.2.2. Let V be a finite-dimensional vector space, and fix a basis B for V . Let T : V → V be a linear operator. We define the characteristic polynomial of T to be χT (t) = det([T ]B − tId).


To see this definition is well-defined, let B, C be two bases for V. Let S = [Id]C,B be the change of basis matrix from B to C. Then
det([T − tId]B) = det(S^(-1)[T − tId]C S) = det(S^(-1)) det([T − tId]C) det(S) = det([T − tId]C).
Remark 5.2.3. Some define the characteristic polynomial to be det(tId − [T]B). I define it the opposite way because I find that I am less likely to mess up signs of matrix entries this way. Also, it doesn't really matter, as we only care about the roots of the characteristic polynomial.
This leads us to the following equivalent statements:
Proposition 5.2.4. Assume V is finite-dimensional, T : V → V is a linear operator, and λ ∈ F. The following are equivalent:
(1) λ is an eigenvalue of T.
(2) ker(T − λId) ≠ {0}.
(3) T − λId is not invertible.
(4) There exists a basis B in which [T − λId]B is not invertible.
(5) There exists a basis B in which det([T]B − λIdn) = det([T − λId]B) = 0.
(6) λ is a root of the characteristic polynomial χT(t).
Proof. This lemma follows from Proposition 4.1.8. □
Example 5.2.5. In this example, we will find the eigenvalues and eigenvectors of a simple 4 × 4 matrix. Let
A =
  [  0 1 0 0 ]
  [ −1 0 0 0 ]
  [  0 0 0 1 ]
  [  0 0 1 0 ]  ∈ M4(C).
By expanding along the first column, we calculate
χA(t) = det(A − tI) = det [ −t 1 0 0 ; −1 −t 0 0 ; 0 0 −t 1 ; 0 0 1 −t ]
      = −t · det [ −t 0 0 ; 0 −t 1 ; 0 1 −t ] + det [ 1 0 0 ; 0 −t 1 ; 0 1 −t ]
      = −t(−t(t^2 − 1)) + (t^2 − 1) = t^4 − t^2 + t^2 − 1 = t^4 − 1.
The eigenvalues of A, i.e., the roots of χA(t), are 1, −1, i, and −i. We find
E1 = ker(A − Id) = ker [ −1 1 0 0 ; −1 −1 0 0 ; 0 0 −1 1 ; 0 0 1 −1 ] = {(0, 0, z, z) : z ∈ C}.


Similarly, E−1 = {(0, 0, z, −z) : z ∈ C} Ei = {(z, iz, 0, 0) : z ∈ C} E−i = {(iz, z, 0, 0) : z ∈ C}. We can use induction to prove the following important proposition: Proposition 5.2.6. Let V be a vector space (possibly infinite-dimensional), and let T : V → V a linear operator. Let x1 , . . . , xk be eigenvectors of T corresponding to distinct eigenvalues λ1 , . . . , λk . Then {x1 , . . . , xk } is linearly independent. Equivalently, Eλ1 + · · · + Eλk = Eλ1 ⊕ · · · ⊕ Eλk . In particular, k ≤ dim V . Proof. This proposition is actually surprisingly difficult to prove. A cheap way to prove this proposition is by assuming the formula for the determinant of a Vandermonde matrix. For another proof, see Petersen [6], Lemma 14, page 125. I would like to remind the reader here that my notes are heavily adapted from the notes of Petersen [6].  We conclude this section by comparing the algebraic and geometric multiplicity of eigenvalues: Definition 5.2.7. We define the algebraic multiplicity of an eigenvalue λ to be the degree that λ is a root of χT (t). We define the geometric multiplicity of an eigenvalue λ to be the dimension of the eigenspace Eλ . Proposition 5.2.8. (AM ≥ GM) Let T : V → V be a linear operator on a finite-dimensional vector space. If λ is an eigenvalue of T , the algebraic multiplicity of λ is at least the geometric multiplicity of λ. Proof. Let n = dim(V ), and k be the geometric multiplicity of an eigenvalue λ. Note k ≤ n since ker(T − λId) ⊆ V is a subspace. Let {x1 , . . . , xk } be a basis for ker(T − λI), and complete this to a basis B = {x1 , . . . , xk , xk+1 , . . . , xn } for V . Then [T ]B has the form   λIdk ? , 0 A for some (n − k) × (n − k)-matrix A. It follows from Proposition 4.1.10 that χT (t) = χλIdk (t)χA (t) = (λ − t)k χA (t). As (λ − t)k divides χT (t), the algebraic multiplicity of T is at least k.
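Everything in Example 5.2.5, together with the algebraic and geometric multiplicities of Definition 5.2.7, can be checked with sympy. A minimal sketch (the matrix is exactly the A of Example 5.2.5):

import sympy as sp

A = sp.Matrix([[0, 1, 0, 0],
               [-1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])

t = sp.symbols('t')
print(sp.factor(A.charpoly(t).as_expr()))   # factors as (t - 1)*(t + 1)*(t**2 + 1), i.e. t^4 - 1

# eigenvals() returns a dictionary {eigenvalue: algebraic multiplicity}.
print(A.eigenvals())                        # eigenvalues 1, -1, I, -I, each with multiplicity 1

# For each eigenvalue, the geometric multiplicity is dim ker(A - lambda*I).
for lam in A.eigenvals():
    geo = len((A - lam * sp.eye(4)).nullspace())
    print(lam, geo)                         # every eigenvalue has geometric multiplicity 1

Since all four multiplicities (algebraic and geometric) equal 1, this also illustrates the inequality AM ≥ GM of Proposition 5.2.8, here with equality.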



5.3. The minimal polynomial. Definition 5.3.1. Fix vector spaces V and W . We define L(V, W ) to be the vector space of linear transformations T : V → W . Lemma 5.3.2. Suppose V, W are finite-dimensional vector spaces of dimension l, m, respectively. Then L(V, W ) is finite-dimensional with dim L(V, W ) = l · m.


Proof. Fix bases {e1 , . . . , el }, {f1 , . . . , fm } of V, W , respectively. For all 1 ≤ i ≤ l and 1 ≤ j ≤ m, define  fj , if k = i Ti,j (ek ) = 0, otherwise. It is easy to check that {Ti,j } forms a basis for L(V, W ).



Proposition 5.3.3. (Annihilator is nonzero in finite-dimensional case) Suppose V is a finite-dimensional vector space, and T : V → V is a linear operator. Then there is a nonzero polynomial p(t) ∈ F[t] such that p(T) = 0. In fact, if V is of dimension n, we can take p(t) to have degree at most n^2.
Proof. By the above lemma, any collection of n^2 + 1 transformations in L(V, V) is linearly dependent. In particular, there exist constants a0, a1, . . . , a_(n^2), not all zero, such that
a0 I + a1 T + · · · + a_(n^2) T^(n^2) = 0.
We may take p(t) = a0 + a1 t + · · · + a_(n^2) t^(n^2) ≠ 0.



Example 5.3.4. Let S : F[t] → F[t] be the "multiplication by t" linear operator given by p(t) 7→ t · p(t). Observe for all p(t), q(t) ∈ F[t], q(S)(p(t)) = q(t) · p(t). In particular, q(S)(1) = q(t) ≠ 0 for all nonzero q(t) ∈ F[t]. Thus, S is an example of a linear operator on an infinite-dimensional vector space for which the prior proposition does not hold.
Definition 5.3.5. Let V be a finite-dimensional vector space and 0 ≠ T : V → V a linear operator. We define the minimal polynomial of T to be a monic polynomial mT(t) ∈ F[t] of least degree such that mT(T) = 0.
Proposition 5.3.6. Given a finite-dimensional vector space V and a linear operator T : V → V, a minimal polynomial exists and is unique. Moreover, if q(t) ∈ F[t] satisfies q(T) = 0, then mT(t) divides q(t), i.e., there exists a polynomial d(t) ∈ F[t] such that q(t) = mT(t)d(t).
Proof. By the above proposition, there exists a nonzero polynomial P(t) ∈ F[t] such that P(T) = 0. Define
d = min{deg(q(t)) : q(T) = 0, q(t) ≠ 0} ≥ 1.
Choose a polynomial
m̃(t) = a0 + a1 t + · · · + ad t^d,   ad ≠ 0,
with m̃(T) = 0. Then mT(t) = ad^(-1) m̃(t) is a minimal polynomial for T.
We show that mT(t) divides any polynomial q(t) ∈ F[t] with q(T) = 0. Uniqueness of the minimal polynomial would then follow. Fix such a polynomial q(t). The desired result is clear if q(t) = 0, so assume otherwise. By the division algorithm (Proposition 5.1.2), there exist polynomials d(t), r(t) ∈ F[t] such that q(t) = mT(t)d(t) + r(t) and either deg(r(t)) < d or r(t) = 0. Then
r(T) = q(T) − mT(T)d(T) = 0.
This contradicts the definition of d if r(t) ≠ 0.
Random Thought 5.3.7. All beds are warm. That's a blanket statement.
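The existence proof behind Proposition 5.3.3/5.3.6 is itself an algorithm: keep adding powers I, T, T^2, . . . until a linear dependence appears, and the first dependence is (up to scaling) the minimal polynomial. A minimal sketch with sympy, run on the operator of Example 5.3.8 below; the helper function is my own illustration, not a sympy built-in.

import sympy as sp

A = sp.Matrix([[1, 0],
               [0, 2]])     # the operator T(x, y) = (x, 2y) of Example 5.3.8 below

def minimal_polynomial(A):
    # Find the first k with I, A, ..., A^k linearly dependent, then read off
    # the monic dependence (this follows Propositions 5.3.3 and 5.3.6).
    n = A.shape[0]
    powers = [sp.eye(n)]
    for k in range(1, n * n + 1):
        powers.append(powers[-1] * A)
        # Columns of M are the flattened matrices I, A, ..., A^k.
        M = sp.Matrix.hstack(*[p.reshape(n * n, 1) for p in powers])
        null = M.nullspace()
        if null:
            c = null[0]          # c_0 I + c_1 A + ... + c_k A^k = 0
            c = c / c[k]         # normalize so the polynomial is monic
            t = sp.symbols('t')
            return sp.expand(sum(c[i] * t**i for i in range(k + 1)))

print(minimal_polynomial(A))     # t**2 - 3*t + 2, i.e. (t - 1)(t - 2)

The loop is guaranteed to terminate by Proposition 5.3.3, and the first dependence really is the minimal polynomial: any lower-degree annihilating polynomial would have produced a dependence earlier.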




Example 5.3.8. Define T : R2 → R2 by T(x, y) = (x, 2y). If S is the standard basis for R2,
[T]S = [ 1 0 ; 0 2 ].
Observe for each k ∈ N,
[T]S^k = [ 1 0 ; 0 2^k ],
which implies for each p(t) ∈ R[t],
[p(T)]S = [ p(1) 0 ; 0 p(2) ].
We observe
p(T) = 0 ⇐⇒ p(1) = p(2) = 0.

It follows that the minimal polynomial of T is mT(t) = (t − 1)(t − 2) = t^2 − 3t + 2.
5.4. Diagonalizability. Fix n ∈ N. A particularly simple subset of Mn(F) is the collection of diagonal n × n matrices. These are the matrices in Mn(F) for which every entry off the diagonal is zero. If D is diagonal,
D =
  [ a11  0  · · ·  0  ]
  [  0  a22 · · ·  0  ]
  [  .   .   . .   .  ]
  [  0   0  · · · ann ],
then Dei = aii ei for all i. This means that each ei is an eigenvector of D with corresponding eigenvalue aii. In particular, D has a basis of eigenvectors {e1, . . . , en}.
Suppose a vector space V has a basis B = {v1, . . . , vn}. Recall we define the coordinates of a vector x = α1 v1 + · · · + αn vn with respect to B as the column vector [x]B = (α1, . . . , αn)^t, and for a linear operator T : V → V we define the matrix representation of T with respect to B to be the matrix [T]B whose jth column is [T(vj)]B.
Definition 5.4.1. Let V be a (possibly infinite-dimensional) vector space. We call a linear operator T : V → V diagonalizable if there is a basis of eigenvectors for V. We call such a basis an eigenbasis for V.
Example 5.4.2. Fix λ ∈ F and define Aλ := [ λ 1 ; 0 λ ]. Then Aλ is not diagonalizable: its only eigenvalue is λ, and ker(Aλ − λId) = ker [ 0 1 ; 0 0 ] has dimension 1, so F^2 has no basis consisting of eigenvectors of Aλ.
Example 5.4.3. Let A = [ 2 −1 ; −1 2 ]. We calculate
det(A − tI) = det [ 2−t −1 ; −1 2−t ] = t^2 − 4t + 3 = (t − 1)(t − 3).
This implies the eigenvalues of A are 1 and 3. Choosing an eigenvector corresponding to each eigenvalue, these two vectors form a basis of eigenvectors. Hence, A is an example of a nondiagonal matrix that is diagonalizable.
In the case of finite-dimensional vector spaces, we can talk about matrix representations. This gives us the following proposition.
Proposition 5.4.4. Let V be a finite-dimensional vector space and T : V → V a linear operator. Then T is diagonalizable if and only if there is a basis B for which [T]B is diagonal.
Suppose a matrix A has an eigenbasis B = {x1, . . . , xn} with corresponding eigenvalues λ1, . . . , λn (not necessarily distinct). By the discussion in Section 2.4, A = [Id]S,B [A]B [Id]B,S. Hence, A = SDS^(-1), where S = [ x1 x2 · · · xn ] is the matrix whose columns are x1, . . . , xn and D = diag(λ1, λ2, . . . , λn).
We leave the following as an exercise for the reader (just combine bases of the individual eigenspaces to obtain a basis for the whole space, or separate an eigenbasis into bases of the individual eigenspaces).

Proposition 5.4.5. Let T : V → V be a linear operator on a finite-dimensional vector space. Suppose λ1 , . . . , λk are the eigenvalues of T . Then T is diagonalizable if and only if V = Eλ1 ⊕ · · · ⊕ Eλk , where Eλi is the eigenspace corresponding to λi . Using Proposition 5.2.6, we have a sufficient condition for diagonalizability of linear operators on finite-dimensional vector spaces. Corollary 5.4.6. (First characterization of diagonalizability) Let V be a finite-dimensional vector space of dimension n and T : V → V a linear operator. If λ1 , . . . , λk are distinct eigenvalues of T and n = dim ker(T − λ1 Id) + · · · + dim ker(T − λk Id), then T is diagonalizable. In particular, if T has n distinct eigenvalues, then T is diagonalizable. Proof. For each j = 1, . . . , k, choose a basis Bj for ker(T − λj Id). By Proposition 5.2.6, B := B1 ∪ · · · Bk is linearly independent and has n elements. This implies B is a basis for V consisting of eigenvectors. 
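Examples 5.4.2 and 5.4.3, as well as Corollary 5.4.6, are easy to experiment with in sympy, whose diagonalize() returns exactly the factorization A = SDS^(-1) described above. A minimal sketch (taking λ = 2 in Example 5.4.2):

import sympy as sp

# Example 5.4.2 with lambda = 2: one eigenvalue, one-dimensional eigenspace.
A_bad = sp.Matrix([[2, 1],
                   [0, 2]])
print(A_bad.is_diagonalizable())    # False

# Example 5.4.3: two distinct eigenvalues, hence diagonalizable (Corollary 5.4.6).
A_good = sp.Matrix([[2, -1],
                    [-1, 2]])
S, D = A_good.diagonalize()         # columns of S are eigenvectors, D is diagonal
print(D)                            # diagonal entries 1 and 3
assert S * D * S.inv() == A_good

Calling diagonalize() on A_bad would raise an error instead, since no eigenbasis exists.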


I leave the proof of the following theorem as an exercise to the reader. Theorem 5.4.7. Let T : V → V be a linear operator of a vector space of dimension n. Then T is diagonalizable if and only if the minimal polynomial factors as mT (t) = (t − λ1 ) · · · (t − λk ), for distinct λ1 , · · · , λk ∈ F. The following theorem follows from the fact that the geometric multiplicity of an eigenvalue is at at most its algebraic multiplicity (Proposition 5.2.8): Theorem 5.4.8. Suppose F = C and T : V → V is a linear operator on a vector space of dimension n. Then T is diagonalizable if and only if the algebraic multiplicity of each eigenvalue equals its geometric multiplicity. Proof. The sum of the algebraic multiplicities of the eigenvalues of T equals the degree of χT (t), which is n. If T is diagonalizable, V has an eigenbasis. It follows from Proposition 5.2.8 that the algebraic and geometric multiplicities agree for each eigenvalue. The other direction is clear.  5.5. Cyclic subspaces. We will soon consider the Jordan canonical form of a matrix. Before we do so, we need to study further the relationship between the minimal polynomial and the characteristic polynomial. This section concludes with the statement that we can decompose each finite-dimensional vector space into the direct sum of cyclic subspaces. This will provide us in the next section with the rational canonical form. In this section, all vector spaces (besides F[t]) will be assumed to be finite-dimensional. Given a linear operator T , recall the inductive definition of T k for k ∈ N. An important observation used throughout the next couple sections is the following: given any p(t), q(t) ∈ F[t], p(T ) ◦ q(T ) = q(T ) ◦ p(T ). As noted in the preface, this section (as in most sections) will be adapted from Petersen’s notes [6]. Given a vector space V and a linear operator T : V → V , recall that we say that a subspace W ⊆ V is T -invariant if T (W ) ⊆ W . Definition 5.5.1. Fix a vector space V and a linear operator T : V → V . Given x ∈ V , we define Cx := span{x, T x, T 2 x, . . .} to be the cyclic subspace T -generated by x. We call a subspace W ⊆ V cyclic if W = Cx for some x ∈ V . In this case, we say that W is T -generated by x. Example 5.5.2. Define the linear operator T : F[t] → F[t] by p(t) 7→ t · p(t). Then F[t] is T -generated by the constant polynomial 1 ∈ F[t]. Fix a nonzero vector x ∈ V and a linear operator T : V → V . By Proposition 1.3.2, there is a smallest k such that T k x ∈ span{x, T x, . . . , T k−1 x}. This implies that there exist scalars α0 , α1 , . . . , αk−1 ∈ F such that T k x + αk−1 T k−1 x + · · · + α0 x = 0. Note k ≤ dim(V ). This gives us the following lemma.


Lemma 5.5.3. Let T : V → V be a linear operator and x ∈ V a nonzero vector. Then Cx is T-invariant and we can find k such that x, Tx, . . . , T^(k−1)x form a basis for Cx. The matrix representation for T|Cx with respect to this basis is
  [ 0 0 · · · 0  −α0     ]
  [ 1 0 · · · 0  −α1     ]
  [ 0 1 · · · 0  −α2     ]
  [ .  .  . .  .   .     ]
  [ 0 0 · · · 1  −α(k−1) ],
where T^k x + α(k−1) T^(k−1) x + · · · + α0 x = 0.
Random Thought 5.5.4. I love airplanes and everything about them. So, I always get so disappointed while watching the first episode of a series. Darn misleading titles...
Definition 5.5.5. Given a monic polynomial p(t) = t^n + α(n−1) t^(n−1) + · · · + α0 ∈ F[t], we define the companion matrix of p(t) to be the n × n matrix
Ap :=
  [ 0 0 · · · 0  −α0     ]
  [ 1 0 · · · 0  −α1     ]
  [ 0 1 · · · 0  −α2     ]
  [ .  .  . .  .   .     ]
  [ 0 0 · · · 1  −α(n−1) ].

The companion matrix for p(t) = t + α is just [−α]. Proposition 5.5.6. The characteristic polynomial and minimal polynomial of a companion matrix Ap are both p(t), and all eigenspaces are one-dimensional. In particular, Ap is diagonaliable if and only if p(t) splits and the roots of p(t) are distinct. Proof. Fix a polynomial p(t) = tn + αn−1 tn−1 + · · · + α0 . By interchanging rows and adding multiples of rows to others, one can show that the determinant of   t 0 ··· 0 α0  −1 t · · · 0  α1     α2 tId − Ap =  0 −1 · · · 0   ..  .. . . .. ..  .  . . . . 0 0 · · · −1 t + αn−1 is p(t) (see Petersen [6], Proposition 19, page 136). More specifiically, one can use these operations to reduce tId − Ap to the upper-triangular matrix   −1 t · · · 0 α1  0 −1 · · · 0 α2     ..  .. ..  0 . . .  0  .  ..  ..  . . · · · −1 αn−1  0 0 · · · 0 p(t)


If λ is a root of p(t), i.e., λ is an eigenvalue of Ap , then  −1 λ · · · 0 α1  0 −1 · · · 0 α2   . .. . . . ..  0 . 0   .. ..  . . −1 αn−1 0 0 ··· 0 0

       

has rank n − 1. It follows that each eigenspace has dimension 1. en−1 = en are linearly For the minimal polynomial, first note e1 , Ap e1 = e2 , . . . , An−1 p independent. This implies Ap is not the root of any nonzero polynomial of degree less than n. On the other hand, for each k = 1, . . . , n, k−1 n Anp (ek ) = Anp Ak−1 p (e1 ) = Ap Ap (e1 )

and Anp (e1 ) = −α0 e1 − α1 e2 − · · · − αn−1 en = −α0 e1 − α1 Ap e1 − · · · − αn−1 An−1 e1 . p This implies p(Ap )(e1 ) = 0, and hence p(Ap )(ek ) = Ak−1 · p(Ap )(e1 ) = 0 p for all 1 ≤ k ≤ n. It follows that p is the minimal polynomial of Ap .
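Proposition 5.5.6 is another statement that is easy to test on a concrete companion matrix. A minimal sketch with sympy; the polynomial p below is a made-up example with three distinct roots, so its companion matrix should be diagonalizable with one-dimensional eigenspaces.

import sympy as sp

t = sp.symbols('t')
# p(t) = (t - 1)(t + 2)(t - 3) = t^3 - 2t^2 - 5t + 6, so alpha_0 = 6, alpha_1 = -5, alpha_2 = -2.
Ap = sp.Matrix([[0, 0, -6],
                [1, 0, 5],
                [0, 1, 2]])

char = sp.factor(Ap.charpoly(t).as_expr())
print(char)                              # factors as (t - 1)*(t + 2)*(t - 3), i.e. p(t)

for val, alg_mult, basis in Ap.eigenvects():
    assert len(basis) == 1               # every eigenspace is one-dimensional

print(Ap.is_diagonalizable())            # True, since the roots of p are distinct

The characteristic polynomial (and, by the proposition, the minimal polynomial) of Ap is p itself.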



The following lemma follows from Proposition 4.1.10: Lemma 5.5.7. Fix A1 ∈ Mk (F), B ∈ Mk×(n−k) (F), and A2 ∈ Mn−k (F). Define   A1 B A= . 0 A2 Then χA (t) = χA1 (t)χA2 (t). Using this lemma, we obtain the Cayley-Hamilton Theorem: Theorem 5.5.8. (Cayley-Hamilton Theorem) Let T : V → V be a linear operator on an n-dimensional vector space V . Then T is a root of its characteristic polynomial: χT (T ) = 0. Proof. Fix x ∈ V . We need to show the linear operator χT (T ) kills x, i.e., χT (T )(x) = 0. By Lemma 5.5.3, we may choose a basis x, T x, . . . , T k−1 x for the cyclic subspace Cx generated by x. Complete this to a basis B for V . Let p(t) be the monic polynomial of degree k such that p(T )(x) = 0. Then   Ap ? [T ]B = , 0 A for some (n − k) × (n − k)-matrix A. By Lemma 5.5.7, χT (t) = χA (t)p(t). Hence, χT (T )(x) = χA (T ) ◦ p(T )(x) = 0. 
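The Cayley-Hamilton Theorem is also satisfying to verify numerically: plug a matrix into its own characteristic polynomial and watch the zero matrix appear. A minimal sketch with sympy; the 3 × 3 matrix is a made-up example.

import sympy as sp

A = sp.Matrix([[1, 2, 0],
               [3, -1, 4],
               [0, 2, 2]])

# Coefficients of chi_A(t), from the leading term down to the constant term.
coeffs = A.charpoly().all_coeffs()

# Evaluate chi_A(A) = c_0 A^n + c_1 A^(n-1) + ... + c_n I by Horner's rule.
result = sp.zeros(3, 3)
for c in coeffs:
    result = result * A + c * sp.eye(3)

print(result)                    # the 3x3 zero matrix
assert result == sp.zeros(3, 3)

Horner's rule here is just a tidy way of forming the matrix polynomial; the content being checked is Theorem 5.5.8.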


An important corollary then follows from Proposition 5.3.6. Recall that we say a polynomial p(t) divides q(t) ∈ F[t] if there exists d(t) ∈ F[t] such that q(t) = p(t) · d(t). Corollary 5.5.9. Let T : V → V be a linear operator. Then the minimal polynomial mT (t) divides the characteristic polynomial χT (t). We now can obtain an interesting characterization of finite-dimensional vector spaces. Theorem 5.5.10. (Cyclic subspace decomposition) Let T : V → V be a linear operator. Then V can be written as the direct sum of cyclic subspaces: V = C x1 ⊕ · · · ⊕ C xk

for some x1 , . . . , xk ∈ V.

In particular, T has a block diagonal matrix representation where each block is a companion matrix:   Ap1 0 · · · 0  0 Ap2 · · · 0    [T ] =  . ..  , . . .  . . .  0 0 · · · Apk and χT (t) = p1 (t) · · · pk (t). Moreover, the geometric multiplicity of a scalar λ satisfies dim(ker(T − λIV )) = #{i : pi (λ) = 0}. In particular, we see that T is diagonalizable if and only if all of the companion matrices have distinct eigenvalues. Proof. This theorem is proved using induction on the dimension of V . Some details will be left to the reader and such will be noted. Assume dim(V ) = n. If V is cyclic, we are done. Assume otherwise. Let Cx1 = span{x1 , T x1 , . . . , T m−1 x1 } be a cyclic subspace of maximal dimension m < n. The goal is to show that V is the direct sum of Cx1 with a T -invariant subspace (then we could repeat the argument for TCx1 and apply an inductive-type argument on the dimension). Choose a linear functional f : V → F such that f (T k x1 ) = 0 f (T

m−1

for all 0 ≤ k < m − 1,

x1 ) = 1.

Define K : V → Fm by K(x) = (f (x), f (T x), . . . , f (T m−1 x)). Define B = {x1 , T x1 , . . . , T m−1 x1 }. Then K|Cx1 : Cx1 → Fm is an isomorphism since   0 0 ··· 1  .. ..   . . ... ?    [K]S,B =  ..  . . .  0 1 . .  1 ? ··· ? We now show ker(K) is T -invariant. Fix x ∈ ker K. This means f (T k x) = 0 for all 1 ≤ k < m. By the definition of m, T m x is a linear combination of {x, T x, . . . , T m−1 x}. It follows that T x ∈ ker(K). It’s not too hard to see ker(K) ∩ Cx1 = {0}. We then use the Rank-Nullity Theorem (Theorem 3.3.2) to conclude that V = Cx1 ⊕ker(K) (since dim(Cx1 ) = dim(Im(K))). The rest of the claims concerning eigenvalues follow from Proposition 5.5.6. 

50

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Remark 5.5.11. The reader should make sure not to confuse the previous theorem with a mummy. For the theorem shows that one can decompose a finite-dimensional vector space into direct sum parts while a mummy is a sum of decomposing parts. 5.6. Rational canonical form. In this section, we introduce the first canonical form of matrix representations: the rational canonical form. One advantage of this canonical form over the Jordan canonical form is that we do not need to assume that F is algebraically closed. The form we will be interested in is referred to as the Frobenius canonical form in Petersen’s notes ([6], Chapter 2, Section 7). This differs from the Smith normal form of a matrix. Per Daniel Carmody, the Smith normal form is always diagonal; in particular, a square matrix need not be similar to its Smith normal form. This is called a rational canonical form because the rational canonical form of a matrix with rational entries will be another matrix with rational entries. This form should not referred to that other form for matrices, the one that’s much crazier in appearance, the ... irrational canonical form. We continue to assume that all vector spaces are finite-dimensional. The field F can be any field. Before we begin the proof of the rational canonical form, we recall the following definition. Definition 5.6.1. We say that a matrix A ∈ Mn (F) is similar to a matrix B ∈ Mn (F) if there exists an invertible matrix S ∈ GLn (F) such that A = SBS −1 . Note that two matrices are similar if and only if they are matrix representations of the same linear operator, but with respect to possibly different bases. Recall Theorem 5.5.10, which states that every vector space can be written as the direct sum of cyclic subspaces. Combining bases for the cyclic subspaces, this with Lemma 5.5.3 give us that each linear operator has a matrix representation of the form   Ap1 0 · · · 0  0 Ap2 0     .. ..  . . .  . . .  0 0 · · · Apk The rational canonical form goes a bit further, saying that we may assume pi divides pi−1 , for i = 2, . . . , k, and the monic polynomials pi are unique. Theorem 5.6.2. (Rational canonical form) Let T : V → V be a linear operator on a finitedimensional vector space. Then V has a cyclic subspace decomposition such that the matrix representation of T with respect to the induced basis B has the form   Ap1 0 · · · 0  ..   0 Ap2 .   . [T ]B =  . . .. 0   ..  0 · · · 0 Apk Moreover, the monic polynomials pi have the property that pi divides pi−1 for i = 2, . . . , k, and are unique.

SWILA NOTES

51

Proof. This proof will use the proof of the cyclic subspace decomposition of vector spaces (Theorem 5.5.10). We will show the polynomials obtained in the proof of Theorem 5.5.10 satisfy the required divisibility properties. Let n = dim(V ). Recall that m ≤ n is the largest dimension of a cyclic subspace of V . Recalling Cx is the cyclic subspace generated by each x ∈ V , this means dim(Cx ) ≤ m for all x ∈ V and there exists x1 ∈ V so that {x1 , T x1 , . . . , T m−1 x1 } is linearly independent. In the proof of Theorem 5.5.10, we found a T -invariant complementary subspace M of Cx1 , i.e., T (M ) ⊆ M and Cx1 ⊕ M = V . (If V = Cx1 , just let M = 0.) By the definition of m, there exists a polynomial p1 (t) = tm − am−1 tm−1 − · · · − a0 ∈ F[t] such that (†)

p1 (T )x1 = T m x1 − am−1 T m−1 x1 − · · · − a0 x1 = 0.

We now show that p1 (T )z = 0 for all z ∈ V . I leave the following as an exercise to the reader: • If p1 (T )z = 0 for all z ∈ M , then p1 (t) equals the minimal polynomial mT (t) of T . • p1 (T )x = 0 for all x ∈ Cx1 . It remains to check that p1 (T )z = 0 for all z ∈ M . Recalling the choice of m, for z ∈ M , T m (x1 + z) = bm−1 T m−1 (x1 + z) + · · · + b0 (x1 + z),

some bm−1 , . . . , b0 ∈ F,

which implies by linearity T m (x1 ) + T m (z) = bm−1 T m−1 (x1 ) + · · · + b0 x1 + bm−1 T m−1 (z) + · · · + b0 z (?) =⇒

T m (x1 ) − bm−1 T m−1 (x1 ) − · · · − b0 x1 = −T m (z) + bm−1 T m−1 (z) + · · · + b0 z = 0.

The right equality with 0 follows from the fact that Cx1 ∩ M = {0}. As the left side lies in Cx1 while the right side lies in M , linear independence of {x1 , T x1 , . . . , T m−1 x1 } implies that ai = bi for all i (recall p1 (T )x1 = 0 at (†)). From (?), we may conclude p1 (T )z = 0. If Cx1 = V , existence and uniqueness follow as p1 (t) = mT (t). Thus assume otherwise, i.e., Cx1 ( V . As M is T -invariant, we may choose x2 ∈ M , p2 ∈ F[t], l ∈ N as we chose x1 ∈ V , p1 ∈ F[t], m ∈ N, respectively. As deg(p2 ) = l ≤ m = deg(p1 ) (justification left as exercise to the reader), we may write p1 = q · p2 + r, where deg(r) < deg(p2 ) or r = 0. (I may leave the rest as an exercise to the reader.) Thus 0 = p1 (T )(x2 ) = q(T ) ◦ p2 (T )(x2 ) + r(T )(x2 ) = r(T )(x2 ). Writing r(t) = α0 + α1 t + · · · + αl−1 tl−1 , this means 0 = α0 x2 + α1 T x2 + · · · + αl−1 T l−1 x2 . As x2 , T x2 , . . . , T l−1 (x2 ) are linearly independent, we must have α0 = · · · = αl−1 = 0. This implies r(t) = 0 and p2 divides p1 . We can conclude (by induction on dimension) the existence of the rational canonical form. More precisely, we can choose a T -invariant subspace M 0 ⊂ M such that Cx2 ⊕M 0 = M . Then we would choose a largest dimension cyclic subspace of M 0 and an associated polynomial. This process would have to end by the finite-dimensionality of V and the strictly decreasing dimension of the complementary subspace. (Brief intermission for random thought)  Random Thought 5.6.3. Meanwhile on Jeopardy... Contestant: I’ll take “I η π” for 500, Alex. Trebek: In the Greek alphabet, this letter comes after µ.


Contestant: Alex, what is ν? Trebek: Not much, contestant. What is ν with you? (self-satisfied chuckle with barely audible “The jokes just keep getting β and β.”) Contestant: Ω Trabek, I can’t even right now. Proof. (Proof continued, this uniqueness proof probably isn’t correct but I wonder if anyone is actually reading this at this point...) We now prove uniqueness of the polynomials p1 and p2 above. Note that x1 and x2 are not necessarily unique (for example, one could replace x1 and x2 by any nonzero scalar multiples). We have already shown that p1 (t) = mT (t), which proves that p1 is unique. We now prove uniqueness of p2 . Suppose we have Cx1 ⊕ M = V = Cx01 ⊕ M 0 . This gives us two matrix representations for T (choosing bases for M and M 0 ):         A1 0 Ap1 0 A1 0 Ap1 0 = and = . 0 A2 0 [T |M ] 0 A02 0 [T |M 0 ] Note that these matrices are similar as they are matrix representations of the same linear operator. Choose a matrix S ∈ GLn (F) such that     A1 0 A1 0 −1 =S S. 0 A2 0 A02 I leave as an exercise to the reader to show that for all p(t) ∈ F[t],     p(A1 ) 0 p(A1 ) 0 −1 =S S. 0 p(A2 ) 0 p(A02 ) Hint: Note that 

p(A1 ) 0 0 p(A2 )

In particular, the matrices   p(A1 ) 0 0 p(A2 )



 =p

 and

A1 0 0 A2

 .

p(A1 ) 0 0 p(A02 )



have the same rank for all p(t) ∈ F[t]. In particular, by the Rank-Nullity theorem, p(A2 ) = 0

⇐⇒

p(A02 ) = 0.

This shows A2 and A02 share the same minimal polynomial. This shows that the next block in the matrix representation of T (after AmT ) has to be AmT |M . Uniqueness of the rational canonical form follows by induction as existence did.  Definition 5.6.4. Fix a linear operator T : V → V on a finite-dimensional vector space. The unique monic polynomials p1 , . . . , pk from Theorem 5.6.2 are called the invariant factors of T . These are referred sometimes to as elementary divisors. Remark 5.6.5. Here, I don’t consider the constant 1 function to be an invariant factor. Some other sources may allow the constant 1 function to be some of the invariant factors.


Example 5.6.6. We have not noted how difficult it actually is to compute the rational canonical form of a linear operator. With about 7.5 quintillion (7,500,000,000,000,000,000) grains of sand in the world, could you imagine trying to find the largest grain of sand worldwide? Now imagine trying to find the maximal dimension cyclic subspace in a vector space of dimension 15 quintillion for a general linear operator. That’d take at least a couple days. However, it is (relatively) easy to check if a matrix is in rational canonical form. Let   0 −6 0 0  1 5 0 0   A=  0 0 0 −4  . 0 0 1 4   Ap1 0 , where p1 (t) = t2 − 5t + 6 = (t − 3)(t − 2) and p2 (t) = t2 − 4t + 4 = Note A = 0 Ap2 (t − 2)2 . A is not in rational canonical form since p2 does not divide p1 . On the other hand, let   0 0 −1 0  1 0 1 0   B=  0 1 1 0 . 0 0 0 1   Aq1 0 Note that B = , where q1 (t) = t3 −t2 −t+1 = (t−1)2 (t+1) and q2 (t) = (t−1). 0 Aq2 B is in rational canonical form since q2 divides q1 . The invariant factors of B are q1 and q2 . The minimal polynomial of B is q1 = (t − 1)2 (t + 1), and the characteristic polynomial of B is q1 · q2 = (t − 1)3 (t + 1). We have a corollary of the rational canonical form. Corollary 5.6.7. If two linear operators on an n-dimensional vector space have the same minimal polynomials of degree n, then they have the same rational canonical form and are similar. Proof. Fix a linear operator T : V → V . If mT (t) has degree equal to dim(V ), then the rational canonical form is AmT . The corollary follows.  The existence and uniqueness of the rational canonical form gives us the following proposition. Proposition 5.6.8. For matrices A, B ∈ Mn (F), the following are equivalent: (1) A is similar to B. (2) A and B have the same invariant factors. (3) A and B have the same rational canonical form. Example 5.6.9. This is only somewhat of an example and is meant to improve intuition about the rational canonical form. Suppose   Aq1 0 A= , 0 Aq2


where q1 , q2 are relatively prime. For p ∈ F[t], if   p(Aq1 ) 0 0 = p(A) = , 0 p(Aq2 ) then p divides q1 and q2 . This implies q1 · q2 divides mA . On the other hand, we clearly have (q1 · q2 )(A) = 0. This implies mA = q1 · q2 . This implies the rational canonical form of A is Aq1 ·q2 . One can show that for any block diagonal matrix A with each of the blocks the companion matrix for a monic polynomial pi , the first block of the rational canonical form of A will be the companion matrix of the least common multiple of the pi . 5.7. The Jordan canonical form. What starts with a “J” and ends with “ordan canonical form”? I’m not sure if you were able to get it, but it’s Jordan canonical form. All of these notes thus far have led up to this one section: the Jordan canonical form. Enjoy. We continue to assume that vector spaces are finite-dimensional of dimension n. We will assume the field of scalars is C. The Fundamental Theorem of Algebra then implies that the minimal polynomial of any linear operator T : V → V splits into the product of linear factors. Lemma 5.7.1. Let T : V → V be a linear operator with minimal polynomial mT (t) = p(t)q(t), where the greatest common divisor of p and q is 1. Then V = ker(p(T )) ⊕ ker(q(T )) and mT |ker p(T ) = p and mT |ker q(T ) = q. Proof. We need to show that ker(p(T ))∩ker(q(T )) = {0}. Suppose x ∈ ker(p(T ))∩ker(q(T )). By Proposition 5.1.3, there are polynomials a(t), b(t) ∈ C[t] such that 1 = a(t)p(t) + b(t)q(t). Then x = a(T )p(T )x + b(T )q(T )x = 0, which implies ker(p(T )) + ker(q(T )) = ker(p(T )) ⊕ ker(q(T )). Moreover, for any v ∈ V , v = a(T )p(T )v + b(T )q(T )v. We see a(T )p(T )v ∈ ker(q(T )) and b(T )q(T )v ∈ ker p(T ). It follows that V = ker(p(T )) ⊕ ker(q(T )). To see ker mT |ker(p(T )) = p, suppose for contradiction there is a polynomial p˜ of degree less than that of p and such that mT |ker p(T ) = p˜. Then, p˜ · q is a polynomial of degree less than mT with p˜(T )q(T )(y) = 0 for all y ∈ V . This is a contradiction. A similar argument works to show mT |ker q(T ) = q.  Recall dim(V ) = n. We can now prove the following theorem. Theorem 5.7.2. (The Jordan-Chevalley decomposition) Let T : V → V be a linear operator. Then T = S + N , where S is diagonalizable, N n = 0, and SN = N S. Proof. By the Fundamental Theorem of Algebra, we can write mT (t) = (t − λ1 )m1 · · · (t − λk )mk ,

distinct λ1 , . . . , λk ∈ C.

It follows from Lemma 5.7.1 (by induction) that V = ker(T − λ1 IV )m1 ⊕ · · · ⊕ ker(T − λk IV )mk .


As ker(T − λj 1V )mj is T -invariant for each j, we can then apply the rational canonical form from Section 5.6 to each summand. More specifically, we can decompose each summand as the direct sum of cyclic subspaces so that T has a matrix representation of the form   Ap1 0 · · · 0  ..   0 Ap2 .  , [T ] =   ..  ..  . . 0  0 · · · 0 Aps , with pj = (t − λij )rj for some eigenvalue λij and rj ∈ N. We are reduced to proving the statement for companion matrices, where the minimal polynomial has one root. (This corresponds to finding a new basis for each of the cyclic subspaces above and then combining them to obtain a new basis for V .) Let Ap be a companion matrix with p(t) = (t − λ)r . Construct the matrix A=D+N    λ 0 ··· 0   0 λ ··· 0      + = . . . . . ...     .. ..  0 0 ··· λ   λ 1 ··· 0  .   0 λ . . . ..   =  .. .. . .   . . . 1  0 0 ··· λ

1 0 ··· 0 1 ··· .. .. . . 0 0 0 ··· 0 0 0 ···

0 0 .. .

0 0 .. .



     1  0

The minimal polynomial of A is mA (t) = mAp (t) = p(t). By Corollary 5.6.7 (minimal polynomial of A has degree equal to its number of rows), the rational canonical form of A is Ap , and Ap is similar to A. The exercise follows for this choice of D and N .  Random Thought 5.7.3. A girl was playing hide-and-go-seek with a spider. Why did she get out? Because it spied her! The same girl then played hide-and-go-seek with Spiderman. Why did she get out this time? Because he spied her, man! The proof of this theorem provides us with the following corollary:

56

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Corollary 5.7.4. Let Ap be a companion matrix with p(t) = (t − λ)n . Then Ap is similar to a Jordan block   λ 1 0 ··· 0 0  0 λ 1 ··· 0 0     .. ..  . .  0 0 λ . . .  .   .. .. . . . . ..   . . . .  .    0 0 0 ··· λ 1  0 0 0 ··· 0 λ Note that the diagonal entries of a Jordan block are all the same. From the rational canonical form, we obtain the Jordan canonical form: Theorem 5.7.5. (Jordan canonical form) Let T : V → V be a linear operator. Then we can find T -invariant subspaces M1 , . . . , Ms such that V = M1 ⊕ · · · ⊕ Ms and each T |Mi has a matrix representation of the form   λi 1 0 · · · 0 0  0 λi 1 · · · 0 0     ..  . .  0 0 λi . .  , Ji =   .. .. . . . . . . . ..    . .    0 0 0 · · · λi 1  0 0 0 · · · 0 λi where λi is an eigenvalue for T . Hence, T has a matrix representation of the form   J1 0 · · · 0  ..   0 J2 .  .  [T ] =  .  . . .   . . 0 · · · 0 Js Moreover, the Jordan canonical form is unique up to the order of the Jordan blocks. Proof. Recall that we are assuming F = C. Let λ1 , . . . , λk be the eigenvalues of T . As noted in the proof of Theorem 5.7.2, V = ker(T − λ1 IV )m1 ⊕ · · · ⊕ ker(T − λk IV )mk . Writing each of the summands as the direct for which  Ap1   0 [T ]B˜ =   ..  . 0

sum of cyclic subspaces, there exists a basis B˜ 0 Ap2 ···

 0 ..  .    .. . 0  0 Aps

···

SWILA NOTES

57

with pi (t) = (t − λji )ni . Applying Corollary 5.7.4 to each Api , there exists a basis B for which   J1 0 · · · 0  ..   0 J2 .   [T ]B =    .. ..  . . 0  0 · · · 0 Js for some Jordan blocks Ji .



Random Thought 5.7.6. What do cars use to listen? Engine-ears!!! Corollary 5.7.7. Suppose dim(V ) = 2 and T : V → V is a linear operator. Either T is diagonalizable, or T is not diagonalizable and there is a basis for which the the associated matrix representation for T is   λ 1 [T ] = . 0 λ Corollary 5.7.8. Suppose dim(V ) = 3 and T : V → V is a linear operator. There are three disjoint cases: (1) T is diagonalizable. (2) T is not diagonalizable and there is a basis for which   λ1 0 0 [T ] =  0 λ2 1  . 0 0 λ2 (3) T is not diagonalizable and there is a basis for which   λ 1 0 [T ] =  0 λ 1  . 0 0 λ Remark 5.7.9. Consider Corollary 5.7.8. If the algebraic multiplicity of an eigenvalue exceeds its geometric mulitplicity, then we are in case (2) or (3); if moreover, there are two eigenvalues, then we must be in case (2). If the geometric multiplicity of an eigenvalue is at least two, we must be in either case (1) or (2). 5.8. Computing the Jordan canonical form. The goal of this section is to give a brief overview of computing the Jordan canonical form. To see a well-explained specific example, I’d recommend watching one of Math Doctor Bob’s youtube videos [4]. We now compare the minimal polynomial with the characteristic polynomial. We recall the following lemma, which follows from Proposition 5.2.4. Lemma 5.8.1. Let T : V → V be a linear operator, and fix λ ∈ C. Assume λ is not a root of χT . Then T − λI is invertible. Proposition 5.8.2. Let T : V → V be a linear operator. Then mT (t) = (t − λ1 )m1 · · · (t − λk )mk for some distinct eigenvalues λ1 , . . . , λk of T , and V = ker(T − λ1 IV )m1 ⊕ · · · ⊕ ker(T − λk IV )mk . Moreover, if λ is a root of χT , then λ is a root of mT .

58

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Proof. The first part of the lemma was observed in Theorem 5.7.2. If λ is a root of χT , then λ is an eigenvalue of T . Fix a (nonzero) eigenvector x corresponding to λ. Note ker(T − λIV ) is an p(T )-invariant subspace for all p(t) ∈ F[t]. If λ is not a root of mT , then by Lemma 5.8.1 mT (T )(x) = (T − λ1 )m1 · · · (T − λk )mk (x) 6= 0. This is a contradiction to mT (T ) = 0.



The following will be stated without proof. In the proposition, note mi ≤ ni for each i since mT (t) divides χT (t) (by the Cayley-Hamilton theorem, Theorem 5.5.8). Proposition 5.8.3. Let T : V → V be a linear operator such that χT (t) = (t − λ1 )n1 · · · (t − λk )nk , mT (t) = (t − λ1 )m1 · · · (t − λk )mk , for distinct λ1 , . . . , λk ∈ C. Then ni equals the number of times λi appears on the diagonal of the Jordan canonical form of T , and mi equals the size of the largest Jordan block that has λi on the diagonal. Random Thought 5.8.4. Did you hear this new craze of physicists posting videos of themselves calculating forces and radii and multiplying them together? I think they call it... torqueing? Example 5.8.5. In this example, we will calculate the Jordan canonical form of   2 0 0 A =  0 3 −1  . 0 1 1 Moreover, we will find an invertible matrix S such that SAS −1 is in Jordan canonical form. We first calculate the characteristic polynomial of A:   2−t 0 0 3 − t −1  = (2 − t)[(3 − t)(1 − t) + 1] = −(t − 2)3 . χA (t) = det  0 0 1 1−t Note 

 0 0 0 A − 2I =  0 1 −1  , 0 1 −1 and hence ker(A − 2I) = span{(1, 0, 0), (0, 1, 1)}. So, 2 has algebraic multiplicity 3 and geometric multiplicity 2. This implies the Jordan canonical form of A is   2 0 0  0 2 1 , 0 0 2 unique up to the order of the two blocks. This implies the minimal polynomial of A is mA (t) = (t − 2)2 . One can check (A − 2I)2 = 0.

SWILA NOTES

Now note

59



   0 0    1 . (A − 2I) 1 = 0 1

Hence, 

     0 0 0 A 1  =  1  + 2 1 . 0 1 0 Thus, if       0 0   1      0 , 1 , 1  , B=  0 1 0  then 

 2 0 0 [A]B =  0 2 1  0 0 2

−1    1 0 0 2 0 0 1 0 0 A =  0 1 1   0 2 1  0 1 1 . 0 1 0 0 0 2 0 1 0 

and

For a more difficult example involving a 4 × 4-matrix, I’d recommend looking at one of MathDoctorBob’s Youtube videos on Jordan canonical form [4].

60

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

6. Inner product spaces? ... More like winner product spaces! Actual conversation with Hadrian Quan... Me: Hey, I thought of a title for my chapter about inner product spaces. Hadrian: Oh, what is that? Me: Oh, it’s a space with an inner product. I begin this chapter by briefly describing the origin of the study of inner product spaces... Very few know that the theory of inner product spaces originated in Hamburg, Germany during the late 19th century. While the sandwich idea of a hamburger did not come about until later, cooking meat patties with spices became very popular in the large German town. Hamburg cows were being killed at an unprecedented rate as a result of the increased rise in demand. Butchers would typically only use certain parts of the cow, then discard materials like the intestines and vital organs. Consequently, the sewage system became overwhelmed and waste began flowing into the neighboring Baltic Sea. This resulted in an algal bloom, and the sea became so bright, so vivid that the color became what we now call Pythagreen. The city desperately needed to do something because the increased algae production was decimating the native sea life. How one wanted to solve this issue, however, quickly became one of the most polarizing identities. Some proposed that they should stop selling hamburger patties, but there was ardent backlash due to its popularity. Others argued that the environmental issue would figure itself out after the sea acclimated to the change, but the city’s scientists dismissed this as illogical. Finally, one group decided that it would be best to try to retrofit the city’s system of aquaducts to become an ultrafilter. They figured out a way to slow down the flow of sewage to a rate at which the Baltic Sea could safely handle. While this group was once closed under addition, they quickly became popular because their simple signup sheet consisted of two equally spaced lines. This symmetric bilinear form allowed them to easily spread their message, which helped them be successful. This group was named Innards Pro-ducts pace, and they eventually adapted their ideas to invent what we know as inner product spaces. We will assume F = R or F = C throughout this chapter. Which one F is (or if it matters) should hopefully be clear from context. 6.1. Inner products. We would like to have more structure on our vector spaces. Recall that we can only consider finite sums of vectors as we have no concept of distance. In this section, we will consider vector spaces for which we can consider lengths of vectors and angles between pairs of vectors. Recall given x + iy ∈ C, we define the conjugate a + ib := a − ib. We view R ⊂ C by identifying R with the real axis of C. In particular, a = a ¯ for a ∈ R. Definition 6.1.1. Let V be a vector space over F. An inner product is a map h·, ·i : V ×V → F satisfying the following three properties: For all x, y ∈ V , (1) (Non-degeneracy) ||x||2 := hx, xi > 0 if x 6= 0. (2) (Complex-symmetry) hx, yi = hy, xi. (3) (Linearity in the first component) The map z 7→ hz, yi : V → F is linear, i.e., hαz1 + z2 , yi = αhz1 , yi + hz2 , yi for all z1 , z2 ∈ V, α ∈ F.

SWILA NOTES

61

We call (V, h·, ·i) a real inner product space (complex inner product space) if F = R (if F = C). p For x ∈ V , we define ||x|| = hx, xi ≥ 0 to be the norm of x. We will simply say inner product space and drop the mention of the scalars if the discussion works for both R and C. Remark 6.1.2. In the case F = R, note complex-symmetry is the same as symmetry since a = a ¯ for all a ∈ R. Note that ||0|| = 0 from linearity in the first component. Finally, hx, αyi = α ¯ hx, yi and ||αx||2 = |α|2 ||x||2 for all a ∈ C, x, y ∈ V . Example 6.1.3. Rn becomes a real inner product space with the inner product defined by n X h(x1 , . . . , xn ), (y1 , . . . , yn )i := xj yj = ( x1 · · · xn )( y1 · · · yn )t . j=1

Observe that given two nonzero vectors x, y, the angle θ between x and y is given by hx, yi . θ= ||x|| · ||y|| Moreover, two vectors x,y are orthogonal if and only if hx, yi = 0. Example 6.1.4. Cn is a complex inner product space with the complex inner product defined by n X h(w1 , . . . , wn ), (z1 , . . . , zn )i := wj zj = (w1 , . . . , wn )(z1 , . . . , zn )∗ . j=1

Example 6.1.5. (Inner product on square matrices) Given a matrix A = (aij ) ∈ Mn (C), recall that we define the trace of A as n X tr(A) := aii . i=1

We can define an inner product on the matrices Mn (C) by hA, Bi := tr(B ∗ A). It is clear that this inner product is linear in the first component. By Lemma 2.5.2, hA, Bi = tr(B ∗ A) = tr((B ∗ A)∗ ) = tr(A∗ B) = hB, Ai. It remains to show tr(A∗ A) > 0 for all 0 6= A ∈ Mn (C). It’s nontrivial to show this at this juncture, so we will prove this after proving the Spectral Theorem for self-adjoint operators (see Remark 7.4.8). The same proof shows that the trace is also an inner product on Mn (R). Example 6.1.6. The space of complex-valued continuous functions C([0, 1]) becomes an inner product space when endowed with the inner product Z 1 hf, gi := f (x)g(x) dx. 0

However, this is not an inner product on the space of integrable functions L1 ([0, 1]) (why?). Example 6.1.7. The vector space l2 (read ”little el two”) of square-summable complex sequences becomes an inner product space with the inner product ∞ X h(aj ), (bj )i := aj b j . j=1

62

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

This is one of the most important examples of inner product spaces because it forms a Hilbert space. Unfortunately, we will not have enough time to talk about Hilbert spaces in this workshop. Fortunately, all finite-dimensional inner product spaces are Hilbert spaces. A sometimes useful property of inner product spaces is the parallelogram law, which follows from linearity. Proposition 6.1.8. (Parallelogram law) Let V be an inner product space. Then for all x, y ∈ V , ||x + y||2 + ||x − y||2 = 2(||x||2 + ||y||2 ). The two most important properties of inner product spaces are the Cauchy-Schwarz inequality and the triangle inequality. We will prove those here. Proposition 6.1.9. (Cauchy-Schwarz inequality) Let V be an inner product space. For all x, y ∈ V , |hx, yi| ≤ ||x|| · ||y||. Proof. This is the easiest to remember proof I know. We will assume F = C as the result for F = R follows (or by the same proof). We may assume y is nonzero, or hx, yi = 0 by linearity. For all z ∈ C, 0 ≤ ||x + zy||2 = hx + zy, x + zyi = ||x||2 + hzy, xi + hx, zyi + ||zy||2 (complex-symmetry + linearity)

= ||x||2 + 2Re(zhy, xi) + |z|2 ||y||2 .

As hy, xi ∈ C, we may write hy, xi = |hx, yi|eiθ ,

r ≥ 0, θ ∈ R.

Write z = se−iθ for some s ≥ 0. Then 0 ≤ ||x||2 + 2s|hx, yi| + s2 ||y||2 . The right side is the equation of a U -shaped parabola, which has its minimum at |hx, yi| s := − . ||y||2 At this value of s, |hx, yi|2 0 ≤ ||x||2 + 2s|hx, yi| + s2 ||y||2 = ||x||2 − . ||y||2 Rearranging terms, the Cauchy-Schwarz inequality follows.



Remark 6.1.10. I have a few remarks about the above proof. We will often assume F = C when working with inner product spaces. The proof in the case F = R often follows since we view R ⊆ C after identifying R with the real axis of C. Using bigger words, we can often reduce to the case F = C as R is isometrically isomorphic to a subspace of C. On another note, observe that we used the assumption y 6= 0 to assure that we weren’t dividing by zero in defining s. For the proof, we could have assumed x 6= 0 as well, but I didn’t want to confuse readers in figuring out where we used that x is nonzero. Typically in math proofs, it is very useful to simplify things by making certain reducing assumptions. For example, by using the polar decomposition of complex numbers, we reduced our proof to the case we had a real inner product space.

SWILA NOTES

63

We can now prove the triangle inequality for inner product spaces. Corollary 6.1.11. (Triangle inequality) Let V be an inner product space. For all x, y ∈ V , ||x + y|| ≤ ||x|| + ||y||. Proof. For all x, y ∈ V , ||x + y||2 = ||x||2 + 2Rehx, yi + ||y||2 ≤ ||x||2 + 2|hx, yi| + ||y||2 ≤ ||x||2 + 2||x|| · ||y|| + ||y||2

(Cauchy-Schwarz inequality)

= (||x|| + ||y||)2 .  Remark 6.1.12. The triangle inequality implies that an inner product space is actually a normed space endowed with the norm || · ||.Unfortunately, we will not have time to discuss normed spaces. 6.2. Angles and orthogonality in inner product spaces. In high school geometry, we learned that we can figure out the lengths of sides of triangles using angles. For example, there are the Law of Sines and the Law of Cosines. A powerful tool of inner product spaces (which differentiates them from more general normed spaces) is that we can discuss angles. Basically, given two sides of a triangle, we can figure out the third side. This is a very useful tool when discussing limits and estimating. Definition 6.2.1. Let V be an inner product space. We define the angle between two nonzero vectors x, y ∈ V to be hx, yi . ||x|| · ||y|| We say that vectors x, y ∈ V (possibly zero) are orthogonal if hx, yi = 0. We say that a subset S ⊆ V is orthogonal if any two distinct vectors in S are orthogonal. Remark 6.2.2. Observe that the zero vector is orthogonal to every other vector in an inner product space. In terms of equivalence relations, note that declaring x∼y

⇐⇒

“x is orthogonal to y”

satisfies symmetry, but neither reflexivity nor transitivity in general. Definition 6.2.3. The projection of a vector x in the direction of a nonzero vector y is given by   y y projy (x) = x, . ||y|| ||y|| Observe that two vectors x, y in an inner product space are orthogonal if and only if ||x + y||2 = ||x||2 + ||y||2 . Many results hold for real and complex inner product spaces; the following frequently used theorem is an example of such.

64

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Theorem 6.2.4. (Pythagorean theorem) Let V be an inner product space and suppose x1 , . . . , xn ∈ V are orthogonal. Then ||

n X k=1

xk ||2 =

n X

||xk ||2 .

k=1

Proof. Induction on k, jk induct on n. You always induct on n. ALWAYS. -DJung



6.3. Orthonormal bases for finite-dimensional inner product spaces. Throughout this section, we will assume V is a finite-dimensional inner product space. We will show in this section that each of these spaces is “pretty much” the same as some inner product space we already know and love. Recall the definition of the Kronecker delta  1, if i = j δij := 0, if i 6= j. Definition 6.3.1. Let V be a finite-dimensional inner product space. We say a collection of vectors {v1 , . . . , vn } ⊆ V is orthonormal if hvi , vj i = δij for all i, j. We say a basis of V is an orthonormal basis if it is orthonormal. Example 6.3.2. An orthonormal basis for Cn is the standard basis S. If n ≥ 3, another orthonormal basis for Cn would be p p p p {e3 , e4 , . . . , en } ∪ {( 1/2, 1/2, 0, . . . 0), ( 1/2, − 1/2, 0, . . . , 0)}. In particular, orthonormal bases are not unique in general. Remark 6.3.3. It becomes a bit more difficult to talk about orthonormal bases for infinitedimensional vector spaces. One could use an analogous definition, allowing our bases to be infinite. However, a more useful definition is allowing for “linear combinations involving infinite series.” We are allowed to do this because the inner product allows us to talk about limits of linear combinations. Proposition 6.3.4. (The Gram-Schmidt process) Given a collection of vectors {w1 , . . . , wm }, at least one nonzero, there is an orthonormal set of vectors {v1 , . . . , vn }, n ≤ m, such that span{w1 , . . . , wm } = span{v1 , . . . , vn }. The proof is recursive; it involves projecting a new vector onto the span of the previously chosen vectors and then taking a difference. A proof can be found in Petersen’s notes [6] on pages 168-169. We obtain the existence of orthonormal bases as a consequence. Corollary 6.3.5. (Existence of orthonormal bases in finite-dimensions) Every finite-dimensional inner product space has an orthonormal basis. Proof. Apply the Gram-Schmidt process to a basis for the space.



Random Thought 6.3.6. This adorable little grandma, Mrs. Lee, discovered that you could save short videos in the form of gifs. Hearing this, her grandson Frank suggested she go by the river to record some of the beavers at work. She responded, “Frank Lee, my dear, I don’t gif a dam.”

SWILA NOTES

65

Proposition 6.3.7. Suppose V is an inner product space with an orthonormal basis {v1 , . . . , vn }. Then each x ∈ V can be (uniquely) expanded as x = hx, v1 iv1 + · · · + hx, vn ivn . Proof. Fix x ∈ V . As {v1 , . . . , vn } is a basis for V , we may write x = a1 v1 + · · · + an vn . Using the properties of the inner product, we can then calculate n X hx, vi i = aj δij = ai . j=1

The uniqueness just comes from recalling that given any basis (not necessarily orthonormal), every element can be uniquely written as a linear combination of the basis vectors (see Section 1.4).  The next corollary follows immediately. Corollary 6.3.8. Let T : V → V be a linear operator and {v1 , . . . , vn } an orthonormal basis for V . Then for all x ∈ V , T (x) = hT (x), v1 iv1 + · · · + hT (x), vn ivn . Recall that given metric spaces X, Y , an isometry between X and Y is a function f : X → Y satisfying dX (a, b) = dY (f (a), f (b)) for all a, b ∈ X. This motivates the following definition. Definition 6.3.9. Let V, W be inner product spaces. An isomorphism T : V → W is called an isometric isomorphism if hT x, T yiW = hx, yiV for all x, y ∈ V . We say vector spaces V and W are isometrically isomorphic if there exists an isometric isomorphism between them. Remark 6.3.10. (Aside about my own studies) When dealing with general vector spaces, we were interested in analyzing isomorphisms. In this section, when the inner product endows us with more structure, we will be interested in isometric isomorphisms. In Differentiable Manifolds (Math 518), one is interested in smooth bijections with smooth inverse since one has a “smooth structure” on the manifold. For example, the plane R2 seems like a pretty nice place to travel on (or a Stars Wars-themed mode of transportation). In metric spaces, one is interested in studying isometries. However, sometimes it’s a bit much to assume that there any many isometries between two metric spaces. Thus, we often settle to study the larger space of Lipschitz functions. We say a function f : X → Y is L-Lipschitz if dY (f (a), f (b)) ≤ L · dX (a, b)

for all a, b ∈ X.

Note isometries are 1-Lipschitz while differentiable functions f : R → R with unbounded derivative are not L-Lipschitz for any constant L. I study Sobolev spaces on metric spaces, which would be interesting to those who really enjoy measure theory and metric spaces. For a brief introduction, say M and N are nice enough spaces (for example, M = Rm and N = Rn ) so that we can make sense of integrals and gradients of functions. We say f : M → N is a Sobolev function if f and its gradient Of are both integrable. I am interested in studying whether Lipschitz functions are dense in certain Sobolev spaces (after defining some notion of distance in Sobolev spaces). It turns out that this problem has a lot to do with algebraic topology, differential geometry, and

66

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

geometric measure theory. More specifically, the density of Lipschitz functions often depends on Lipschitz homotopy groups (which are the same as normal homotopy groups, except one requires the functions and homotopies to be Lipschitz as opposed to just continuous) and the smooth structures of the domain and target space. A related problem is studying whether smooth functions are dense in a Sobolev space; this is in some cases related to whether certain homotopy groups of the target space are trivial and whether one can extend functions from skeletons of the CW complex to the whole space. See various works of Professor Piotr Haljasz and Professor Jeremy Tyson for more information on density of certain spaces in Sobolev spaces. See PDE’s for an introduction to Sobolev spaces on Rn . See Hajlasz’s paper “Sobolev spaces on metric-measure spaces” for an introduction to defining Sobolev spaces between metric spaces when one doesn’t necessarily have a Euclidean structure (one can use path families). We leave the following lemma as a computation to the reader. Lemma 6.3.11. (Polarization identities) Fix an inner product space V and x, y ∈ V . If F = R,  1 hx, yi = ||x + y||2 − ||x − y||2 . 4 If F = C, 4

hx, yi =

 1X j 1 i ||x + ij y||2 = ||x + y||2 + i||x + iy||2 − ||x − y||2 − i||x − iy||2 . 4 j=1 4

As one can naturally define a metric on inner product spaces using the inner product, the following proposition gives that isometric isomorphisms are indeed isometries. Proposition 6.3.12. Let T : V → W be an isomorphism between F-inner product spaces. Then the following are equivalent: (1) T is an isometric isomorphism; (2) ||T x||W = ||x||V for all x ∈ V ; (3) ||T x − T y||W = ||x − y||V for all x, y ∈ V . Proof. (2) ⇔ (3) is clear by linearity of T and letting y = 0. (1) ⇒ (2) follows by letting x = y. Finally, (2) ⇒ (1) follows from the Polarization identities by writing things out.  Recall we view Fn as an inner product space with the inner product defined in Example 6.1.3 or Example 6.1.4. Proposition 6.3.13. Let V be an inner product space of dimension n. Then V is isometrically isomorphic to Fn . Proof. By Corollary 6.3.5, there exists an orthonormal basis {v1 , . . . , vn } of V . Define the linear transformation T : V → Fn by x 7→ (hx, v1 i, . . . , hx, vn i). By Proposition 6.3.7, this is an isomorphism with inverse (a1 , . . . , an ) 7→ a1 v1 + · · · + an vn .

SWILA NOTES

67

Moreover, for all x ∈ V , ||x||2 = ||hx, v1 iv1 + · · · + hx, vn ivn ||2

(by Lemma 6.3.7)

= ||hx, v1 iv1 ||2 + · · · + ||hx, vn ivn ||2

(by the Pythagorean Theorem)

= |hx, v1 i|2 + · · · + |hx, vn i|2 = ||T x||2 . By the previous lemma, it follows that V is isometrically isomorphic to Fn .



6.4. Orthogonal complements and projections. Recall in Section 3.2 we defined complements of subspaces and projections. Given a subspace M ⊆ V , a complement of M is a subspace N ⊆ V such that V = M ⊕ N . A linear operator E : V → V is called a projection if E 2 = E. In this section, we will take a more refined look at them in the context of inner product spaces. This perspective wasn’t available to us before because we didn’t have the power of an inner product on our vector spaces to be able to talk about orthogonality. Throughout this section, we will assume V is a finite-dimensional inner product space and F = R or C. Definition 6.4.1. Given a subspace M ⊆ V , fix an orthonormal basis {v1 , . . . , vm } for M . We define the orthogonal projection onto M to be the linear operator projM : V → V defined by projM (x) = hx, v1 iv1 + · · · + hx, vm ivm . Remark 6.4.2. It’s a fact that the projection onto M is well-defined, i.e., independent of the orthonormal basis chosen for M . Moreover, by Proposition 6.3.7, projM (x) ∈ M for all x ∈ M . This implies projM is a projection, i.e., proj2M = projM . Definition 6.4.3. Fix a subspace M ⊆ V . We define the orthogonal complement of M to be M ⊥ := {x ∈ V : hx, yi = 0 for all y ∈ M }. Proposition 6.4.4. Fix a subspace M ⊆ V . Then ker(projM ) = M ⊥ and V = M ⊕ M ⊥ . In particular, (M ⊥ )⊥ = M and dim(V ) = dim(M ) + dim(M ⊥ ). Proof. Fix an orthonormal basis {v1 , . . . , vm } for M . For x ∈ M , observe projM (x) = 0

⇐⇒

hx, vj i = 0,

for all j = 1, . . . , m.

(This is due to the linear independence of {v1 , . . . , vm }.) Note that these are further equivalent to hx, yi = 0 for all y ∈ M, using the linearity of the inner product and that {v1 , . . . , vm } is a basis for M . Thus, we have x ∈ ker(projM ) ⇐⇒ x ∈ M ⊥, which proves ker(projM ) = M ⊥ . It isn’t too hard to see Im(projM ) = M . Then Proposition 3.2.8 implies V = M ⊕ M ⊥ . For any x ∈ M , note x ∈ (M ⊥ )⊥ . This implies M ⊆ (M ⊥ )⊥ . On the other hand, since M ⊥ ⊕ (M ⊥ )⊥ = M ⊕ M ⊥ , dim((M ⊥ )⊥ ) = dim(V ) − dim(M ⊥ ) = dim(M ).

68

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

It follows that M = (M ⊥ )⊥ .



Random Thought 6.4.5. Have you heard the one about the towel and the desert? Actually nevermind... It’s pretty dry humor. Proposition 6.4.6. For any x ∈ V and a subspace M ⊆ V , projM (x) is the closest element to x in M in the following sense: For any z ∈ M , ||x − projM (x)|| ≤ ||x − z||. Proof. Fix x ∈ V . Observe x − projM (x) ∈ ker projM = M ⊥ and projM (x) − z ∈ M . Hence, by the Pythagorean Theorem, ||x − z||2 = ||x − projM (x)||2 + ||projM (x) − z||2 ≥ ||x − projM (x)||2 .  Theorem 6.4.7. (Bessel’s inequality) Fix a subspace M ⊆ V . If {e1 , . . . , em } is an orthonormal basis for M , then for all x ∈ V , m X

|hx, ej i|2 ≤ ||x||2

j=1

with equality if and only if x ∈ M . Proof. Write x = xM + xM ⊥ for xM ∈ M and xM ⊥ ∈ M ⊥ . As xM = hx, e1 ie1 + · · · + hx, em iem , 2

2

2

2

||x|| = ||xM || + ||xM ⊥ || ≥ ||xM || =

m X

|hx, ej i|2 .

j=1

Note we have equality if and only if xM ⊥ = 0, which is equivalent to x ∈ M .



Remark 6.4.8. This section would take a fair bit longer to write if we allowed for V to be infinite-dimensional. We would have to assume that the subspace M is closed, and then prove that minimizers of distance exist. Then we would be able to prove the existence of the orthogonal complement. Also, a more general statement of Bessel’s inequality holds for infinite-dimensional product spaces and is a vital tool for studying Hilbert spaces. See Terence Tao’s An Epsilon of Room, I: Real Analysis [7] for a more thorough treatment. 6.5. The adjoint of a linear transformation. In this section, we are interested in defining the adjoint of a linear transformation. As always, V and W will always be inner product spaces with the same field of scalars (either F = R or C). It will typically be enough to do the theory for when F = C, so we will often assume F = C. This section will be vital in setting up the next chapter. For motivation, we begin by defining the adjoint of a matrix. Definition 6.5.1. Let A ∈ Ml×m (F). In the case F = R, we define the adjoint of A to be the transpose At ∈ Mm×l (R). In the case F = C, we define the adjoint of A to be A∗ := At ∈ Mm×l (C). Here, if A = (aij ), At = (aji ).

SWILA NOTES

69

Example 6.5.2. If  A=

1 0 −i 2 + 3i 2 4i

 ∈ M2×3 (C),

then 

 1 2 − 3i 2  ∈ M3×2 (C). A∗ =  0 i −4i Remark 6.5.3. Recall the inner product on Fn defined in Examples 6.1.3 or Example 6.1.4. We may naturally identify a vector x ∈ Fn with an (n × 1)-matrix whose elements are the coordinates of x. For any A ∈ Ml×m (F), x ∈ Fm , and y ∈ Fl , hAx, yi = (Ax)t y¯ = xt At y¯ = x(A¯t y) = hx, A∗ yi. As linear transformations between finite-dimensional vector can be identified with matrices (see Proposition 2.2.3), we should be able to define the adjoint A∗ : W → V of a linear transformation A : V → W . The map should satisfy hAx, yiW = hx, A∗ yiV

for all x ∈ V, y ∈ W.

We now prove the existence (and uniqueness) of the adjoint map. Theorem 6.5.4. Given a linear map T : V → W , there exists a unique linear map T ∗ : W → V satisfying hT x, yiW = hx, T ∗ yiV

for all x ∈ V, y ∈ W.

We call T ∗ : W → V the adjoint of T . Proof. Fix an orthonormal basis e1 , . . . , en for V . Fix y ∈ W . By Corollary 6.3.8, we should have T ∗ y = hT ∗ y, e1 iV e1 + hT ∗ y, e2 iV e2 + · · · + hT ∗ y, en iV en Thus, we define T ∗ y = hy, T e1 iW e1 + · · · + hy, T en iW en . It is easy to see T ∗ is linear as the inner product is linear in the first coordinate. Using the orthonormality of {e1 , . . . , en }, hT ej , yiW = hej , T ∗ yiV

for all j = 1, . . . , n and y ∈ W.

By linearity of T and of the inner product, we obtain hT x, yiW = hx, T ∗ yiV

for all x ∈ V, y ∈ W.

For uniqueness, suppose there is another linear transformation S : W → V satisfying hT x, yiW = hx, SyiV

for all x ∈ V, y ∈ W.

By taking a difference, it follows hx, T ∗ y − Syi = 0

for all x ∈ V, y ∈ W.

Letting x = T ∗ y − Sy for y ∈ W , we have T ∗ y = Sy by non-degeneracy. We may conclude the adjoint of T exists and is unique. 

70

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Remark 6.5.5. Recall that every linear transformation T : Fm → Fl is of the form T (x) = Ax for some A ∈ Ml×m (F) (see Proposition 2.2.3). By the discussion at the beginning of this section, the adjoint map T ∗ : Fl → Fm is given by T ∗ (y) = A∗ y, where A∗ is the conjugate transpose matrix of A. Random Thought 6.5.6. You have probably noticed that nearly all comedians have really large knees. Now that’s because all of them eat a lot of pho noodles, which causes them to get what is commonly referred to as the “pho knee”. This also explains why most comedians are either funny or phony; it’s due to confusion in pronunciation. The adjoint satisfies a few other important properties (which we leave as an exercise). One uses strongly the uniqueness of the adjoint to prove these properties. Proposition 6.5.7. Let S, T : V → W and T1 : V1 → V2 , T2 : V2 → V3 be linear transformations. Then (1) (S + T )∗ = S ∗ + T ∗ . (2) T ∗∗ = T . ¯ V. (3) (λ1V )∗ = λ1 ∗ (4) (T2 T1 ) = T1∗ T2∗ . (5) If T is invertible, then (T −1 )∗ = (T ∗ )−1 . Theorem 6.5.8. (The Fredholm Alternative) Let T : V → W be a linear transformation. Then (1) ker(T ) = im(T ∗ )⊥ . (2) ker(T ∗ ) = im(T )⊥ . (3) ker(T )⊥ = im(T ∗ ). (4) ker(T ∗ )⊥ = im(T ). Proof. As T ∗∗ = T and M ⊥⊥ = M , all four statements are equivalent. Thus, it suffices to prove the first statement. If x ∈ ker(T ), hx, T ∗ yi = hT (x), yi = 0 for all y ∈ W. If x ∈ im(T ∗ )⊥ , 0 = hx, T ∗ yi = hT (x), yi for all y ∈ W. Letting y = T x, it follows from non-degeneracy that x ∈ ker(T ). This proves ker T = im(T ∗ )⊥ .  Corollary 6.5.9. Let T : V → W be a linear map between finite-dimensional inner product spaces. If T is surjective, then T ∗ is injective. If T is injective, then T ∗ is surjective. In particular, if T is an isomorphism, then T ∗ is an isomorphism. Proof. This follows from the Fredholm Alternative and Proposition 6.4.4.



Corollary 6.5.10. Let T : V → W be a linear map between finite-dimensional inner product spaces. Then rank(T ) = rank(T ∗ ). Proof. By the rank-nullity theorem and the Fredholm Alternative, dim(V ) = dim(ker(T )) + dim(im(T )) = dim(im(T ∗ )⊥ ) + dim(im(T )) = dim(V ) − dim(im(T ∗ )) + dim(im(T )).

SWILA NOTES

The exercise follows by the rank-nullity theorem.

71



We conclude this chapter with an interesting corollary. Corollary 6.5.11. Let T : V → V be a linear operator. Then λ is an eigenvalue for T if and ¯ is an eigenvalue for T ∗ . Moreover, their eigenvalue pairs have the same geometric only if λ multiplicity: ¯ V )). dim(ker(T − λ1V )) = dim(ker(T ∗ − λ1 Proof. Note that λ is an eigenvalue of T if and only if dim(ker(T − λ1V )) > 0. By the previous corollary and the Rank-Nullity Theorem, ¯ V )) dim(ker(T − λ1V )) = dim(ker(T ∗ − λ1 for all λ ∈ F. The result follows.



72

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

7. Linear operators on IPS: The love story of a couple linear transformations who respected their own personal (inner product) space Other title contenders for after the colon were “Saving (inner product) space for Hilbert” and “Can keep my hands to myself, by Selena Gomez”. I remember this one time I was doing my homework, finding conjugate transposes of matrices when I decided to be cool and buy some cigarettes. Long story short, I chickened out, got arrested for trying to litter the cigarettes, and only missed out on jail time by doing a hundred miserable hours of community service. And that’s the last time I ever avoided adjoint. In this chapter, we will continue to assume V is a finite-dimensional inner product space and F = R or C unless otherwise noted. 7.1. Self-adjoint maps. In Section 6.5, given a linear operator T : V → W , we constructed another linear operator T ∗ : W → V satisfying hT x, yiW = hx, T ∗ yiV

for all x, y ∈ V.

We called T ∗ the adjoint of T . Definition 7.1.1. We say a linear operator T : V → V is self-adjoint (skew-adjoint) if T ∗ = T (T ∗ = −T ). We say a matrix A ∈ Mn (F) is self-adjoint (skew-adjoint) if A∗ = A (A∗ = −A).   0 −b Example 7.1.2. The matrix is skew-adjoint for any b ∈ R. b 0   a −ib Example 7.1.3. The matrix is self-adjoint for any a, b ∈ R. ib a Example 7.1.4. Given any linear map T : V → W , T T ∗ : W → W and T ∗ T : V → V are self-adjoint. The next lemma and proposition are from Friedberg, Insel, and Spence’s Linear algebra ([1], Corollary to Theorem 6.5 and Theorem 6.10). Lemma 7.1.5. Let T be a linear operator on V and B = {v1 , . . . , vn } an orthonormal basis for V . If A = [T ]B , then Aij = hT vj , vi i. Proof. By Corollary 6.3.8, for each j = 1, . . . , n, T vj = hT vj , v1 iv1 + · · · + hT vj , vn ivn . The lemma follows.



Proposition 7.1.6. Let B = {v1 , . . . , vn } be an orthonormal basis for V and T : V → V a linear operator. Then [T ∗ ]B = [T ]∗B . Proof. Let A = [T ]B and B = [T ∗ ]B . By Lemma 7.1.5, Bij = hT ∗ vi , vj i = hvi , T vj i = hT vj , vi i = Aji . The proposition follows.



SWILA NOTES

73

The following corollary is then immediate by writing things out. Corollary 7.1.7. Let B be an orthonormal basis for V and T : V → V a linear operator. Then T is self-adjoint if and only if the matrix [T ]B is self-adjoint. Remark 7.1.8. This result does not hold if we remove the restriction that B is orthonormal. For example, let V = R2 , F = R, B = {(1, 0), (1, 1)}, and   2 1 T (x) = x. 1 2 We end this section with an important result for self-adjoint and skew-adjoint operators. We introduce the following property of some linear operators. Definition 7.1.9. We say a linear operator T : V → V is reducible if every invariant subspace W ⊆ V has a complementary invariant subspace. In other words, for every subspace W ⊆ V satisfying T (W ) ⊆ W , there exists another subspace W 0 ⊆ V such that T (W 0 ) ⊆ W 0 and V = W 0 ⊕ W . Proposition 7.1.10. All self-adjoint operators and skew-adjiont operators are reducible. More specifically, given a linear operator T : V → V that is self-adjoint or skew-adjoint, and an invariant subspace M ⊂ V , the orthogonal complement M ⊥ is also invariant. Proof. Assume T : V → V is self-adjoint or skew-adjoint. Assume T (M ) ⊂ M . Let x ∈ M and z ∈ M ⊥ . Since T (x) ∈ M , 0 = hz, T (x)i = hT ∗ (z), xi = ±hT z, xi. It follows that T (z) ∈ M ⊥ .



7.2. Isometries. The idea of polarization identities is to rewrite inner products as a linear combinations of norms squared. If we have information about how linear operators behave with norms, we could then deduce how they behave with linear operators. Recall the polarization identities stated in Lemma 6.3.11, which we rewrite here for convenience: Fix an inner product space V and x, y ∈ V . If F = R, hx, yi =

 1 ||x + y||2 − ||x − y||2 . 4

If F = C, 4

hx, yi =

 1X j 1 i ||x + ij y||2 = ||x + y||2 − ||x − y||2 + i||x + iy||2 − i||x − iy||2 . 4 j=1 4

Note that a linear operator T = 0 if and only if hT (x), yi = 0 for all x, y ∈ V . We can improve this for self-adjoint operators. Proposition 7.2.1. Let T : V → V be self-adjoint. Then T = 0 if and only if hT x, xi = 0 for all x ∈ V . Proof. (⇒) is clear.

74

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Now assume hT x, xi = 0 for all x ∈ V . For any x, y ∈ V , 0 = hT (x + y), x + yi = hT x, xi + hT x, yi + hT y, xi + hT y, yi = 0 + hx, T ∗ yi + hT y, xi + 0 = hx, T yi + hT y, xi

(T is self-adjoint)

= 2RehT y, xi. Letting x = T y, we see T y = 0 by non-degeneracy.



Remark 7.2.2. By a similar proof, you can show that any ring satisfying x2 = x for all x is commutative. One can extend the above proposition to all linear operators when F = C. Proposition 7.2.3. Let T : V → V be a linear operator on a complex inner product space. Then T = 0 if and only if hT x, xi = 0 for all x ∈ V . Proof. Fix x, y ∈ V . One can show 0 = hT (x + y), x + yi = hT x, yi + hT y, xi and 0 = hT (x + iy), x + iyi = −ihT x, yi + ihT y, xi. This implies  

1 1 As −i i y = T x.

1 1 −i i



hT x, yi hT y, xi



 =

0 0

 .

 is invertible, we may conclude hT x, yi = 0. The exercise follows letting 

Random Thought 7.2.4. There once was a French political party named Lagrange’s Multipliers, which was a group of ardent supporters of the operation of multiplication. It disbanded quickly however, due to some ... division in the party. Theorem 7.2.5. Let T : V → W be a linear map between inner product spaces. Then the following are equivalent: (1) ||T x|| = ||x|| for all x ∈ V . (2) hT x, T yi = hx, yi for all x, y ∈ V . (3) T ∗ T = 1V . (4) T takes orthonormal sets of vectors to orthonormal sets of vectors. Proof. (1) ⇔ (2): This follows from the polarization identities. (1) ⇔ (3): Assume (1) holds. Then hx, xi = hT x, T xi = hT ∗ T x, xi

for all x ∈ V.

By Proposition 7.2.1, T ∗ T = 1V . This proves (1) ⇒ (3). A similar calculation shows (3) ⇒ (1). (2) ⇒ (4): Fix an orthonormal basis B = {v1 , . . . , vn } of V . Then hT vi , T vj i = hvi , vj i = δij .

SWILA NOTES

75

(4) ⇒ (1): Assume (4) holds. Fix x ∈ V with ||x|| = 1. Then we may complete {x} to an orthonormal basis of V by the Gram-Schmidt Process. By (4), we have in particular ||T x|| = 1. By linearity, we may conclude ||T y|| = ||y|| for all y ∈ V .  We leave this as an exercise to the reader. Recall that a linear operator T is an isometry if it preserves norms. Corollary 7.2.6. (Characterization of isometries) Let T : V → W be an isomorphism. Then T is an isometry if and only if T ∗ = T −1 . Remark 7.2.7. Endow R2 with the normal Euclidean norm and note the linear transformation T : R → R2 given by T x = (x, 0) satisfies ||T x|| = |x|. By Theorem 7.2.5, T ∗ T = 1R . However, T is clearly not an isomorphism. Thus a converse of the previous corollary does not hold. We end this section by defining orthogonal transformations. Definition 7.2.8. Fix an inner product space V . A linear operator T : V → V is called an orthogonal transformation if hT x, T yi = hx, yi for all x, y ∈ V. Note that Theorem 7.2.5 and Corollary 7.2.6 combine to give several equivalent definitions of orthogonal transformations. 7.3. The orthogonal and unitary groups. Recall the definition of a group (see Definition 2.7.9). We now define two important groups of matrices. Definition 7.3.1. We define the orthogonal group On to be the collection/group of matrices A ∈ Mn (R) satisfying At A = Idn . We define the unitary group Un to be the collection/group of matrices B ∈ Mn (C) satisfying B ∗ B = Id. Remark 7.3.2. Note that if a square matrix A ∈ Mn (F) satisfies A∗ A = Idn , then A is invertible with A∗ = A−1 (see Proposition 2.7.8). Thus, we indeed have that On ⊂ GLn (R) and Un ⊂ GLn (C). However, these are strict containments as orthogonal and unitary matrices have determinant with absolute value 1. Remark 7.3.3. Recall that matrices in Mn (F) may be naturally identified with linear operators of Fn . Thus, we have equivalent defintions of On and Un . The orthogonal group On is the collection of linear maps T : Rn → Rn satisfying T ∗ T = 1Rn . The unitary group Un is the collection of linear maps T : Cn → Cn satisfying T ∗ T = 1Cn . Recall that the columns of a matrix A ∈ Mn (F) are Ae1 , Ae2 , . . . , Aen . The following two propositions follow from Theorem 7.2.5. Proposition 7.3.4. (Characterization of the orthogonal group) Let A ∈ Mn (R). Then the following are equivalent: • A ∈ On . • At A = Id. • |Ax| = |x| for all x ∈ Rn . • The columns of A form an orthonormal basis of Rn . • The rows of A form an orthonormal basis of Rn .

76

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Proposition 7.3.5. (Characterization of the unitary group) Let B ∈ Mn (C). Then the following are equivalent: • B ∈ Un . • B ∗ B = Id. • |By| = |y| for all y ∈ Cn . • The columns of B form an orthonormal basis of Cn . • The rows of B form an orthonormal basis of Cn . 7.4. The spectral theorem for self-adjoint operators. In math, we are often interested in studying maps that have nice forms. For example, we showed in the last chapter that every square complex matrix has an almost unique Jordan canonical form. In this section, we will prove the spectral theorem, which states that every self-adjoint operator has a diagonal matrix representation. We will assume that V is a finite-dimensional complex inner product space (so F = C). The case F = R will follow with similar (and usually simpler) proofs.   0 1 Example 7.4.1. A := is not diagonalizable. Equivalently, if we define T : R2 → 0 0 R2 by T (x) = Ax, there does not a basis {v1 , v2 } of R2 for which [T ]{v1 ,v2 } is diagonal. Lemma 7.4.2. Suppose T : V → V is self-adjoint. If λ is an eigenvalue of T , then λ ∈ R. Proof. Let x 6= 0 be an eigenvector corresponding to λ. Then ¯ xi. λhx, xi = hT x, xi = hx, T xi = λhx, As x 6= 0, it follows that λ is real.



The proof of the following proposition is a bit more involved, so I will omit it. A proof can be found in Petersen’s notes ([6], Theorem 32, pages 205-206). Proposition 7.4.3. (Existence of eigenvalues for self-adjoint operators) Every self-adjoint operator on V has a real eigenvalue. This gives us by induction the spectral theorem for self-adjoint operators. Theorem 7.4.4. (Spectral theorem) Let T : V → V be a self-adjoint operator. Then there exists an orthonormal basis e1 , . . . , en of eigenvectors. Moreover, all of the eigenvalues of T are real. Proof. The second part of the theorem follows from Lemma 7.4.2. By Proposition 7.4.3, there exists an eigenvector e1 of T with corresponding eigenvalue λ1 . By dividing by its norm if necessary, we may assume e1 has norm 1. Consider the subspace {e1 }⊥ = {x ∈ V : hx, e1 i = 0}. We show {e1 }⊥ is T -invariant. For each x ∈ {e1 }⊥ , hT (x), e1 i = hx, T (e1 )i = λ¯1 hx, e1 i = 0. Note by Proposition 6.4.4, with M = span{e1 }, dim{e1 }⊥ = dim(V ) − 1. Thus, we may restrict T to T |{e1 }⊥ and apply induction on the dimension of V to conclude there is an orthonormal basis of eigenvectors for T . For a more constructive argument, we can choose an eigenvector e2 of T |{e1 }⊥ : {e1 }⊥ → {e1 }⊥ , and continue choosing eigenvectors dim(V ) − 2 more times. 

SWILA NOTES

77

Recall that a basis of eigenvectors leads us to a diagonal matrix representation (see Section 5.4). Thus, the spectral theorem immediately gives us the following two corollaries. Corollary 7.4.5. Let T : V → V be a self-adjoint operator. Then there exists an orthonormal basis B = {e1 , . . . , en } and a real n × n diagonal matrix D such that [T ]B = D. Here, the k th diagonal entry of D is the eigenvalue corresponding to ek . Definition 7.4.6. We say that a matrix A ∈ Mn (C) is unitarily similar to B ∈ Mn (C) if there exists a unitary matrix U ∈ Un such that A = U BU −1 . Note Corollary 7.4.5 gives that every self-adjoint matrix is unitarily similar to a diagonal matrix: Corollary 7.4.7. Let A ∈ Mn (C) be a self-adjoint matrix. Then there exists an orthonormal basis B = {e1 , . . . , en } and a real n × n diagonal matrix D such that    ∗ | | | | A =  e1 · · · en  D  e1 · · · en  . | | | | Proof. The spectral theorem gives us an orthonormal basis of eigenvectors such that    −1 | | | | A =  e1 · · · en  D  e1 · · · en  . | | | | As {e1 , . . . , en } is an orthonormal basis of Cn , the corollary follows from Proposition 7.3.5 (the matrix with column vectors e1 , . . . , en is unitary).  Remark 7.4.8. We can now finish the proof from Example 6.1.5 that the trace is an inner product on matrices. Suppose 0 6= A ∈ Mn (C). As A∗ A is self-adjoint, there is a unitary matrix U and a diagonal matrix D such that A∗ A = U DU ∗ = U DU −1 . Here, the diagonal entries of D are the eigenvalues λ1 , . . . , λn of A∗ A (possibly with repetition). By Lemma 2.5.2, n X ∗ hA, Ai = tr(A A) = tr(D) = λi . i=1 ∗

We must have that λi 6= 0 for some i. Otherwise, A A(x) = 0 for all x ∈ Cn , which would imply A = 0. Thus, it suffices to show λi ≥ 0 for all i. For 1 ≤ i ≤ n, suppose xi is a (non-zero) eigenvector corresponding to λi . Then λi ||xi ||2 = hA∗ Axi , xi i = ||Axi ||2 . It follows that λi ≥ 0, whence the trace is an inner product on Mn (C). Random Thought 7.4.9. A physicist, a mathematician, and a statistician are sitting on a train. The physicist is playing with a rubber band, the mathematician is eating dinner, and the statistician is reading a newspaper. Who is the conductor? Not the rubber band... because rubber isn’t a conductor!

78

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Lemma 7.4.10. Let T : V → V be a self-adjoint operator. Suppose x, y are eigenvectors with distinct eigenvalues λ, µ, respectively. Then x is orthogonal to y. Proof. By Lemma 7.4.2, λ and µ are both real. Observe λhx, yi = hT x, yi = hx, T yi = hx, µyi = µhx, yi, which implies that hx, yi = 0.



Theorem 7.4.11. Let T : V → V be a self-adjoint operator and λ1 , . . . , λk the distinct eigenvalues of T . Then 1V = projker(T −λ1 1V ) + · · · + projker(T −λk 1V ) and T = λ1 projker(T −λ1 1V ) + · · · + λk projker(T −kλk 1V ) . Here, projker(T −λi 1V ) is the orthogonal projection onto ker(T − λi 1V ) (see Definition 6.4.1). Proof. By Lemma 7.4.10, the subspaces ker(T − λj 1V ) are mutually orthogonal. This implies that if xi ∈ ker(T − λi 1V ), then  xi , if i = j, projker(T −λj 1V ) (xi ) = 0, if i 6= j. As we may write x = x1 + · · · + xk ∈ V , xi ∈ ker(T − λi 1V ), this implies 1V = projker(T −λ1 1V ) + · · · + projker(T −λk 1V ) . Moreover, T (x) = λ1 x1 + · · · + λk xk = λ1 projker(T −λ1 1V ) (x) + · · · + λk projker(T −λk 1V ) (x). This proves the theorem.



The final corollary is the only result in this section that solely makes sense for F = C. Corollary 7.4.12. (Spectral theorem for skew-adjoint operators) Let T : V → V be a skew-adjoint operator (T ∗ = −T ). Then there exists an orthonormal basis of eigenvectors e1 , . . . , ek with corresponding eigenvalues ia1 , . . . , iak , aj ∈ R. Proof. This follows from the spectral theorem for self-adjoint operators after noting that iT is self-adjoint.  7.5. Normal operators and the spectral theorem for normal operators. In the previous section, we learned that self-adjoint operators are diagonalizable. More precisely, every self-adjoint operator has an orthonormal basis of eigenvectors. A natural question to ask is whether we can characterize those linear operators which have an orthonormal basis of eigenvectors. And is it even normal for a linear operator to be diagonalizable? Well, I’m not sure about that question, but the linear operators which are diagonalizable are exactly the normal ones. We will typically assume V is a finite-dimensional complex inner product space in this section. Definition 7.5.1. A linear operator T : V → V is called normal if T ∗ T = T T ∗ .

SWILA NOTES

79

Remark 7.5.2. Note that self-adjoint and skew-adjoint operators are normal. Also, orthogonal transformations are normal. We aim to show that an operator is normal if and only if it has an orthonormal basis of eigenvectors. We can easily prove necessity. Proposition 7.5.3. Suppose a linear operator T : V → V has an orthonormal basis of eigenvectors. Then T is normal. Proof. The main ideas for this proof are that diagonal matrices commute and Propostion 7.1.6, which stated that one can pull out adjoints from matrix representations when the basis is orthonormal. Let B := {e1 , . . . , en } be an orthonormal basis of eigenvectors for T . Then [T ]B is a diagonal matrix (with its k th diagonal entry being the eigenvalue corresponding to ek ). Clearly, its adjoint [T ]∗B is also diagonal. As diagonal matrices commute, [T ]B [T ]∗B = [T ]∗B [T ]B . By Proposition 7.1.6, [T ]∗B = [T ∗ ]B , and hence [T T ∗ ]B = [T ]B [T ∗ ]B = [T ∗ ]B [T ]B = [T ∗ T ]B . As T T ∗ agrees with T ∗ T on a basis of V , we may conclude T is normal.



Example 7.5.4. This example is taken from Petersen’s notes (Example 83, page 213). There exist linear operators that are not normal, yet have bases of eigenvectors. For example, the matrix   1 1 A := 0 2 (which corresponds to a linear operator on R2 by left application) is not self-adjoint/symmetric. In fact, one can check that A is not even normal. However, R2 has a basis of eigenvectors {(1, 0), (1, 1)} with corresponding eigenvalues 1, 2. The key is that this basis is not orthogonal. Random Thought 7.5.5. Consider the outside temperature as a function of time. Coming from California, I used to believe the temperature function was continuous, possibly even differentiable. Now in Illinois, I’m not even sure it’s measurable. Proposition 7.5.6. (Characterization of normal operators) Let T : V → V be a linear operator. Then the following are equivalent: (1) T is normal. (2) T T ∗ = T ∗ T . (3) ||T x|| = ||T ∗ x|| for all x ∈ V . (4) BC = CB, where B = 21 (T + T ∗ ) and C = 21 (T − T ∗ ). Proof. (1) ⇔ (2) by definition. (2) ⇔ (4): This follows from the computations 1 BC = (T 2 − T T ∗ + T ∗ T − T ∗ T ∗ ) 4 1 CB = (T 2 − T ∗ T + T T ∗ − T ∗ T ∗ ). 4

80

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

Note that we need to be careful about order in this computation as operators don’t commute in general. (2) ⇒ (3): Observe ||T x||2 = hT x, T xi = hT ∗ T x, xi = hT T ∗ x, xi = hT ∗ x, T ∗ xi = ||T ∗ x||2 . (3) ⇒ (2): We have 0 = hT x, T xi − hT ∗ x, T ∗ xi = h(T ∗ T − T T ∗ )x, xi

for all x ∈ V.

Note the operator T ∗ T − T T ∗ is self-adjoint. It follows from Proposition 7.2.1 that T T ∗ = T ∗T .  We leave the following as an exercise to the reader (just follow your nose). Lemma 7.5.7. Let T : V → V be a linear operator. If M ⊂ V is a T and T ∗ invariant subspace, then M ⊥ is also T and T ∗ invariant. In particular, (T |M ⊥ )∗ = T ∗ |M ⊥ . Random Thought 7.5.8. Little known fact: The popular children’s song “For he’s a jolly good fellow” actually was originally written about NSF Fellows. We now prove the spectal theorem for normal operators on complex inner product spaces. Theorem 7.5.9. (The spectral theorem for normal operators) Let T : V → V be an operator on a complex inner product space. Then T is normal if and only if T has an orthonormal basis of eigenvectors. Proof. We proved that existence of an orthonormal basis of eigenvectors implies normality in Proposition 7.5.3. Conversely, suppose T is normal. As in the proof of the spectral theorem for self-adjoint operators (Theorem 7.4.4), we aim to show that T has an eigenvector and that the orthogonal complement of this eigenvector is T invariant. Define the self-adjoint operators B = 21 (T + T ∗ ) and C = 2i1 (T − T ∗ ) on V . Observe that T = B + iC. By Theorem 7.4.4, there exists a real eigenvalue λ of B, which implies ker(B − λ1V ) 6= 0. Since B · iC = iC · B (see Proposition 7.5.6), B ◦ C = C ◦ B. If z ∈ ker(B − λ1V ), (B − λ1V )(C(z)) = B ◦ C(z) − λC(x) = C ◦ (B − λ1V )(z) = 0. This shows the subspace ker(B − λ1V ) is C-invariant. Since C is self-adjoint and hence C|ker(B−λ1V ) is as well, we may find 0 6= x ∈ ker(B − λ1V ) and µ ∈ R such that C(x) = µx. This means T (x) = B(x) + iC(x) = (λ + iµ)(x). In addition, T ∗ (x) = B(x) − iC(x) = (λ − iµ)(x). This shows span{x} is T and T ∗ invariant. By Lemma 7.5.7, this implies M := (span{x})⊥ is also T and T ∗ invariant with (T |M )∗ = T ∗ |M . This implies T |M : M → M is also normal, and we can apply the inductive argument on the dimension of V as in Theorem 7.4.4.  Recall that unitary operators T : V → V , where V is a complex inner product space, are characterized as satisfying T T ∗ = 1V = T ∗ T . The spectral theorem for normal operators gives us a spectral theorem for unitary operators.

SWILA NOTES

81

Theorem 7.5.10. (Spectral theorem for unitary operators) Let T be a unitary operator on a finite-dimensional complex inner product space V . Then there exists an orthonormal basis {v1 , . . . , vn } of V such that T (v1 ) = eθ1 v1 , . . ., T (vn ) = eiθn vn , where θ1 , . . . , θn ∈ R. Proof. Note that every unitary operator is normal. As ||T (x)|| = ||x|| (see Theorem 7.2.5) for all x ∈ V , it follows that every eigenvalue λ of T satisfies |λ| = 1. The theorem follows from the polar decomposition of complex numbers.  We can also extend Theorem 7.4.11 to normal operators with the same proof. Theorem 7.5.11. Let T : V → V be a normal operator and λ1 , . . . , λk the distinct eigenvalues of T . Then 1V = projker(T −λ1 1V ) + · · · + projker(T −λk 1V ) and T = λ1 projker(T −λ1 1V ) + · · · + λk projker(T −λk 1V ) . 7.6. Schur’s theorem. We have shown that self-adjoint operators and normal operators are diagonalizable with orthonormal bases of eigenvectors. One may be wondering what we can say about general linear operators. We will show in this section that linear operators on finitedimensional complex inner product spaces have an upper triangular matrix representation. Throughout this section, V will be a finite-dimensional complex inner product space.   0 1 Example 7.6.1. The matrix A := is a typical example of a matrix that is not 0 0 diagonalizable. However, A is upper triangular. Definition 7.6.2. Let e1 , . . . , en be a basis for V . We define the linear transformation   a1   ( e1 · · · en ) : Fn → V by ( e1 · · · en )  ...  = a1 e1 + · · · + an en . an We will show that every linear operator has an upper triangular matrix representation. Theorem 7.6.3. (Schur’s theorem) Let T an orthonormal basis B = {e1 , . . . , en } of V triangular. Equivalently,  a11  0  T = ( e1 · · · en )  .  .. 0

: V → V be a linear operator. Then there exists such that the matrix representation [T ]B is upper a12 · · · a22 · · · .. .

a1n a2n .. .

···

ann

0

    ( e1 · · · en )∗ . 

Proof. Observe that we need to find an orthonormal basis e1 , . . . , en  a11 a12 · · ·  0 a22 · · ·  ( T (e1 ) · · · T (en ) ) = ( e1 · · · en )  . ..  .. . 0

0

···

of V such that  a1n a2n   ..  . .  ann

82

BY DEREK JUNG, ADAPTED FROM NOTES BY UCLA PROF. PETER PETERSEN

This is equivalent to finding an orthonormal basis e_1, ..., e_n of V and constructing an increasing sequence of T-invariant subspaces
\[
\{0\} \subset V_1 \subset V_2 \subset \cdots \subset V_{n-1} \subset V,
\]
where V_k = span{e_1, ..., e_k}. This proof will be similar to that of the spectral theorems. Consider the linear operator T* : V → V. Choose an eigenvector x of T* with corresponding eigenvalue λ such that ||x|| = 1. (Existence is guaranteed since F = C.) Define V_{n−1} = {x}^⊥ = {v ∈ V : ⟨x, v⟩ = 0}. For all v ∈ V_{n−1},
\[
\langle T(v), x \rangle = \langle v, T^*x \rangle = \bar{\lambda}\, \langle v, x \rangle = 0.
\]
This proves that V_{n−1} is T-invariant and has dimension dim(V) − 1. Set n = dim(V). By induction, T|_{V_{n−1}} is upper triangulizable: there is an orthonormal basis B = {e_1, ..., e_{n−1}} of V_{n−1} such that [T|_{V_{n−1}}]_B is upper triangular. If we set B̃ = {e_1, ..., e_{n−1}, x} (listing x last), it is easy to see that [T]_{B̃} is upper triangular. □

7.7. Singular value decomposition. In this section, F = R or C. Given an orthonormal basis {e_1, ..., e_n} of a vector space V, recall (see Definition 7.6.2) that we define ( e_1 ··· e_n ) : F^n → V by
\[
( e_1 \ \cdots \ e_n ) \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = a_1 e_1 + \cdots + a_n e_n.
\]
Note the adjoint ( e_1 ··· e_n )* : V → F^n of this transformation is defined by
\[
( e_1 \ \cdots \ e_n )^* (a_1 e_1 + \cdots + a_n e_n) = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.
\]

Theorem 7.7.1. (The Singular Value Decomposition) Let T : V → W be a linear map between finite-dimensional inner product spaces. Then there is an orthonormal basis {e_1, ..., e_m} of V such that ⟨T(e_i), T(e_j)⟩ = 0 for i ≠ j. Moreover, we can find orthonormal bases B = {e_1, ..., e_m} of V and C = {f_1, ..., f_n} of W and nonnegative real numbers σ_1, ..., σ_k, k ≤ m, so that
\[
T(e_1) = \sigma_1 f_1, \ \ldots, \ T(e_k) = \sigma_k f_k, \qquad T(e_{k+1}) = \cdots = T(e_m) = 0.
\]
In other words,
\[
T = ( f_1 \ \cdots \ f_n )\, [T]_{C,B}\, ( e_1 \ \cdots \ e_m )^*
  = ( f_1 \ \cdots \ f_n ) \begin{pmatrix} \sigma_1 & & & & \\ & \ddots & & & \\ & & \sigma_k & & \\ & & & 0 & \\ & & & & \ddots \end{pmatrix} ( e_1 \ \cdots \ e_m )^*,
\]
where the middle matrix is the n × m matrix with σ_1, ..., σ_k in its first k diagonal entries and zeros elsewhere.


Proof. We can apply the spectral theorem to the self-adjoint operator T*T : V → V to find an orthonormal basis {e_1, ..., e_m} of V such that T*T(e_i) = λ_i e_i for some λ_i ∈ R. Note that each λ_i = ⟨T*T(e_i), e_i⟩ = ||T(e_i)||² ≥ 0. Then
\[
\langle T(e_i), T(e_j) \rangle = \langle T^*T(e_i), e_j \rangle = \langle \lambda_i e_i, e_j \rangle = \begin{cases} \lambda_i, & i = j, \\ 0, & i \neq j. \end{cases}
\]
Reordering if necessary, we may assume λ_1, ..., λ_k > 0 and λ_l = 0 for l > k. Define
\[
f_i = \frac{T(e_i)}{||T(e_i)||}, \qquad i = 1, \ldots, k.
\]
Then extend {f_1, ..., f_k} to an orthonormal basis {f_1, ..., f_k, f_{k+1}, ..., f_n} for W (possibly using the Gram-Schmidt Process). Setting σ_i = ||T(e_i)|| = √λ_i, we have T(e_i) = σ_i f_i for all i. The theorem follows. □

Remark 7.7.2. Some sources may use TT* in the proof. Their result will be a decomposition of T* as opposed to one of T as we have here.

This immediately gives us the singular value decomposition of matrices. We call a (possibly non-square) matrix D = (d_ij) diagonal if d_ij = 0 whenever i ≠ j.

Corollary 7.7.3. (The Singular Value Decomposition for matrices) Let A be a real (complex) m × n matrix. Then there is an orthogonal (unitary) m × m matrix U, an orthogonal (unitary) n × n matrix V, and a diagonal m × n matrix D with nonnegative entries such that A = UDV*. Equivalently, there exist an orthonormal basis {e_1, ..., e_n} of F^n, an orthonormal basis {f_1, ..., f_m} of F^m, and nonnegative real numbers σ_1, ..., σ_k, k ≤ min(m, n), such that
\[
A e_1 = \sigma_1 f_1, \ \ldots, \ A e_k = \sigma_k f_k, \qquad A e_{k+1} = \cdots = A e_n = 0.
\]

Random Thought 7.7.4. Did you know that people sell plastic nets to hit tennis balls with? I just think it's a racket.

Example 7.7.5. Let
\[
A = \begin{pmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix} : \mathbb{R}^2 \to \mathbb{R}^3.
\]
We will find the singular value decomposition of A by following the proof of Theorem 7.7.1. We can calculate
\[
A^t A = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.
\]
We have
\[
\det(A^t A - t\,\mathrm{Id}) = \det \begin{pmatrix} 2 - t & -1 \\ -1 & 2 - t \end{pmatrix} = (2 - t)^2 - 1 = (t - 3)(t - 1).
\]
This shows the eigenvalues of A^t A are 3 and 1.


An eigenvector of A^t A associated with the eigenvalue 3 is e_1 = (1/√2, −1/√2). An eigenvector of A^t A associated with the eigenvalue 1 is e_2 = (1/√2, 1/√2). We have
\[
f_1 = \frac{A e_1}{||A e_1||} = \frac{1}{\sqrt{3}} \begin{pmatrix} -1/\sqrt{2} \\ 2/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}
\qquad \text{and} \qquad
f_2 = \frac{A e_2}{||A e_2||} = \begin{pmatrix} -1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{pmatrix}.
\]
Letting
\[
f_3 := \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
\]
{f_1, f_2, f_3} is an orthonormal basis for R^3. We may conclude that
\[
A = \begin{pmatrix} -1/\sqrt{6} & -1/\sqrt{2} & 1/\sqrt{3} \\ 2/\sqrt{6} & 0 & 1/\sqrt{3} \\ -1/\sqrt{6} & 1/\sqrt{2} & 1/\sqrt{3} \end{pmatrix}
\begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}.
\]
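The factorization above can be checked numerically. The sketch below (assuming numpy is available) rebuilds the three matrices from the example and verifies both the product and the singular values √3 and 1 against numpy.linalg.svd.

    # Sketch: numerical check of Example 7.7.5 (assumes numpy).
    import numpy as np

    A = np.array([[-1.0,  0.0],
                  [ 1.0, -1.0],
                  [ 0.0,  1.0]])
    s2, s3, s6 = np.sqrt([2.0, 3.0, 6.0])
    U = np.array([[-1/s6, -1/s2, 1/s3],
                  [ 2/s6,   0.0, 1/s3],
                  [-1/s6,  1/s2, 1/s3]])   # columns f1, f2, f3
    D = np.array([[s3, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])             # diagonal 3 x 2 matrix of singular values
    Vt = np.array([[1/s2, -1/s2],
                   [1/s2,  1/s2]])         # rows e1^t, e2^t
    assert np.allclose(U @ D @ Vt, A)                                   # A = U D V^t
    assert np.allclose(np.linalg.svd(A, compute_uv=False), [s3, 1.0])   # singular values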


8. A bird? A plane?... No, it's a chapter about multilinear algebra.

The material on dual spaces is adapted from Petersen's notes [6]. The material on tensor products is adapted from Lang's Linear Algebra [3].

8.1. Dual spaces. (Sketch of intro) Remind reader of the cartoon series Yu-Gi-Oh!... give brief synopsis of an episode (say it's my favorite episode ever)... tell obscure fact from episode to convince reader I've actually seen it and love it... offer tenuous analogy about how Yu-Gi-Oh! relates to dual spaces (half-heartedly try to relate the heart of the cards with the heart of functional analysis)... end with climactic pun "It's Time to D-D-D-D Dual!"

Let's just get on with the first section of this final chapter: dual spaces.

Definition 8.1.1. Fix a vector space V over a field F. We define the dual vector space (or dual space) to be V* = L(V, F), the vector space of linear maps from V to F. We often call an element of V* a linear functional.

Remark 8.1.2. We are no longer assuming our spaces are inner product spaces, so we may not have a metric on our space. Thus, we make no assumptions about the continuity of our linear functionals in this section.

A simple argument shows the following:

Proposition 8.1.3. Fix a finite-dimensional vector space V. Let {v_1, ..., v_n} be a basis for V. For each i, define v_i^* : V → F by a_1 v_1 + ··· + a_n v_n ↦ a_i. Then {v_1^*, ..., v_n^*} forms a basis for V*. In particular, V* is finite-dimensional with dim(V*) = dim(V).

Remark 8.1.4. In the above proposition, we call {v_1^*, ..., v_n^*} the dual basis of {v_1, ..., v_n}. It's not hard to see that each v_i^* is a linear functional.

Example 8.1.5. Recall that we define the Kronecker delta function δ : Z × Z → {0, 1} by
\[
\delta_{ij} := \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{cases}
\]
The dual basis of {e_1, ..., e_n} in R^n is {e_1^*, ..., e_n^*}, where each e_i^* is defined by setting e_i^*(e_j) := δ_ij and extending linearly.

Example 8.1.6. Fix an interval [a, b] ⊂ R and a continuous function f : [a, b] → R. If L^1[a, b] denotes the vector space of integrable functions on [a, b], then
\[
\lambda_f : L^1[a, b] \to \mathbb{R}, \qquad \lambda_f(g) := \int_a^b f(x)\, g(x)\, dx
\]
is a linear functional in (L^1[a, b])^*.

Example 8.1.7. The trace function tr : M_n(F) → F defined by
\[
\mathrm{tr}(a_{ij}) = \sum_{i=1}^n a_{ii}
\]
is a linear functional.
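In coordinates, Proposition 8.1.3 says that the dual basis reads off coordinates with respect to {v_1, ..., v_n}. Concretely, if the basis vectors are the columns of an invertible matrix B, then v_i^* is represented by the i-th row of B^{-1}. A small sketch (assuming numpy; the basis below is an arbitrary example):

    # Sketch: dual basis functionals as the rows of the inverse basis matrix (assumes numpy).
    import numpy as np

    B = np.array([[1.0, 1.0],
                  [0.0, 2.0]])     # columns are a basis v1, v2 of R^2 (arbitrary example)
    dual = np.linalg.inv(B)         # row i represents the functional v_i^*

    v = 3 * B[:, 0] + 5 * B[:, 1]   # v = 3 v1 + 5 v2
    coords = dual @ v               # (v_1^*(v), v_2^*(v))
    assert np.allclose(coords, [3.0, 5.0])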


Definition 8.1.8. Let T : V → W be a linear transformation. We define the dual map T* : W* → V* by (T*f)(v) = f(Tv). Excuse the abuse of notation, as we also used this notation for adjoints of linear transformations. We will not be discussing adjoints in this chapter.

One can show various nice properties of dual maps.

Proposition 8.1.9. Let T, T̃ : V → W and S : W → U be linear transformations. Then
(1) (aT + bT̃)* = aT* + bT̃*;
(2) (S ◦ T)* = T* ◦ S*.

Definition 8.1.10. Given a vector space V, we define the double dual V** := (V*)*. In other words, the double dual of a vector space is the dual of the dual vector space.

Remark 8.1.11. We may view V ⊆ V** via an injective linear transformation ι : V → V**. For v ∈ V, we define ι(v) ∈ V** by ι(v)(f) = f(v). If every element of the double dual V** is of the form ι(v), v ∈ V, we say V is reflexive.

Proposition 8.1.12. Every finite-dimensional vector space is reflexive, i.e., the map ι : V → V** is a vector space isomorphism.

Proof. It isn't hard to see that ι is a linear transformation. To see that ι is injective, suppose ι(v) = 0 for some v ∈ V. If v ≠ 0, we may complete {v} to a basis {v, v_1, ..., v_{n−1}} for V. Then f : V → F defined by f(av + a_1 v_1 + ··· + a_{n−1} v_{n−1}) = a is an element of the dual V*. As ι(v)(f) = f(v) = 1 ≠ 0, we have a contradiction. As dim(V) = dim(V*) = dim(V**), it follows from the Rank-Nullity Theorem that ι is a vector space isomorphism (see Corollary 3.3.4). □

Remark 8.1.13. Let V be a (possibly infinite-dimensional) vector space. In analysis, we will typically also assume V is a normed space. Then we define the dual of V to be the vector space of all continuous linear functionals on V. As remarked by Professor Jeremy Tyson, V is reflexive if and only if the map ι is an isomorphism; it isn't enough to say that there is some other isomorphism V → V**. Fortunately, all finite-dimensional normed spaces are reflexive. We will explore continuous linear functionals further in the problem set.

8.2. Tensor products. Are you feeling a bit tense? Is there a big knot in your stomach? Is it because you have a bilinear map defined on a product of two vector spaces, but you really want a linear map defined on some intertwined version of the two vector spaces? Well, worry no more, for we pulled some strings and in this section, we will introduce the tensor product of vector spaces!

Given vector spaces V, W over a common field F, we can naturally define the vector space V × W via the operations
\[
a(v_1, w_1) := (a v_1, a w_1), \qquad (v_1, w_1) + (v_2, w_2) := (v_1 + v_2, w_1 + w_2)
\]
for v_1, v_2 ∈ V, w_1, w_2 ∈ W, a ∈ F. We wish to define a different type of vector space, the tensor product, which comes with a bilinear operation. In this section, all vector spaces will be finite-dimensional.


Definition 8.2.1. Let U, V, and W be vector spaces over F. We define a bilinear map to be a map g : V × W → U that is linear in each component. More precisely, g(v, ·) : W → U, w ↦ g(v, w) is linear for each fixed v ∈ V, and g(·, w) : V → U, v ↦ g(v, w) is linear for each fixed w ∈ W.

Example 8.2.2. Define d : R² × R² → R by
\[
d((a, b), (c, d)) = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]
By the properties of the determinant, d is a bilinear map. However, d is skew-symmetric (d(x, y) = −d(y, x) for all x, y ∈ R²) and d(x, x) = 0 for all x ∈ R². This implies that d is not an inner product on R².

Theorem 8.2.3. Let V, W be finite-dimensional vector spaces over a field F. There exists a finite-dimensional vector space over F denoted by V ⊗ W, together with a bilinear map V × W → V ⊗ W denoted by (v, w) ↦ v ⊗ w. More specifically, for all v, v′ ∈ V, w, w′ ∈ W, a ∈ F,
\[
(v + v')\otimes w = v\otimes w + v'\otimes w, \qquad
v\otimes (w + w') = v\otimes w + v\otimes w', \qquad
(av)\otimes w = a(v\otimes w) = v\otimes (aw). \tag{3}
\]
Moreover, V ⊗ W and the map ⊗ satisfy the following properties:
(1) (The Universal Property of Tensor Products) If U is a vector space over F, and g : V × W → U is a bilinear map, then there exists a unique linear map g_* : V ⊗ W → U such that, for all v ∈ V and w ∈ W, we have g(v, w) = g_*(v ⊗ w).
(2) If {v_1, ..., v_n} is a basis of V, and {w_1, ..., w_m} is a basis of W, then {v_i ⊗ w_j : i = 1, ..., n, j = 1, ..., m} forms a basis of V ⊗ W. In particular, dim(V ⊗ W) = dim(V) · dim(W).

Remark 8.2.4. We commonly refer to elements of the form v ⊗ w ∈ V ⊗ W as simple tensors. It is a fact that in general, not every element of V ⊗ W can be written as a simple tensor.

Proof. Fix a basis {v_1, ..., v_n} for V and a basis {w_1, ..., w_m} for W. Assign a letter t_ij to each pair (i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ m. (This letter is not some element of V or W, but really is just an independent variable.) We define the vector space V ⊗ W to be the vector space over F of all formal F-linear combinations of the elements t_ij. In other words, each element x of V ⊗ W can be uniquely written in the form
\[
x = \sum_{i=1}^n \sum_{j=1}^m c_{ij}\, t_{ij}, \qquad c_{ij} \in \mathbb{F}.
\]

The letters tij form a basis for V ⊗ W . (For those of you familiar with simplicial homology, note the similarity of defining k-chains of simplicial complexes.)


If v = a_1 v_1 + ··· + a_n v_n ∈ V and w = b_1 w_1 + ··· + b_m w_m ∈ W, we define v ⊗ w to be the element
\[
v \otimes w = \sum_{i=1}^n \sum_{j=1}^m a_i b_j\, t_{ij} \in V \otimes W.
\]
In particular, v_i ⊗ w_j = t_ij, which shows (2) holds. It is easy to see that the properties at (3) hold. For example, suppose v = a_1 v_1 + ··· + a_n v_n, v′ = a′_1 v_1 + ··· + a′_n v_n ∈ V and w = b_1 w_1 + ··· + b_m w_m ∈ W. Then v + v′ = (a_1 + a′_1)v_1 + ··· + (a_n + a′_n)v_n, which implies
\[
(v + v')\otimes w = \sum_{i=1}^n \sum_{j=1}^m (a_i + a'_i)\, b_j\, t_{ij} = v\otimes w + v'\otimes w.
\]
The other properties follow similarly, so I'll leave them as an exercise to the reader. This shows that ⊗ : V × W → V ⊗ W is a bilinear map.

It remains to prove property (1). Let g : V × W → U be a bilinear map. For each pair (i, j), define g_*(v_i ⊗ w_j) = g(v_i, w_j). We may then uniquely extend g_* to a linear transformation V ⊗ W → U since the terms v_i ⊗ w_j form a basis for V ⊗ W. We conclude the construction by calculating
\[
g(v, w) = g\Big(\sum_{i=1}^n a_i v_i, \sum_{j=1}^m b_j w_j\Big) = \sum_{i=1}^n \sum_{j=1}^m a_i b_j\, g(v_i, w_j) = g_*(v \otimes w). \qquad \square
\]

Remark 8.2.5. Given a commutative ring R, one can define the tensor product of R-modules. This can be much more difficult since R-modules don't necessarily have bases. This implies that one cannot canonically or uniquely write an element of the tensor product as a sum of simple tensors. This makes it much more difficult to define linear maps on the tensor product, so one must take advantage of the universal property of tensor products. This is property (1) in the theorem above.

Random Thought 8.2.6. A man is holding pickles in his hands and wants to store them somewhere. He spies an empty room and hopes that there are containers inside. He enters the room, but doesn't see any. Then he turns around and notices that the door is ajar.

Theorem 8.2.7. Let U, V, W be finite-dimensional vector spaces. Then there is a unique isomorphism U ⊗ (V ⊗ W) → (U ⊗ V) ⊗ W such that u ⊗ (v ⊗ w) ↦ (u ⊗ v) ⊗ w for all u ∈ U, v ∈ V, and w ∈ W.

Proof. Fix bases {u_i}, {v_j}, {w_k} of U, V, W, respectively. From Theorem 8.2.3, the elements u_i ⊗ (v_j ⊗ w_k) form a basis for U ⊗ (V ⊗ W), and the elements (u_i ⊗ v_j) ⊗ w_k form a basis for (U ⊗ V) ⊗ W. Let T : U ⊗ (V ⊗ W) → (U ⊗ V) ⊗ W be the linear transformation induced by T(u_i ⊗ (v_j ⊗ w_k)) = (u_i ⊗ v_j) ⊗ w_k.


By expanding, it is easy to see that T satisfies T(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w for all u ∈ U, v ∈ V, and w ∈ W. □
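In coordinates one can model V ⊗ W for V = F^n, W = F^m as F^{nm}, with v ⊗ w corresponding to the Kronecker product of coordinate vectors. The sketch below (assuming numpy; the vectors are arbitrary examples) checks the bilinearity identities (3), the dimension count of Theorem 8.2.3, and the associativity of Theorem 8.2.7 in this model.

    # Sketch: a coordinate model of the tensor product via Kronecker products (assumes numpy).
    import numpy as np

    v, v2 = np.array([1.0, 2.0]), np.array([0.0, -1.0])   # vectors in V = R^2
    w = np.array([3.0, 0.0, 1.0])                          # vector in W = R^3
    u = np.array([2.0, 5.0])                               # vector in U = R^2

    # bilinearity: (v + v') ⊗ w = v ⊗ w + v' ⊗ w and (av) ⊗ w = a (v ⊗ w)
    assert np.allclose(np.kron(v + v2, w), np.kron(v, w) + np.kron(v2, w))
    assert np.allclose(np.kron(7.0 * v, w), 7.0 * np.kron(v, w))

    # dim(V ⊗ W) = dim(V) · dim(W)
    assert np.kron(v, w).shape == (6,)

    # associativity (Theorem 8.2.7): u ⊗ (v ⊗ w) and (u ⊗ v) ⊗ w have the same coordinates
    assert np.allclose(np.kron(u, np.kron(v, w)), np.kron(np.kron(u, v), w))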

Thus, given vector spaces V_1, ..., V_n, we may unambiguously work with the vector space V_1 ⊗ ··· ⊗ V_n and elements v_1 ⊗ ··· ⊗ v_n.

An important property of tensor products is that they offer us a different perspective on linear operators. Recall that for a vector space V, L(V, V) is defined to be the vector space of linear operators on V.

Theorem 8.2.8. Let V be a finite-dimensional vector space over F. Let V* be the dual space of V and L(V, V) the space of linear operators on V. Then there exists a unique isomorphism V* ⊗ V → L(V, V) which to each element ϕ ⊗ v (with ϕ ∈ V* and v ∈ V) associates the map T_{ϕ⊗v} : V → V defined by T_{ϕ⊗v}(w) = ϕ(w)v.

Proof. Let {v_1, ..., v_n} be a basis for V. For each i = 1, ..., n, define ϕ_i : V → F by ϕ_i(v_j) = δ_ij. Note that {ϕ_1, ..., ϕ_n} forms a basis for V*. For all 1 ≤ i, j ≤ n, define T_{ϕ_i⊗v_j} : V → V by T_{ϕ_i⊗v_j}(w) = ϕ_i(w) v_j.

By Theorem 8.2.3, {ϕ_i ⊗ v_j} forms a basis for V* ⊗ V. Let F : V* ⊗ V → L(V, V) be the linear transformation induced by defining F(ϕ_i ⊗ v_j) := T_{ϕ_i⊗v_j}. By Theorem 8.2.3, dim(V* ⊗ V) = n² = dim(L(V, V)). Thus, by Corollary 3.3.4, it suffices to show F is surjective.

Fix a linear operator T : V → V. For each i = 1, ..., n, we may write T(v_i) = a_{1,i} v_1 + ··· + a_{n,i} v_n. This implies T(v_i) = a_{1,i} T_{ϕ_i⊗v_1}(v_i) + ··· + a_{n,i} T_{ϕ_i⊗v_n}(v_i). As T_{ϕ_j⊗v_k}(v_i) = 0 whenever i ≠ j, it follows that
\[
T(v_i) = \sum_{j=1}^n \sum_{k=1}^n a_{k,j}\, T_{\varphi_j \otimes v_k}(v_i)
\]
for all i. We may conclude
\[
T = \sum_{j=1}^n \sum_{k=1}^n a_{k,j}\, T_{\varphi_j \otimes v_k} = F\Big(\sum_{j=1}^n \sum_{k=1}^n a_{k,j}\, \varphi_j \otimes v_k\Big),
\]
as they agree on a basis. This proves surjectivity, and hence, that F is an isomorphism.

Fix ϕ = a_1 ϕ_1 + ··· + a_n ϕ_n ∈ V* and v = b_1 v_1 + ··· + b_n v_n ∈ V. For any w ∈ V,
\[
T_{\varphi \otimes v}(w) = \sum_{i=1}^n \sum_{j=1}^n a_i b_j\, \varphi_i(w)\, v_j = \sum_{i=1}^n a_i\, \varphi_i(w)\, v = \varphi(w)\, v.
\]


Finally, we prove uniqueness. Suppose there is another isomorphism F̃ : V* ⊗ V → L(V, V) satisfying F̃(ϕ ⊗ v) = T_{ϕ⊗v} for all v ∈ V and ϕ ∈ V*. In particular, F̃ agrees with F on the basis {ϕ_i ⊗ v_j} of V* ⊗ V. Uniqueness follows. □

8.3. Quadratic forms. Have you been working out as much as you would like to? Or have you been going to Antonio's and Cocomero a bit too often? If you sail in the latter boat, have we got the product for you! You may have heard about the recent craze of "muscle confusion". Well, our very own Dr. Milton N. Youralgebra has discovered that this actually works against you. You may have heard that your body contains slow-twitch and fast-twitch muscles. Through a highly technical process involving the Scientific Method, Dr. Youralgebra has found that this conflicting existence actually works against you in that it inhibits the development of your lower body. Our innovative new product transforms these slow-twitch and fast-twitch muscles into medium speed-twitch muscles, which results in phenomenal growth of your quadriceps. So, this summer, y'all add some quads... to your bods with The Quadratic Form! (patent denied)

Random Thought 8.3.1. How you explain to your significant other that you don't need Calc III because you already have him or her: "I don't need Q dy... because I already have you, Q dπ."

Definition 8.3.2. A quadratic form Q in n real variables x = (x_1, ..., x_n) is a function of the form
\[
Q(x) = \sum_{1 \le i \le j \le n} a_{ij}\, x_i x_j, \qquad a_{ij} \in \mathbb{R}.
\]

Each term x_i x_j only shows up once in this sum. To incorporate matrices, we can write Q as
\[
Q(x) = \sum_{i,j=1}^n a'_{ij}\, x_i x_j,
\]
where a'_{ii} = a_{ii} and a'_{ij} = a_{ij}/2 for i ≠ j. This implies
\[
Q(x) = \langle Ax, x \rangle, \qquad \text{where } A_{ij} = a'_{ij}.
\]
Recall the inner product on R^n is given by
\[
\Big\langle \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \Big\rangle = \sum_{i=1}^n x_i y_i = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}^{\!t} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]

Observe that the matrix A induced by Q is symmetric, and hence self-adjoint. By the Spectral Theorem, there is an orthonormal basis of eigenvectors for A. If O is the matrix with this basis as its columns,
\[
A = ODO^t = O \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} O^t.
\]
Since A is symmetric, each of the λ_i is real.

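As a concrete sketch of this construction (assuming numpy; the coefficients below are an arbitrary example), one can build the symmetric matrix A from the coefficients a_ij, diagonalize it with an orthogonal matrix as above, and check the formula Q'(y) = λ_1 y_1² + ··· + λ_n y_n² obtained from the change of coordinates carried out next.

    # Sketch: Q(x1, x2) = 2 x1^2 + 2 x1 x2 + 3 x2^2, i.e. a_11 = 2, a_12 = 2, a_22 = 3.
    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])            # A_ii = a_ii, A_ij = a_ij / 2 for i != j
    Q = lambda x: x @ A @ x                # Q(x) = <Ax, x>

    lam, O = np.linalg.eigh(A)             # A = O diag(lam) O^t with O orthogonal
    x = np.array([1.5, -2.0])
    y = O.T @ x                            # new coordinates y = O^{-1} x = O^t x
    assert np.allclose(Q(x), np.sum(lam * y**2))   # Q(x) = lam_1 y_1^2 + lam_2 y_2^2
    assert np.all(lam > 0)                 # all eigenvalues positive: Q is positive definite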

We can define new coordinates on R^n by
\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = O^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
\]

or x = Oy. We then have
\[
Q(x) = \langle Ax, x \rangle = \langle AOy, Oy \rangle = \langle O^t A O\, y, y \rangle = \langle Dy, y \rangle =: Q'(y).
\]
Thus, Q'(y) = λ_1 y_1² + ··· + λ_n y_n².

Definition 8.3.3. (Classification of quadratic forms) Let Q be a quadratic form in n real variables.
(1) If λ_1, ..., λ_n are either all positive or all negative, then Q is said to be elliptic.
(2) If λ_1, ..., λ_n are all nonzero and consist of both positive and negative terms, Q is said to be hyperbolic.
(3) If at least one of the λ_i is zero, then Q is said to be parabolic.
If λ_1 ··· λ_n ≠ 0, i.e., we are in case (1) or (2), we say that Q is non-degenerate. If n = 2, then x² + y² = 1 is the equation of an ellipse, x² − y² = 1 is the equation of a hyperbola, and y − x² = 2 is the equation of a parabola.

We leave the proof of the following lemma to readers (it's not too bad).

Lemma 8.3.4. (Descartes' Rule of Signs) Let p(t) = t^n + a_{n−1} t^{n−1} + ··· + a_1 t + a_0 = (t − λ_1) ··· (t − λ_n), where a_0, ..., a_{n−1}, λ_1, ..., λ_n ∈ R. Then the following hold:
(1) 0 is a root of p(t) if and only if a_0 = 0.
(2) All roots of p(t) are negative if and only if a_{n−1}, ..., a_0 > 0.
(3) If n is odd, then all roots of p(t) are positive if and only if a_{n−1} < 0, a_{n−2} > 0, ..., a_1 > 0, a_0 < 0.
(4) If n is even, all roots of p(t) are positive if and only if a_{n−1} < 0, a_{n−2} > 0, ..., a_1 < 0, a_0 > 0.

We will now apply quadratic forms to multivariable calculus. Consider a function f(x) = a + Q(x), where Q(x) is a quadratic form. Note f(0) = a and ∂f/∂x_i(0) = 0 for i = 1, ..., n. Thus the origin is a critical point for f. The type of the quadratic form Q will tell us whether 0 is a minimum, maximum, or neither. Assume Q is nondegenerate, i.e., none of the λ_i associated with Q are zero. If 0 > λ_1 ≥ ··· ≥ λ_n, then f(x) ≤ a + λ_1 ||x||² ≤ a and 0 is a maximum of f. Similarly, if 0 < λ_1 ≤ ··· ≤ λ_n, then 0 is a minimum of f. This shows that if Q is elliptic, then f has a minimum or maximum at 0. On the other hand, one can see


that Q will decrease in the directions associated with λ_i < 0 and increase in the directions associated with λ_i > 0. Thus, if Q is hyperbolic, f will have neither a maximum nor a minimum at 0. We will not consider the case when Q is parabolic.

We will now generalize this analysis. Let f : R^n → R be a smooth function with a critical point at p ∈ R^n, i.e., ∂f/∂x_i(p) = 0 for i = 1, ..., n. The Taylor expansion up to order 2 tells us
\[
f(p + h) = f(p) + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(p)\, h_i h_j + o(||h||^2),
\]
where o(||h||²) : R^n → R is a function satisfying
\[
\lim_{h \to 0} \frac{o(||h||^2)}{||h||^2} = 0.
\]
Defining A = (∂²f/∂x_i ∂x_j (p)), the second-derivative term looks like a quadratic form in h. The matrix A is the Hessian matrix of f at p and is symmetric by Clairaut's theorem. The following theorem generalizes the above case f(x) = a + Q(x).

Theorem 8.3.5. Let f : R^n → R be a smooth function that has a critical point at p, with λ_1 ≥ ··· ≥ λ_n the eigenvalues of the symmetric matrix (∂²f/∂x_i ∂x_j (p)).
(1) If λ_n > 0, then p is a local minimum for f.
(2) If λ_1 < 0, then p is a local maximum for f.
(3) If λ_1 > 0 and λ_n < 0, then f has a saddle point at p.
(4) Otherwise, there is no conclusion about f at p.

Proof. We will only prove cases 1 and 3. For case 1, choose a neighborhood U of 0 such that
\[
\frac{|o(||h||^2)|}{||h||^2} < \frac{\lambda_n}{2} \quad \text{for all nonzero } h \in U.
\]
In this neighborhood, for h ≠ 0,
\[
f(p + h) = f(p) + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(p)\, h_i h_j + o(||h||^2)
\ge f(p) + \frac{\lambda_n}{2}\, ||h||^2 + \frac{o(||h||^2)}{||h||^2}\, ||h||^2
= f(p) + \Big( \frac{\lambda_n}{2} + \frac{o(||h||^2)}{||h||^2} \Big) ||h||^2
> f(p).
\]
For case 3, select unit eigenvectors v_1 and v_n corresponding to λ_1 and λ_n, respectively. Then f(p + t v_i) = f(p) + (t²/2) λ_i + o(t²) for i = 1 and i = n. Letting t → 0, a similar analysis as above implies that f(p + t v_1) > f(p) and f(p + t v_n) < f(p) for sufficiently small t > 0. □
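A short numerical sketch of this test (assuming numpy; the function and its Hessian below are an illustrative example computed by hand): classify a critical point by the signs of the extreme eigenvalues of the Hessian.

    # Sketch: f(x, y) = x^2 - y^2 + x*y has a critical point at the origin,
    # with Hessian [[2, 1], [1, -2]] there (computed by hand for this example).
    import numpy as np

    H = np.array([[2.0, 1.0],
                  [1.0, -2.0]])
    lam = np.linalg.eigvalsh(H)     # eigenvalues in ascending order

    if lam[0] > 0:
        verdict = "local minimum"
    elif lam[-1] < 0:
        verdict = "local maximum"
    elif lam[0] < 0 < lam[-1]:
        verdict = "saddle point"
    else:
        verdict = "inconclusive"
    print(verdict)                  # prints "saddle point" for this Hessian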


Remark 8.3.6. To see that case (4) gives no general conclusion about the behavior of f near p, consider the following three examples with p = 0: f : R → R, f(x) = x³; g : R → R, g(x) = x⁴; and h : R → R, h(x) = −x⁴. Each has a critical point at 0 with vanishing second derivative, yet 0 is neither a maximum nor a minimum for f, a minimum for g, and a maximum for h.


References
[1] Stephen H. Friedberg, Lawrence E. Spence, and Arnold J. Insel. Linear Algebra. 3rd ed. Prentice Hall, Upper Saddle River, NJ, 1997.
[2] Jack Huizenga. "Linear Algebra: Why is the dimension of a vector space well-defined?" https://www.quora.com/Linear-AlgebraWhy-is-the-dimension-of-a-vector-space-well-defined.
[3] Serge Lang. Linear Algebra. 2nd ed. Addison-Wesley, Reading, MA, 1971.
[4] MathDoctorBob. "Overview of Jordan Canonical Form." Online video clip. YouTube. September 12, 2011. Web. March 25, 2016.
[5] LastWeekTonight. "Last Week Tonight with John Oliver: the IRS." Online video clip. YouTube. HBO, April 12, 2015. Web. March 24, 2016.
[6] Peter Petersen. Linear Algebra. Los Angeles, CA, 2000. http://www.calpoly.edu/~jborzell/Courses/Year%2010-11/Fall%202010/Petersen-Linear Algebra-Math 306.pdf.
[7] Terence Tao. An Epsilon of Room, I: Real Analysis. American Mathematical Society, Providence, RI, 2010.
[8] Fuzhen Zhang. Linear Algebra: Challenging Problems for Students. 2nd ed. Johns Hopkins University Press, Baltimore, MD, 2009.
