Linear Algebra (Mathematicians)#

Introduction#

These notes are written from the perspective of a mathematician, which might be helpful.

Sets#

A set is an unordered collection of mathematical objects we call its elements.

\[\begin{gather*} A = \{1,2,3,4\} \end{gather*}\]

Above we have defined a set called \(A\) that contains the numbers 1,2,3,4. Because the order does not matter, if \(B = \{4,3,2,1\}\) then we say \(A = B\) as they contain the same elements. If \(A\) contained an element that \(B\) did not or vice-versa, they would not be equal as sets and we would write \(A \neq B\).

If we would like to indicate that an element \(x\) is in a set \(A\) then as shorthand we write \(x \in A\). If we would like to indicate that \(x\) is not in \(A\) then we write \(x \not \in A\).

There are a few common sets that are used so frequently they are given special symbols:

| Symbol | Description |
| --- | --- |
| \(\emptyset\) | The set with no elements (the empty set) |
| \(\mathbb{Z}\) | The set of all integers |
| \(\mathbb{R}\) | The set of all real numbers |
| \(\mathbb{C}\) | The set of all complex numbers |

Often it is impractical or even impossible to explicitly list the elements of a set we wish to express, in which case we use set-builder notation:

\[\begin{gather*} A = \{ x | P(x) \} \end{gather*}\]

Where we specify the form of the elements \(x\) and give a true or false condition \(P(x)\) that defines which \(x\) are admissible. If we have multiple such rules we separate them with commas. For example, if we want the set of all integers between 0 and 5 we may write:

\[\begin{gather*} A = \{ x | 0 < x < 5, x \in \mathbb{Z} \} \end{gather*}\]

Which would again yield the set \(A = \{ 1, 2, 3, 4 \}\). The vertical bar in set-builder notation is read as “such that” and indicates that the conditions for membership in the set follow. A colon may be used in its place and means the same thing.
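If it helps to experiment, set-builder notation maps almost directly onto Python set comprehensions. Here is a small illustrative sketch (the range of candidate values is an arbitrary choice):

```python
# Set-builder notation as a Python set comprehension:
# A = { x | 0 < x < 5, x an integer }
A = {x for x in range(-10, 11) if 0 < x < 5}
B = {4, 3, 2, 1}

print(A)       # {1, 2, 3, 4}
print(A == B)  # True -- sets are unordered, so A and B are equal
```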

Vector Spaces#

A vector is often thought of simply as a column of numbers. It is often indicated by a variable wearing an arrow for a hat.

\[\begin{gather*} \vec{v} = \begin{bmatrix} 1\\ -2\\ 3 \end{bmatrix} \end{gather*}\]

In this course we will think of that column of numbers as the coefficients of basis vectors used to construct \(\vec{v}\). Recall from undergraduate linear algebra that a basis for a vector space is a set of vectors that are linearly independent and span the space. For example, in \(\mathbb{R}^3\) the standard basis is:

\[\begin{gather*} \hat{i} = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, \hat{j} = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}, \hat{k} = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} \end{gather*}\]

These are our building blocks we use to construct the vectors we’re interested in. For example, we see that \(\vec{v}\) from earlier can be written as:

\[\begin{gather*} \vec{v} = 1 \hat{i} -2 \hat{j} + 3 \hat{k} \end{gather*}\]
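As a quick numerical sketch (using NumPy purely for illustration), the column of numbers really is just the list of coefficients on the standard basis:

```python
import numpy as np

# Standard basis for R^3
i_hat = np.array([1.0, 0.0, 0.0])
j_hat = np.array([0.0, 1.0, 0.0])
k_hat = np.array([0.0, 0.0, 1.0])

# v = 1*i_hat - 2*j_hat + 3*k_hat reproduces the column [1, -2, 3]
v = 1 * i_hat - 2 * j_hat + 3 * k_hat
print(v)  # [ 1. -2.  3.]
```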

Bases are not unique; there are generally many possible bases we could choose to represent the same space. It makes sense, then, to choose whatever basis makes our calculations easiest.

Every basis for a vector space has the same number of elements. We call this number the dimension of our space and we note that in this course we will only be working with finite dimensional spaces.

Because we would like to utilize the useful Dirac “bra-ket” notation (which you will see more of later on), we’re going to indicate our vectors with “kets” rather than with arrow hats:

\[\begin{gather*} \vec{v} = \ket{v} \end{gather*}\]

We will also be working with covectors, which for our purposes can be thought of as row vectors. We indicate these with “bras”:

\[\begin{gather*} \bra{g} = \begin{bmatrix} 1 & 0 & 2 \end{bmatrix} \end{gather*}\]

In this course we will be working with complex vector spaces, which means vectors will be allowed to have complex numbers as components. For example we may have:

\[\begin{gather*} \ket{h} = \begin{bmatrix} i\\ 3-4i \end{bmatrix} \end{gather*}\]

Because we are working with complex numbers we will often need to take their conjugates. Recall that the conjugate of a complex number \(a + bi\) is \(a - bi\), obtained by flipping the sign of the imaginary part.

To take the conjugate of a vector we simply take the conjugate of every entry:

\[\begin{gather*} \ket{h}^* = \begin{bmatrix} -i\\ 3+4i \end{bmatrix} \end{gather*}\]

Note that we’ve written a star next to our vector to indicate we’ve taken its conjugate. Clearly if we take the conjugate twice we get back to where we started.

Recall from linear algebra the transpose which turns rows into columns and columns into rows so that:

\[\begin{gather*} \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}^T = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \end{gather*}\]

For us the act of taking the conjugate of the transpose of a vector will be so common that we give it a name: conjugate transpose (also commonly called the Hermitian conjugate), indicated with a dagger \(\dagger\).

\[\begin{gather*} \begin{bmatrix} i\\ 1+2i\\ 3 \end{bmatrix}^\dagger = \begin{bmatrix} -i & 1-2i & 3 \end{bmatrix} \end{gather*}\]

To turn a vector \(\ket{\psi}\) into a covector \(\bra{\psi}\) we take its conjugate transpose:

\[\begin{gather*} \bra{\psi} = \ket{\psi}^\dagger \end{gather*}\]
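A minimal NumPy sketch of the same operation, using the column vector from the dagger example above (`.conj().T` conjugates every entry and then transposes):

```python
import numpy as np

# The ket as a column vector (shape (3, 1)) so the transpose is visible
ket = np.array([[1j], [1 + 2j], [3 + 0j]])

# Conjugate transpose (dagger): conjugate each entry, then transpose
bra = ket.conj().T
print(bra)           # the row vector [-i, 1-2i, 3]
print(bra.conj().T)  # applying dagger twice returns the original ket
```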

Linear Operators#

A function \(f\) is said to be linear if constant factors can be factored out and the function splits over addition:

\[\begin{gather*} f(cx) = cf(x)\\ f(x+y) = f(x) + f(y) \end{gather*}\]

A linear operator \(A\) is simply a linear function whose domain and range are vector spaces. That is, linear operators take in vectors as inputs and produce vectors as outputs. For example:

\[\begin{gather*} A(\ket{v}) = 2\ket{v} \end{gather*}\]

\(A\) in this case simply scales whatever vector \(\ket{v}\) we give it by two. Often we find it burdensome to write the () that come with function notation, and so we instead write \(A\ket{v}\) to mean the same thing. If we have multiple linear operators \(A\) and \(B\) we can compose them and simply write \(BA\ket{\psi}\) to apply \(A\) first and then \(B\). Note that this only makes sense if \(A\) sends \(\ket{\psi}\) to the domain of \(B\).

It is an important result from linear algebra that every linear operator on finite-dimensional vector spaces can be represented by matrix multiplication, and conversely every matrix defines a linear operator. This is easy to see from the following proof:

Let \(A\) be a linear operator from one vector space to another. Let \(\{v_1, v_2, ..., v_n\}\) be a basis for the domain of \(A\) and \(\{w_1, w_2, ..., w_m\}\) be a basis for the range.

Take any basis vector (call it \(v\)) for the domain and act on it with \(A\) to obtain \(Av\), clearly an element of the range. We can then represent this as a linear combination of our basis vectors for the range:

\[\begin{gather*} A\ket{v} = \sum_{i=1}^m A_{i}\ket{w_i} \end{gather*}\]

If \(\ket{v_j}\) denotes the \(j\)th basis vector, for \(1 \leq j \leq n\), and \(A_{ij}\) denotes its coefficients, then we see:

\[\begin{gather*} A\ket{v_j} = \sum_{i=1}^m A_{ij}\ket{w_i} \end{gather*}\]

Take any element \(\psi\) from the domain. Represent it by the domain basis vectors:

\[\begin{gather*} \ket{\psi} = b_1 \ket{v_1} + b_2 \ket{v_2} + \cdots + b_n \ket{v_n} = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n\end{bmatrix} \end{gather*}\]

And multiply it on the left by the matrix whose entry in the \(i\)th row and \(j\)th column is \(A_{ij}\):

\[\begin{gather*} \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n}\\ A_{21} & A_{22} & \cdots & A_{2n}\\ \vdots & \vdots & \cdots & \vdots\\ A_{m1} & A_{m2} & \cdots & A_{mn}\\ \end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n\end{bmatrix} \end{gather*}\]

The result is a linear combination of the columns of \(A\) with coefficients \(b_1, b_2, ..., b_n\). But the columns of \(A\) are by design the transformed basis vectors of the domain, \(A \ket{v_j}\) with \(1 \leq j \leq n\). Thus the matrix-vector product is:

\[\begin{gather*} b_1A\ket{v_1} + b_2A\ket{v_2} + \cdots + b_n A\ket{v_n}\\ = A(b_1\ket{v_1} + b_2\ket{v_2} + \cdots + b_n\ket{v_n})\\ =A\ket{\psi} \end{gather*}\]

Thus we can obtain the transformed \(\ket{\psi}\) by simple matrix multiplication. Because we were able to do this for any element \(\ket{\psi}\) in the domain of an arbitrary linear operator \(A\), we can equivalently represent any linear operator as a matrix.
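Here is a sketch of that argument in NumPy. The operator is a hypothetical linear map from \(\mathbb{R}^3\) to \(\mathbb{R}^2\) given as a plain Python function; its matrix is built column by column from its action on the basis vectors, exactly as in the proof:

```python
import numpy as np

# A hypothetical linear map from R^3 to R^2, given as a function
def A(v):
    x, y, z = v
    return np.array([2 * x + y, y - z])

# The matrix of A has A(e_j) as its j-th column
basis = np.eye(3)  # rows are the standard basis vectors e_1, e_2, e_3
M = np.column_stack([A(e) for e in basis])
print(M)
# [[ 2.  1.  0.]
#  [ 0.  1. -1.]]

# Matrix multiplication reproduces the action of the operator
psi = np.array([1.0, -2.0, 3.0])
print(M @ psi)  # [ 0. -5.]
print(A(psi))   # [ 0. -5.]
```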

Inner Products#

Recall from undergraduate linear algebra the dot product between two vectors, \(\ket{v}\cdot\ket{w}\), which produces a number. If their dot product is zero, the two vectors are orthogonal.

We will work with a generalization of the dot product called the inner product: it takes a bra and a ket and assigns to them a complex number. You can think of this as the dot product of a row vector with a column vector. To indicate this operation we will write the inner product of \(\bra{\psi}\) and \(\ket{\phi}\) as:

\[\begin{gather*} \braket{\psi|\phi} \end{gather*}\]

Inner products allow us to define two important concepts. Two vectors \(\ket{\psi}\) and \(\ket{\phi}\) are orthogonal if:

\[\begin{gather*} \braket{\psi|\phi} = 0 \end{gather*}\]

The norm of a vector \(\ket{v}\) indicated by \(\|\ket{v}\|\) is given by:

\[\begin{gather*} \|\ket{v}\| = \sqrt{\braket{v|v}} \end{gather*}\]

A vector \(\ket{v}\) is called a unit vector (or sometimes normalized) if \(\|\ket{v}\| = 1\). We can always normalize a nonzero vector by dividing it by its norm.
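A small NumPy sketch with arbitrary illustrative vectors. Note that `np.vdot` conjugates its first argument, which is exactly what forming the bra does:

```python
import numpy as np

psi = np.array([1j, 3 - 4j])
phi = np.array([2 + 0j, 1j])

# <psi|phi>: np.vdot conjugates its first argument
print(np.vdot(psi, phi))                 # (-4+1j)

# The norm sqrt(<psi|psi>) is real and non-negative
norm = np.sqrt(np.vdot(psi, psi).real)
print(norm, np.linalg.norm(psi))         # both equal sqrt(26)

# Normalizing produces a unit vector
unit_psi = psi / norm
print(np.linalg.norm(unit_psi))          # 1.0
```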

A collection \(\{\ket{v_1}, \ket{v_2}, ..., \ket{v_n} \}\) of vectors is orthonormal if:

\[\begin{gather*} \|\ket{v_i}\| = 1 \text{, for all } 1 \leq i \leq n\\ \text{and } \braket{v_i | v_j} = 0 \text{ for all } 1 \leq i,j \leq n \text{ with } i \neq j \end{gather*}\]

All vectors in the collection are unit length and mutually orthogonal. These are the nice properties (in the sense that they make calculations easy) enjoyed by the standard bases we are used to, such as \(\begin{bmatrix} 1\\0\\0 \end{bmatrix}, \begin{bmatrix} 0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\). In this course we will assume that all bases are orthonormal. This assumption is fine for us to make since, given any basis that is not orthonormal, we can always turn it into one via the Gram-Schmidt process (link to G-S in appendix).
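As a quick sketch of what the orthonormality conditions look like in practice, take the (hypothetical) basis \(\{\tfrac{1}{\sqrt{2}}(1,1),\ \tfrac{1}{\sqrt{2}}(1,-1)\}\). Stacking the vectors as columns of a matrix \(Q\), orthonormality is equivalent to \(Q^\dagger Q = I\), since the \((i,j)\) entry of \(Q^\dagger Q\) is \(\braket{v_i|v_j}\):

```python
import numpy as np

v1 = np.array([1, 1]) / np.sqrt(2)
v2 = np.array([1, -1]) / np.sqrt(2)

# Q has the basis vectors as columns; Q_dagger Q has entries <v_i|v_j>,
# so the basis is orthonormal exactly when Q_dagger Q is the identity
Q = np.column_stack([v1, v2])
print(np.allclose(Q.conj().T @ Q, np.eye(2)))  # True
```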

Outer Products#

Suppose we have two inner product spaces (a vector space with an inner product defined) called \(V\) and \(W\) (they may be the same).

Suppose that \(\ket{v}\) is a vector in \(V\) and \(\ket{w}\) is a vector in \(W\). We may then take their outer product, denoted \(\ket{w}\bra{v}\), which is a linear operator from \(V\) to \(W\). Recall that this means it acts on vectors in \(V\) and takes them to vectors in \(W\). For \(\ket{x}\) in \(V\) it is defined by:

\[\begin{gather*} \ket{w}\bra{v}\ket{x} = \ket{w}\braket{v|x} \end{gather*}\]

We take the inner product of \(\ket{v}\) and \(\ket{x}\), which produces a number, and then we multiply \(\ket{w}\) by it. Note that we could just as well have written the result as \(\braket{v|x}\ket{w}\), since multiplication by a number can be done in either order, but the form above highlights the usefulness of the Dirac notation.

Outer products may also be represented by matrices (since they are linear operators) and we have a nice way to calculate them. Let \(\ket{v} = \begin{bmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{bmatrix}\) and \(\ket{w} = \begin{bmatrix} w_1\\ w_2\\ \vdots\\ w_m\end{bmatrix}\). Then \(\ket{w}\bra{v} = \ket{w} \ket{v}^\dagger\) is given by:

\[\begin{gather*} \begin{bmatrix} w_1\\ w_2\\ \vdots\\ w_m \end{bmatrix} \begin{bmatrix} v_1^* & v_2^* & \cdots & v_n^* \end{bmatrix} = \begin{bmatrix} w_1v_1^* & w_1v_2^* & \cdots & w_1v_n^*\\ w_2v_1^* & w_2v_2^* & \cdots & w_2v_n^*\\ \vdots & \vdots & \cdots & \vdots\\ w_mv_1^* & w_mv_2^* & \cdots & w_mv_n^* \end{bmatrix} \end{gather*}\]

If \(\ket{x} = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}\) then we see \(\ket{w}\bra{v}\ket{x}\) is given by:

\[\begin{gather*} \begin{bmatrix} w_1v_1^* & w_1v_2^* & \cdots & w_1v_n^*\\ w_2v_1^* & w_2v_2^* & \cdots & w_2v_n^*\\ \vdots & \vdots & \cdots & \vdots\\ w_mv_1^* & w_mv_2^* & \cdots & w_mv_n^* \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} w_1v_1^*x_1 + w_1v_2^*x_2 + \cdots + w_1v_n^*x_n\\ w_2v_1^*x_1 + w_2v_2^*x_2 + \cdots + w_2v_n^*x_n\\ \vdots\\ w_mv_1^*x_1 + w_mv_2^*x_2 + \cdots + w_mv_n^*x_n \end{bmatrix}\\ = \begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_m \end{bmatrix}\braket{v | x} \end{gather*}\]
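The same calculation in NumPy with arbitrary example vectors; `np.outer` forms the matrix with entries \(w_i v_j^*\) once we conjugate \(\ket{v}\):

```python
import numpy as np

v = np.array([1 + 0j, 2j, 3 + 0j])  # |v> in V
w = np.array([1 - 1j, 4 + 0j])      # |w> in W
x = np.array([2 + 0j, 0j, 1j])      # |x> in V

# |w><v| as a matrix: entries w_i * conj(v_j)
op = np.outer(w, v.conj())

# Acting on |x> gives <v|x> times |w>
print(np.allclose(op @ x, np.vdot(v, x) * w))  # True
```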

Determinants#

The determinant is a function whose domain is square matrices and whose range is scalars. In other words, the determinant assigns a number to a square matrix. We will show how to compute the determinant of smaller matrices and then generalize the computation to any \(n \times n\) matrix.

Given a matrix \(A\) the determinant of \(A\) is denoted by \(\det{(A)}\) or \(|A|\).

For a \(2 \times 2\) matrix:

\[\begin{gather*} A = \begin{bmatrix} a & b\\ c & d\end{bmatrix} \end{gather*}\]

We multiply the entries on the main diagonal and subtract from that the product of the entries on the off-diagonal, so that:

\[\begin{gather*} \det{(A)} = ad-bc \end{gather*}\]
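For a quick numerical sanity check (arbitrary entries), the hand formula agrees with NumPy's determinant up to floating-point error:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0])  # ad - bc = -2.0
print(np.linalg.det(A))                        # approximately -2.0
```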

Given a \(3 \times 3\) matrix:

\[\begin{gather*} A = \begin{bmatrix} a_{11} & a_{12} &a_{13}\\ a_{21} & a_{22}& a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} \end{gather*}\]

We first mentally attach to the element \(a_{ij}\) in the \(i\)th row and \(j\)th column a \(+\) or \(-\) based on its position, according to the rule \((-1)^{i+j}\):

\[\begin{gather*} \begin{bmatrix} +a_{11} & -a_{12} &+a_{13}\\ -a_{21} & +a_{22}& -a_{23}\\ +a_{31} & -a_{32} & +a_{33} \end{bmatrix} \end{gather*}\]

We then need to choose a row or a column to expand with; we will get the same answer no matter which we choose but to make the calculation as simple as possible we should choose the row or column with the most zeros. Suppose we choose the first row.

Let \(\bar{A}_{ij}\) denote the \(2\times2\) matrix formed by deleting the \(i\)th row and \(j\)th column. Then we can calculate:

\[\begin{gather*} \det{(A)} = a_{11} \det{(\bar{A}_{11})}-a_{12}\det{(\bar{A}_{12})}+a_{13}\det{(\bar{A}_{13})} \end{gather*}\]

In other words, we take the entries of our chosen row or column, with the appropriate sign adjustments, as the coefficients of a sum of smaller \(2 \times 2\) determinants, each computed from the matrix formed by deleting the row and column of the original matrix that contain the coefficient.

We can generalize the above to any \(n \times n\) matrix by using recursion. Define the determinant of a \(1 \times 1\) matrix to be its only entry. For \(n \geq 2\):

\[\begin{gather*} \det(A) = \sum_{j=1}^n (-1)^{1 + j}A_{1j}\cdot \det(\bar{A}_{1j}) \end{gather*}\]

This expands along the first row. If we would like to expand along the \(i\)th row instead, we need only change the sum to:

\[\begin{gather*} \sum_{j=1}^n (-1)^{i + j}A_{ij}\cdot \det(\bar{A}_{ij}) \end{gather*}\]

Where \(i\) is fixed. If we instead wanted to expand along the \(j\)th column, we would use:

\[\begin{gather*} \sum_{i=1}^n (-1)^{i + j}A_{ij}\cdot \det(\bar{A}_{ij}) \end{gather*}\]

Where \(j\) is now a fixed number.
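The recursion above translates directly into code. Below is a minimal sketch of expansion along the first row (the function name `det_laplace` and the example matrix are purely illustrative; this mirrors the definition and is far slower than `np.linalg.det`):

```python
import numpy as np

def det_laplace(A):
    """Determinant by cofactor expansion along the first row."""
    A = np.asarray(A)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]  # base case: 1x1 matrix
    total = 0
    for j in range(n):
        # delete row 0 and column j to form the submatrix A-bar
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, -1.0],
              [0.0, 5.0, 4.0]])
print(det_laplace(A), np.linalg.det(A))  # both approximately 39.0
```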

The determinant of an upper or lower triangular matrix is very easy to compute: it is simply the product of the diagonal entries, so that:

\[\begin{gather*} \det\left(\begin{bmatrix} a_1 & a_2 & a_3\\ 0 & b_2 & b_3\\ 0 & 0 & c_3\end{bmatrix} \right) = a_1 b_2 c_3 \end{gather*}\]

Transposing a matrix does not change its determinant:

\[\begin{gather*} \det\left(A \right) = \det\left(A^T \right) \end{gather*}\]

The conjugate transpose, on the other hand, conjugates the determinant:

\[\begin{gather*} \det\left(A^\dagger \right) = \det\left(A \right)^* \end{gather*}\]
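A quick numerical check of both properties, using an arbitrary complex matrix:

```python
import numpy as np

A = np.array([[1 + 1j, 2 + 0j],
              [0 + 3j, 4 - 1j]])

d = np.linalg.det(A)
print(np.isclose(d, np.linalg.det(A.T)))                     # True: transpose preserves the determinant
print(np.isclose(d.conjugate(), np.linalg.det(A.conj().T)))  # True: the dagger conjugates it
```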

Eigenvalues and Eigenvectors#

Let \(A\) be a linear operator on some vector space \(V\). A vector \(\ket{v} \neq 0\) in \(V\) is an eigenvector of \(A\) if applying \(A\) to it returns a scaled version of the vector:

\[\begin{gather*} A\ket{v} = \lambda \ket{v} \end{gather*}\]

Where the scalar \(\lambda\) is called the associated eigenvalue of \(\ket{v}\).

We can determine the eigenvalues of a linear transformation \(A\) by first representing \(A\) as a square \(n\times n\) matrix, which we will also call \(A\). We then calculate \(f(\lambda)\), the characteristic polynomial of \(A\). This is found by:

\[\begin{gather*} f(\lambda) = \det(A-\lambda I) \end{gather*}\]

Where \(I\) is the \(n\times n\) identity matrix. We then set \(f(\lambda) = 0\) and solve for \(\lambda\). The resulting numbers we get are the eigenvalues of \(A\).

To then find the eigenvectors associated with a specific eigenvalue \(\lambda\), we find the nonzero vectors in the nullspace of \(A-\lambda I\).
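In practice we rarely expand the characteristic polynomial by hand. Here is a sketch using NumPy on a small symmetric example (the matrix is arbitrary and the eigenvalue ordering returned by `np.linalg.eig` is not guaranteed):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# det(A - lambda I) = (2 - lambda)^2 - 1, with roots 3 and 1
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # [3. 1.] (order may vary)

# Each column of `eigenvectors` satisfies A|v> = lambda |v>
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True
```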

Note that because \(A\) is linear, if \(c\) is a nonzero constant and \(\ket{v}\) is an eigenvector of \(A\) with associated eigenvalue \(\lambda\), then since \(A\ket{v} = \lambda \ket{v}\):

\[\begin{gather*} A\ket{cv} = c A\ket{v} = c \lambda \ket{v} = \lambda \ket{cv} \end{gather*}\]

\(c\ket{v}\) is also an eigenvector with the same eigenvalue \(\lambda\). In other words, nonzero multiples of eigenvectors are still eigenvectors.

There is a quick check we can do that will often tell us if we’ve incorrectly calculated the eigenvalues of \(A\): the sum of all the eigenvalues of \(A\) (counted with multiplicity) must equal the sum of the diagonal entries of \(A\), called the trace of \(A\) and written tr\((A)\).
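A small numerical illustration of the trace check, using an arbitrary matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 4.0],
              [1.0, 0.0, 1.0]])

# The eigenvalues (with multiplicity) must sum to the trace of A
eigenvalues = np.linalg.eigvals(A)
print(eigenvalues.sum(), np.trace(A))              # both equal 6 (up to rounding)
print(np.isclose(eigenvalues.sum(), np.trace(A)))  # True
```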