Linear Algebra#
Although this course does not assume you have a strong physics background, a solid grounding in linear algebra is essential. These notes cover the important material you will be expected to understand deeply before participating in the course. If anything is unfamiliar, please refresh your understanding prior to the course, or ask one of the instructors for a review.
Please see Linear Algebra for additional resources, including an excellent set of videos.
Terminology#
We will freely use the following terminology throughout these notes and in the course. You will be expected to understand what these terms mean and their implications. For example, you should know that if the linear operator \(\op{H}\) is Hermitian, then its eigenvalues are real, and its eigenvectors can be chosen to form a complete, orthonormal basis. In equations:
Some of these terms will be described here, but you may need to refer elsewhere to complete your understanding. We will restrict ourselves to finite-dimensional vector spaces in this course, so some of the terminology will be redundant (e.g. for finite-dimensional spaces, Hermitian and self-adjoint mean the same thing).
Set.
Vector, column vector.
Co-vector, row vector.
Dyad.
Vector space (complex).
Subspace.
Rank.
Linear and anti-linear operator.
Inner product \(\braket{a|b}\).
Basis (pl. bases).
Components.
Complete.
Orthogonal.
Orthonormal.
Transpose \(\mat{A}^T\) and Hermitian conjugate \(\mat{A}^\dagger = (\mat{A}^{T})^*\).
Symmetric and anti-symmetric, Hermitian, self-adjoint.
Orthogonal matrix, unitary matrix.
Singular value decomposition (SVD): \(\mat{A} = \mat{U}\mat{D}\mat{V}^\dagger\).
QR decomposition \(\mat{A} = \mat{Q}\mat{R}\). I.e. Gram-Schmidt orthogonalization.
LU (or LR) decomposition \(\mat{A} = \mat{L}\mat{U}\). I.e. Gaussian elimination.
Trace and determinant.
Eigenvectors and eigenvalues.
Tensor product.
Cartesian product.
Partial trace.
Notation#
We will use the following “bra-ket” or “braket” notation due to Dirac, which is ubiquitous and very useful in quantum mechanics.
\(\ket{\psi}\): An abstract vector or simply, a vector. This represents a physical object: think of a physical arrow in space, in contradistinction to a “list of numbers”. Although one can treat a “list of numbers” as an abstract column vector, this is better thought of as the components of an abstract vector in a particular basis. This distinction causes quite a bit of confusion, and is the focus of the section Vectors and Components below.
\(\bra{\psi}\): An abstract co-vector or simply, a co-vector. This is a linear operator that takes vectors into numbers, and is the Hermitian conjugate of the vector \(\ket{\psi}\):
\[\begin{gather*} \bra{\psi} = \ket{\psi}^\dagger. \end{gather*}\]Although one can think of \(\bra{\psi}\) as a row vector, it is better to think of \(\bra{\psi}\) as a linear operator that takes vectors into numbers.
\(\braket{f|g}\): Inner product. Much of the power of Dirac’s notation comes from the expression of the inner product as an operator product of bras and kets. This inner product can be thought of as the dot product of a row vector and column vector.
\(\norm{\ket{\psi}} = \sqrt{\braket{\psi|\psi}}\): Norm or \(L_2\) norm. We will only make use of the \(L_2\) norm in this course:
\[\begin{gather*} \norm{\ket{\psi}} = \sqrt{\sum_{n} \abs{\psi_n}^2}. \end{gather*}\]\(\{\ket{n}\}\): Orthonormal basis. More carefully, we should specify the number of basis vectors such as:
\[\begin{gather*} \{\ket{n} \;|\; n=0, 1, \dots, N-1\} = \{\ket{0}, \ket{1}, \dots, \ket{N-1}\}. \end{gather*}\]We will assume that all bases have been properly orthogonalized and normalized so that
\[\begin{gather*} \braket{m|n} = \delta_{nm} = \begin{cases} 1 & m = n,\\ 0 & \text{otherwise}, \end{cases} \end{gather*}\]where \(\delta_{mn}\) is the Kronecker delta. This assumption greatly simplifies the notation of things like the components of a vector (see below). When working with concrete vectors, you may need to first orthonormalize your bases using the Gram-Schmidt procedure, or (equivalently) the QR decomposition.
\(\psi_n = \braket{n|\psi}\): The components of the vector \(\ket{\psi}\) in the orthonormal basis \(\{\ket{n}\}\). Here \(\bigl\{\psi_n \; |\; n\in\{0, 1, \dots, N-1\}\bigr\}\) is the “list of numbers” version of a vector you might be used to from elementary linear algebra courses.
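The relation \(\psi_n = \braket{n|\psi}\) is easy to check numerically. Here is a small sketch (the basis and vector are made up for illustration) showing that the components in an orthonormal basis are inner products, and that they reconstruct the vector:

```python
import numpy as np

# A hypothetical orthonormal basis for C^2 (not the standard basis).
n0 = np.array([1, 1]) / np.sqrt(2)
n1 = np.array([1, -1]) / np.sqrt(2)

psi = np.array([3.0, 4.0j])

# Components psi_n = <n|psi>: note the complex conjugate on the bra.
psi_0 = n0.conj() @ psi
psi_1 = n1.conj() @ psi

# Reconstruct |psi> = sum_n |n> psi_n.
assert np.allclose(n0 * psi_0 + n1 * psi_1, psi)
```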
Vectors and Components#
Students sometimes have difficulty working with quantum mechanics because of confusion between a vector and the components of a vector. To be clear, when we write a vector \(\vect{v} \equiv \ket{v}\) as:
we really have in mind a standard orthonormal basis \(\{\uvect{i}=\ket{i}, \uvect{j}=\ket{j}, \uvect{k}=\ket{k}\}\)
such that
Note
In my work, I try to write this as
Although not common, this helps me keep track of things. The reason is that I think of \(\ket{i}\) etc. as column vectors – matrices with shape \(N\times 1\) – and I think of the coefficients (numbers) as \(1\times 1\) matrices. Thus, I can think of \(\ket{i}1\) as matrix multiplication between an \(N\times 1\) matrix and a \(1 \times 1\) matrix, which makes sense in terms of the dimensions, and gives an \(N \times 1\) matrix as a result.
Similarly, I think of the long bar on the left side of the ket as corresponding to the first index of the matrix, which is “long” in the sense that it takes \(N\) values, while the point on the right corresponds to the second index, which takes only \(1\) value.
Thus, when I look at \(\braket{f|g}\), the points on both sides tell me that this is a \(1\times 1\) matrix, or a number, while the dyad \(\ket{f}\bra{g}\) has long bars on both sides, and so is clearly an \(N \times N\) matrix.
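This shape bookkeeping can be made literal in NumPy by storing kets as \(N\times 1\) matrices and bras as \(1\times N\) matrices (a sketch with made-up numbers):

```python
import numpy as np

N = 3
ket = np.array([[1], [2], [3]])   # |f>: an N x 1 matrix
bra = ket.conj().T                # <f|: a 1 x N matrix

braket = bra @ ket                # <f|f>: a 1 x 1 matrix (a number)
dyad = ket @ bra                  # |f><f|: an N x N matrix

assert braket.shape == (1, 1)
assert dyad.shape == (N, N)
```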
Following this idea, I write the eigenvalue problem as:
I think of the matrix \(\mat{H}\) moving “through” the ket from left to right, and turning into the eigenvalue \(\lambda\).
Thus, when I diagonalize a matrix \(\mat{H} = \mat{U}\mat{D}\mat{U}^\dagger\), I remember that this is really just a collection of eigenvalue problems where the columns of \(\mat{U}\) are the eigenvectors:
Mathematicians will insist that vectors \(\ket{v}\) be treated as abstract objects, and that the “list of numbers” only makes sense after one has specified a basis. Hence, expressions like the one above should be regarded with suspicion (hence the \(\=\) notation).
It is worthwhile developing a nose for such suspect relations, and keeping this in mind can help clarify what is going on:
For the purposes of this course, however, we shall always assume the existence of a standard orthonormal basis: for example, with the \(z\)-axis pointing “up”, the \(x\)-axis pointing “to the right”, and the \(y\)-axis pointing “into the page” (or “into the board” in the classroom).
Thus, we shall freely write
without qualification \(\=\), with the knowledge that this description depends on our implicit choice of the standard orthonormal basis.
Discussion: Abstract vectors \(\ket{\psi}\) vs. lists of numbers – which is more fundamental?
Physicists often regard the quantum state \(\ket{\psi}\) as an abstract quantity representing physical reality, with the list of numbers having secondary meaning dependent on one's (arbitrary) choice of basis.
This position, however, is not completely tenable. As both Heisenberg and Einstein point out (Fred: can you provide the references?), what really matters are the numbers which result from measurement. The physical theory simply provides some mathematical framework which describes how these numbers relate to each other in different circumstances. In this view, it is the numbers themselves that play the fundamental role, and hence, it is reasonable to consider the numbers as the representation of the physics, with the choice of basis explicitly defined by the nature of the experiment.
In quantum mechanics, measurement is subtle, and we will learn that the components of a wavefunction cannot actually be measured. Nevertheless, there is a reasonable argument that the abstract vector \(\ket{\psi}\) is no more fundamental than its components in a particular well-defined basis. We will use this argument to favour the simplified notation.
Special Matrices#
In quantum mechanics, two general classes of matrices play a central role: hermitian matrices and unitary matrices. These are the complex generalizations of symmetric and orthogonal matrices respectively.
Hermitian Matrices#
Hermitian matrices are square matrices that are the same as their hermitian conjugate:
E.g., for a 3×3 matrix:
thus, we can write the most general 3×3 hermitian matrix as:
where \(a, b, \dots, k\) are real. In particular, note that the diagonal entries are real.
For quantum mechanics, there are two key properties of hermitian matrices:
Their eigenvalues are real: \(\lambda_i \in \mathbb{R}\).
Their eigenvectors can be chosen to form a complete orthonormal basis.
Do it! Prove these statements.
First suppose that \(\ket{1}\) is an eigenvector:
Hence, if \(\mat{H} = \mat{H}^{\dagger}\) is hermitian, then
Thus \(\lambda_1 = \lambda_1^*\) is real. This argument holds for any eigenvalue of \(\mat{H}\).
Now we present a simplified proof in the case that the eigenvalues are distinct. Suppose that
Then, since \(\lambda_2^* = \lambda_2\) is real:
Since \(\lambda_1 \neq \lambda_2\), the only way for this to be true is for \(\braket{1|2} = 0\). I.e. the eigenvectors \(\ket{1}\) and \(\ket{2}\) are orthogonal.
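Both properties can be checked numerically for a random hermitian matrix (a sketch; `numpy.linalg.eigh` is specialized for hermitian matrices):

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# A random hermitian matrix: H = A + A^dagger.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A + A.conj().T

eigvals, V = np.linalg.eigh(H)

# 1. The eigenvalues are real (eigh returns them as real floats).
assert np.isrealobj(eigvals)

# 2. The eigenvectors (columns of V) form a complete orthonormal
#    basis, which diagonalizes H.
assert np.allclose(V.conj().T @ V, np.eye(4))
assert np.allclose(V @ np.diag(eigvals) @ V.conj().T, H)
```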
In quantum mechanics, physical quantities are associated with matrices as follows:
Physical systems are represented by normalized state vectors \(\ket{\psi}\) with \(\braket{\psi|\psi} = 1\). Physical quantities are represented by hermitian matrices, e.g. \(\mat{H} = \mat{H}^\dagger\). The result of measuring \(\mat{H}\) for a state \(\ket{\psi}\) is one of the eigenvalues \(\lambda_n\) of \(\mat{H}\), obtained with probability \(p_n = \abs{\braket{n|\psi}}^2\). After obtaining a measurement of \(\lambda_n\), the physical system will be left in the corresponding eigenstate \(\ket{n}\).
By restricting physical quantities to be associated with hermitian matrices, we ensure that measured values are real, and that all events are properly accounted for with unit probability \(1 = p_1 + p_2 + \cdots\) of measuring at least one of the outcomes. (Note: extensions of quantum mechanics that use non-hermitian matrices have become quite popular recently, and could form the basis for an interesting project. These generally represent “open” systems where particles can leave, so the total probability of measuring one of the outcomes may be less than 1.)
Positive Operator#
A positive operator (or positive semi-definite matrix) is an operator (matrix) \(\mat{A}\) for which
is real and non-negative for all vectors \(\ket{\psi}\) in the space. If the vector space is complex, then positivity implies that \(\mat{A} = \mat{A}^\dagger\) is hermitian.
Do it! Prove this statement.
This is Exercise (2.24) in [Nielsen and Chuang, 2010]. As a start, consider the following matrix:
Show that the two eigenvalues are both \(1\).
Show that \(\mat{A}\) is positive over all real vectors by considering
\[\begin{gather*} \ket{\psi} = \begin{pmatrix} \cos(\theta)\\ \sin(\theta) \end{pmatrix} \end{gather*}\]Find a complex vector \(\ket{\psi}\) where the expectation value \(\braket{\psi|\mat{A}|\psi}\) is not real.
Follow the hint in Exercise (2.24) that every matrix \(\mat{A} = \mat{B} + \I\mat{C}\) can be expressed in this way where \(\mat{B}=\mat{B}^\dagger\) and \(\mat{C}=\mat{C}^\dagger\) are hermitian.
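The matrix from the exercise is not reproduced above, but here is a candidate with the stated properties (an illustrative sketch, not necessarily the matrix in [Nielsen and Chuang, 2010]): both eigenvalues are \(1\), it is positive over real vectors, yet it is not hermitian and fails to be positive over complex vectors:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])

# Both eigenvalues are 1.
assert np.allclose(np.linalg.eigvals(A), [1, 1])

# Over real vectors, <psi|A|psi> = 1 + cos(t) sin(t) >= 1/2 > 0.
for t in np.linspace(0, 2 * np.pi, 101):
    psi = np.array([np.cos(t), np.sin(t)])
    assert psi @ A @ psi >= 0.5 - 1e-12

# But for a complex vector the expectation value is not even real.
psi = np.array([1, 1j]) / np.sqrt(2)
expectation = psi.conj() @ A @ psi
assert abs(expectation.imag) > 0.1
```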
Unitary Matrices#
Unitary matrices are square matrices whose hermitian conjugate is their inverse:
The most important properties are:
The columns of a unitary matrix form a complete orthonormal basis.
The eigenvalues of a unitary matrix are phases \(e^{\I \theta}\).
The eigenvectors of a unitary matrix can be chosen to form a complete orthonormal basis.
\(\mat{U}\mat{U}^\dagger = \mat{1}\).
Do it! Prove these statements.
The first property is easy to show by partitioning the matrix. Let \(\ket{1}\) be the first column, \(\ket{2}\) the second, etc. Then \(\bra{1}\) forms the first row of \(\mat{U}^\dagger\), \(\bra{2}\) the second row, etc.:
Now simply compute the product:
Hence \(\braket{n|m} = \delta_{m,n}\) and the columns \(\ket{m}\) form a complete orthonormal basis.
Now assume \(\mat{U}\ket{1} = \ket{1}\lambda\). Then
Hence \(\abs{\lambda} = 1\) and so \(\lambda = e^{\I\theta}\) for some real angle \(\theta\).
The other properties are easier to prove after we relate hermitian and unitary matrices.
Unitary matrices play a central role in quantum mechanics because they preserve the norm of a state \(\ket{\psi}\), which is associated with conservation of probability. I.e. if \(\ket{\phi} = \mat{U}\ket{\psi}\) where \(\mat{U}\) is unitary, then
The propagator in quantum mechanics – the linear operator that takes one state into another state in the future – is a unitary matrix \(\mat{U}\). In general, we will represent the action of a quantum computer (without measurement) as a large unitary matrix \(\mat{U}\).
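These properties are easy to check numerically (a sketch; the QR decomposition of a random complex matrix yields a random unitary):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(A)   # the Q factor of a complex matrix is unitary

# Columns form a complete orthonormal basis: U^dagger U = 1.
assert np.allclose(U.conj().T @ U, np.eye(4))

# Eigenvalues are phases e^{i theta}: all have unit magnitude.
assert np.allclose(abs(np.linalg.eigvals(U)), 1)

# U preserves the norm of any state (conservation of probability).
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
assert np.allclose(np.linalg.norm(U @ psi), np.linalg.norm(psi))
```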
Projection Matrices#
Projection matrices also play an important role, providing a way of decomposing vectors into their components. Their key property is that they are idempotent:
I.e. once you project a vector into a subspace, projecting it again does not change anything.
For example, suppose
where \(\ket{a}\) and \(\ket{b}\) are linearly independent unit vectors. Projection matrices \(\proj{a}\) and \(\proj{b}\) that act as
are useful:
where \(\proj[\perp]{a}\) is the projector into the subspace orthogonal to \(\ket{a}\) etc.
Do it! Derive projectors \(\proj{a}\) and \(\proj{b}\).
First, let's take \(\ket{a}\) and \(\ket{b}\) to be unit vectors \(\braket{a|a} = \braket{b|b} = 1\). If this is not the case, then redefine the vectors and the coefficients as
etc.
The key properties that we want are:
The second property can be ensured by using the projectors \(\proj[\perp]{a}\) and \(\proj[\perp]{b}\) into the subspaces that are orthogonal to \(\ket{b}\) and \(\ket{a}\) respectively. Similarly, we need the “output” of the operators to be along the appropriate direction. Thus, we can write
where we have yet to specify \(\bra{A}\) and \(\bra{B}\).
To have \(\proj{a}\ket{a} = \ket{a}\braket{A|\proj[\perp]{b}|a} = \ket{a}\), we need
This is enough to make the projectors idempotent, i.e.:
We can thus simply take (using the fact that \(\proj[\perp]{b}\) is idempotent and hermitian)
In higher dimensional spaces, one can add to \(\ket{A}\) and \(\ket{B}\) anything in the subspace orthogonal to \(\span\{\ket{a}, \ket{b}\}\).
Putting this all together, we have explicitly
The first form generalizes to any number of projectors by generalizing \(\proj[\perp]{b}\) to be the projection into the orthogonal subspace to all other vectors.
Note that if \(\braket{a|b} = 0\), then this reduces to the familiar form for orthogonal vectors \(\proj{a} = \ket{a}\!\bra{a}\).
We usually work with orthonormal vectors, \(\braket{a|b} = 0\) and \(\braket{a|a} = \braket{b|b} = 1\), which greatly simplifies the formula:
Note
If the vectors are not orthogonal, \(\braket{a|b} \neq 0\), then one must be careful because several properties enjoyed by orthogonal projectors do not hold:
If the vectors are not orthogonal, then the projectors depend on the full set of vectors. I.e. our formula for \(\proj{a}\) depends on \(\ket{b}\). If the vectors are orthogonal, then this dependence vanishes.
If the vectors are not orthogonal, then the projectors are not hermitian, and therefore not positive operators on complex vector spaces.
For these reasons, it is usually preferable to change your metric so that your vectors become orthonormal.
To do.
Demonstrate how these projectors can be simply constructed using a different metric.
import numpy as np
from qiskit.visualization import array_to_latex

def normalize(psi):
    return np.divide(psi, np.linalg.norm(psi))

a = np.array([1, 0])
b = np.array([1, 1])
psi = 1.2*a - 3.4*b

bhat = normalize(b)
I = np.eye(2)

# Projectors onto the subspaces orthogonal to a and b.
P_perp_a = I - np.outer(a, a)
P_perp_b = I - np.outer(bhat, bhat)

# proj{a} = |a><a| P_perp_b / <a|P_perp_b|a>, and similarly for b.
P_a = (np.outer(a, a) @ P_perp_b) / (a @ P_perp_b @ a)
P_b = (np.outer(b, b) @ P_perp_a) / (b @ P_perp_a @ b)

# The projectors are idempotent and extract the components of psi.
assert np.allclose(P_a, P_a @ P_a)
assert np.allclose(P_b, P_b @ P_b)
assert np.allclose(P_a @ psi, 1.2*a)
assert np.allclose(P_b @ psi, -3.4*b)

display(array_to_latex(P_a, prefix=r"\proj{a} = "))
display(array_to_latex(P_b, prefix=r"\proj{b} = "))
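One way to make the to-do above concrete (a sketch of one standard construction, not necessarily what the author has in mind): collect the non-orthogonal vectors as columns of a matrix \(\mat{V}\), and use the inverse of the overlap (Gram) matrix \(\mat{G} = \mat{V}^\dagger\mat{V}\) – the changed metric – to build dual vectors; each projector is then a dyad of a vector with its dual:

```python
import numpy as np

# Non-orthogonal unit vectors (same example as above).
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0]) / np.sqrt(2)

V = np.stack([a, b], axis=1)   # columns are |a>, |b>
G = V.conj().T @ V             # overlap (Gram) matrix: the new metric
W = V @ np.linalg.inv(G)       # columns are the dual vectors

# proj{a} = |a><dual_a|, proj{b} = |b><dual_b|.
P_a = np.outer(V[:, 0], W[:, 0].conj())
P_b = np.outer(V[:, 1], W[:, 1].conj())

# Idempotent; each annihilates the other vector and fixes its own.
assert np.allclose(P_a @ P_a, P_a)
assert np.allclose(P_b @ P_b, P_b)
assert np.allclose(P_a @ b, 0)
assert np.allclose(P_b @ a, 0)
assert np.allclose(P_a @ a, a)
```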
Matrix Exponential#
Hermitian matrices and unitary matrices are closely related through the matrix exponential and matrix logarithm. I.e. if \(\mat{H} = \mat{H}^{\dagger}\) is a hermitian matrix, then the matrix exponential of \(\I \mat{H}\) is unitary:
Stated another way, the matrix exponential of an anti-hermitian matrix is unitary.
Note
In quantum mechanics, a special hermitian matrix \(\mat{H}\) called the Hamiltonian represents the energy of the system and defines the Schrödinger equation
If the Hamiltonian \(\mat{H}\) does not depend on time, then the matrix exponential of this with an additional numerical factor of \(t/\I \hbar\) gives the explicit solution:
The unitary matrix \(\mat{U}_t\) is called the propagator and encodes the complete solution to the Schrödinger equation when the Hamiltonian \(\mat{H}\) does not depend on time.
In quantum computing, we generally work directly with the unitary propagator \(\mat{U}_t\), which represents the action of our computer. The problem for physicists is to find the appropriate physical system to get a Hamiltonian \(\mat{H}\) that exponentiates to the appropriate \(\mat{U}_t\).
Although somewhat complicated to do efficiently in general, the problem of computing the matrix exponential of a hermitian matrix is quite straightforward. The idea works for any analytic function \(f(\mat{H})\) whose Taylor series converges:
This can in principle be applied to any matrix, but for hermitian matrices there is a simple form: since the eigenvectors of the hermitian matrix \(\mat{H}\ket{n} = \ket{n}\lambda_n\) can be chosen to form a complete orthonormal basis \(\ket{n}\), we can write these as the columns of a unitary matrix \(\mat{V}\):
Using this, we can write:
Hint
where \(\mat{D}\) is diagonal
This is called diagonalization or we say that we are diagonalizing \(\mat{H}\). We can thus express any of the powers \(\mat{H}^n\) as:
Hence, we can write
by factoring the remaining factors of \(\mat{V}\) and \(\mat{V}^\dagger\) on the left and right of each term in the series.
We now see that \(f(\mat{H})\) and \(\mat{H}\) have the same eigenvectors \(\ket{n}\), and that the eigenvalues of \(f(\mat{H})\) are just \(f(\lambda_n)\). Thus, if \(\mat{H} = \mat{H}^\dagger\) is hermitian with real eigenvalues \(\lambda_n\), then the matrix exponential of \(\I\mat{H}\) has eigenvalues that are phases \(e^{\I\lambda_n}\).
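The diagonalization recipe for \(f(\mat{H})\) is straightforward to implement; here is a sketch for \(f(\mat{H}) = e^{\I\mat{H}}\) with a random hermitian matrix:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# A random hermitian matrix.
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = A + A.conj().T

# Diagonalize H = V D V^dagger, then f(H) = V f(D) V^dagger.
lam, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(1j * lam)) @ V.conj().T   # e^{iH}

# e^{iH} is unitary, and its eigenvalues are the phases e^{i lam_n}.
assert np.allclose(U.conj().T @ U, np.eye(3))
assert np.allclose(np.sort(abs(np.linalg.eigvals(U))), np.ones(3))
```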
Review#
Matrix Decompositions#
QR Decomposition#
Given a set of vectors \(\ket{a_0}\), \(\ket{a_1}\), etc., one can follow the Gram-Schmidt procedure to produce an orthonormal basis. To simplify the notation, we define the “normalization” operator \(\newcommand{\N}{\mathcal{N}}\N(\ket{a})\)
We can now express the Gram-Schmidt procedure as a sequence of steps projecting the remaining basis vectors into an orthogonal subspace, then
The QR decomposition numerically encodes the results of this process in terms of a unitary matrix \(\mat{Q}\) and an upper triangular matrix \(\mat{R}\):
Recall that the columns of a unitary matrix \(\mat{Q}\) form an orthonormal basis.
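Numerically, `numpy.linalg.qr` performs this orthonormalization directly (a sketch with a random matrix):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
A = rng.normal(size=(4, 3))

Q, R = np.linalg.qr(A)

# Columns of Q are the Gram-Schmidt orthonormalization of A's columns.
assert np.allclose(Q.T @ Q, np.eye(3))

# R is upper triangular, and A = Q R.
assert np.allclose(R, np.triu(R))
assert np.allclose(Q @ R, A)
```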
Singular Value Decomposition#
Any matrix \(\mat{A}\) can be decomposed into its singular value decomposition or SVD:
where \(\mat{U}\) and \(\mat{V}\) are unitary, and \(\mat{D}\) is diagonal with non-negative entries \(\sigma_n\) called the singular values.
Note: \(\mat{A}\) need not be square:
The SVD can be computed efficiently by finding the eigenvalues and eigenvectors of the two hermitian matrices \(\mat{A}^{\dagger}\mat{A}\) and \(\mat{A}\mat{A}^\dagger\) respectively.
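A numerical sketch with `numpy.linalg.svd`, including the non-square case:

```python
import numpy as np

rng = np.random.default_rng(seed=11)
A = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# U and V have orthonormal columns; singular values are non-negative.
assert np.allclose(U.conj().T @ U, np.eye(2))
assert np.allclose(Vh @ Vh.conj().T, np.eye(2))
assert np.all(s >= 0)

# A = U D V^dagger.
assert np.allclose(U @ np.diag(s) @ Vh, A)

# The s**2 are the eigenvalues of the hermitian matrix A A^dagger.
assert np.allclose(np.sort(s**2), np.linalg.eigvalsh(A @ A.conj().T))
```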
Direct Sum and Tensor Product#
Consider two vectors \(\ket{a}\) (with \(N_a\) components) and \(\ket{b}\) (with \(N_b\) components). We can form two new vectors from these:
The direct sum \(\ket{a} \oplus \ket{b}\) which has \(N_a + N_b\) components.
The tensor product \(\ket{a} \otimes \ket{b}\) which has \(N_a N_b\) components.
The first can be visualized as stacking the vectors on top of each other:
The direct sum is useful when we decompose a vector space into subspaces, for example, when using projection matrices. The corresponding matrices are block diagonal:
The tensor product is a little more tricky to visualize:
It is easier to think of \(\ket{a}\otimes\ket{b}\) in terms of grouped indices:
We can generalize to matrices, first in index form:
Note the grouping carefully. Visually, this corresponds to
These satisfy the following properties:
We also talk about the tensor product of vector spaces, so that if \(\ket{a} \in \mathcal{H}_a\) and \(\ket{b} \in \mathcal{H}_b\), then \(\ket{a}\otimes\ket{b} \in \mathcal{H}_a \otimes \mathcal{H}_b\). If we have an orthonormal basis \(\{\ket{a_i}\}\) for \(\mathcal{H}_a\) and an orthonormal basis \(\{\ket{b_j}\}\) for \(\mathcal{H}_b\), then the basis \(\ket{ij} = \ket{a_i}\otimes\ket{b_j}\) is an orthonormal basis for \(\mathcal{H}_a\otimes \mathcal{H}_b\).
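Returning to the direct sum: it is easy to check numerically that block-diagonal matrices act on the stacked subspaces independently (a sketch with made-up blocks):

```python
import numpy as np

a = np.array([1, 2])         # N_a = 2
b = np.array([3, 4, 5])      # N_b = 3

# Direct sum of vectors: N_a + N_b components, stacked.
ab = np.concatenate([a, b])
assert ab.shape == (5,)

# Direct sum of matrices: block diagonal.
A = np.eye(2)
B = 2 * np.eye(3)
M = np.block([[A, np.zeros((2, 3))],
              [np.zeros((3, 2)), B]])

# Each block acts only on its own subspace.
assert np.allclose(M @ ab, np.concatenate([A @ a, B @ b]))
```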
It is useful to be able to work with the tensor product numerically. The function numpy.kron(), named for the Kronecker product, directly implements the tensor product, but I find it clearer to work explicitly with indices using numpy.einsum().
First note that we can “combine indices” using numpy.reshape(). Thus, if we have a matrix \(A_{i,j}\):
this will “ravel” to a vector
although with a single index (so it appears flat):
A = np.array([
[1000, 1001, 1002],
[1010, 1011, 1012]])
display(A)
display(A.ravel())
array([[1000, 1001, 1002],
[1010, 1011, 1012]])
array([1000, 1001, 1002, 1010, 1011, 1012])
Here is the tensor product demonstrating the index ordering:
A = np.array([
[0, 1],
[2, 3]])
B = np.array([
[10, 20],
[30, 40]])
display(np.kron(A, B))
display(np.einsum('ij,kl->ikjl', A, B).reshape((4,4)))
array([[ 0, 0, 10, 20],
[ 0, 0, 30, 40],
[ 20, 40, 30, 60],
[ 60, 80, 90, 120]])
array([[ 0, 0, 10, 20],
[ 0, 0, 30, 40],
[ 20, 40, 30, 60],
[ 60, 80, 90, 120]])
Schmidt Decomposition#
The Schmidt decomposition is an important decomposition of a tensor product space which states that for any vector \(\ket{w} \in \mathcal{H}_a \otimes \mathcal{H}_b\) we can find a set of orthonormal bases \(\{\ket{u_i}\}\) and \(\{\ket{v_j}\}\) for \(\mathcal{H}_a\) and \(\mathcal{H}_b\) respectively such that
where the number of terms is at most \(\min(m, n)\), the dimension of the smaller of the two spaces \(m = \dim\mathcal{H}_a\) and \(n = \dim\mathcal{H}_b\).
Do it! Prove this using the SVD.
First express \(\ket{w}\) in some orthonormal basis
Use the fact that any matrix has a SVD to factor the matrix of coefficients \(\beta_{ij}\) and use this to explicitly construct the bases \(\{\ket{u_i}\}\) and \(\{\ket{v_j}\}\) and coefficients \(\alpha_i = \sigma_i\) that provides the Schmidt decomposition.
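The construction can be sketched numerically: treat the coefficients \(\beta_{ij}\) as a matrix, take its SVD, and read off the Schmidt bases and coefficients (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=13)
m, n = 2, 3   # dim H_a and dim H_b

# A normalized vector |w> in H_a x H_b, as a matrix of coefficients.
beta = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
beta /= np.linalg.norm(beta)

# SVD: beta_{ij} = sum_k U_{ik} sigma_k (V^dagger)_{kj}.
U, sigma, Vh = np.linalg.svd(beta, full_matrices=False)

# Schmidt bases: |u_k> are columns of U, |v_k> are rows of Vh.
# |w> = sum_k sigma_k |u_k> x |v_k>, with at most min(m, n) terms.
w = sum(sigma[k] * np.kron(U[:, k], Vh[k]) for k in range(min(m, n)))
assert np.allclose(w, beta.ravel())
assert len(sigma) == min(m, n)
assert np.all(sigma >= 0)
```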