---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.0
kernelspec:
  display_name: Python 3 (phys-555)
  language: python
  name: phys-555
---

```{code-cell} ipython3
:tags: [hide-cell]

import mmf_setup;mmf_setup.nbinit()
import logging;logging.getLogger('matplotlib').setLevel(logging.CRITICAL)
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt
```

(sec:WeakMeasurements)=
# Weak Measurements

The generalized {ref}`sec:Measurement` formalism allows for a simplified discussion of
[weak measurement][]s.  The idea is to introduce a set of [Kraus operators][]
$\{\mat{M}_{m}\}$ that are in some sense close to the identity $\mat{M}_{m} \approx
a\mat{1}$ (with a proportionality factor) such that the measurements do not completely
collapse the state.  Here we consider a simple example and explore how [weak
measurement][]s can be used to advantage when interrogating quantum systems.

## Single Qubit Example

Consider the following weak measurements of $\mat{Z}$:
\begin{gather*}
  M_\alpha = \{\mat{M}_{+1}, \mat{M}_{-1}\},\\
  \mat{M}_{+1} = \frac{1}{\sqrt{1+\alpha^2}}\begin{pmatrix}
    1\\
    & \alpha
  \end{pmatrix}, 
  \qquad
  \mat{M}_{-1} = \frac{1}{\sqrt{1+\alpha^2}}\begin{pmatrix}
    \alpha\\
    & 1
  \end{pmatrix},\\
  \mat{E}_{+1} = \frac{1}{1+\alpha^2}\begin{pmatrix}
    1\\
    & \alpha^2
  \end{pmatrix}, 
  \qquad
  \mat{E}_{-1} = \frac{1}{1+\alpha^2}\begin{pmatrix}
    \alpha^2\\
    & 1
  \end{pmatrix},\\
  \mat{E}_{\pm 1} = \tfrac{1}{2}(\mat{1} \pm \epsilon\mat{Z}), \qquad
  \epsilon = \frac{1 - \alpha^2}{1+\alpha^2}, \qquad
  \alpha^2 = \frac{1 - \epsilon}{1+\epsilon},
\end{gather*}
where we call $\epsilon$ the **weakness parameter**.


:::{admonition} Do it! Implement these using projective measurements.
:class: dropdown

These can be implemented as strong measurements of the following operators, which make
use of a single qubit ancillae.  First define a unitary operator $\mat{U}$ that behaves
as follows
\begin{gather*}
  \mat{U}\ket{\psi}\ket{0} = \sum_{m=\pm} \mat{M}_{m}\ket{\psi}\ket{m} \tag{2.122}.  
\end{gather*}

:::


If $\epsilon = 1$ ($\alpha = 0$), these correspond to the standard von Neumann
measurement of $\op{Z}$, but as $\epsilon \rightarrow 0$ ($\alpha \rightarrow 1$), they
provide less and less information about the state, 
but also cause less of a collapse.  In the extreme limit $\epsilon = 0$, we have
$\mat{M}_{+1} = \mat{M}_{-1} = \mat{1}/\sqrt{2}$, and the measurement leaves the state
undisturbed, while providing absolutely no information.

To visualize what these measurements tell us, consider an initial state
\begin{gather*}
  \ket{\theta} = \cos\tfrac{\theta}{2}\ket{0} + \sin\tfrac{\theta}{2}\ket{1}
\end{gather*}
with $\theta \in [0, \pi]$.  Measuring this state gives $m \in \{+1, -1\}$ with
probabilities
\begin{gather*}
  p_\epsilon(+1|\theta) 
    = \frac{\cos^2\tfrac{\theta}{2} + \alpha^2\sin^2\tfrac{\theta}{2}}{1+\alpha^2}
    = \frac{1}{2}\left(1 + \epsilon\cos\theta\right),\\
  p_\epsilon(-1|\theta) 
    = \frac{\alpha^2\cos^2\tfrac{\theta}{2} + \sin^2\tfrac{\theta}{2}}{1+\alpha^2}
    = \frac{1}{2}\left(1 - \epsilon\cos\theta\right).
\end{gather*}
More succinctly:
\begin{gather*}
  p_\epsilon(m|\theta) = \tfrac{1}{2}(1 + m\,\epsilon\cos\theta)
\end{gather*}

What about the action of the measurement on the state?  According to the postulate of
generalized measurements, $\ket{\psi} \rightarrow \mat{M}_{m}\ket{\psi}/\sqrt{p_m}$.
Thus:
\begin{gather*}
  \mat{M}_{+1}\ket{\theta} \propto \begin{pmatrix}
    \cos\tfrac{\theta}{2}\\
    \alpha\sin\tfrac{\theta}{2}
  \end{pmatrix}, \qquad
  \mat{M}_{-1}\ket{\theta} \propto \begin{pmatrix}
    \alpha\cos\tfrac{\theta}{2}\\
    \sin\tfrac{\theta}{2}
  \end{pmatrix}.
\end{gather*}
Since these $\mat{M}_{\pm 1}$ commute (they are both diagonal), we can succinctly
describe a series of $n_{+}$ measurements with value $m=+1$ and $n_{-}$ measurements
with value $m=-1$:
\begin{gather*}
  \mat{M}_{+1}^{n_{+}}\mat{M}_{-1}^{n_{-}}\ket{\theta} 
  \propto
  \begin{pmatrix}
    \alpha^{n_-}\cos\tfrac{\theta_n}{2}\\
    \alpha^{n_+}\sin\tfrac{\theta_n}{2}
  \end{pmatrix}
  \propto
  \begin{pmatrix}
    \cos\tfrac{\theta_{n_+-n_-}}{2}\\
    \sin\tfrac{\theta_{n_+-n_-}}{2}
  \end{pmatrix} = \ket{\theta_{n_{+}-n_{-}}}, \\
  \tan\tfrac{\theta_{n_{+}-n_{-}}}{2} 
    = \frac{\alpha^{n_{+}}}{\alpha^{n_{-}}}\tan\tfrac{\theta}{2} 
    = \alpha^{n_{+}-n_{-}}\tan\tfrac{\theta}{2}.
\end{gather*}

:::{margin}
This can be generalized quite trivially to include measurements towards $\mat{Y}$ but
let's keep things real for now.
:::
Generalizing slightly and using rotational invariance, we will also consider
measurements at an angle $\vartheta$ -- i.e. $\vartheta=\pi/2$ corresponds to a weak
measurement of $\mat{X}$ -- with probabilities:
\begin{gather*}
  p_{\epsilon,\vartheta}(m|\theta) 
  = p_{\epsilon}(m|\theta-\vartheta) 
  = \frac{1}{2}\left(
      1 + m,\epsilon\cos(\theta-\vartheta)
  \right),\\
  \mat{M}_{m}(\vartheta)\ket{\theta} \propto \ket{\theta_{m}}, \qquad
  \tan\tfrac{\theta_{m}-\vartheta}{2} = \alpha^{m}\tan\tfrac{\theta-\vartheta}{2}.
\end{gather*}

Thus, the weak measurement has the effect of marching the state towards the
measurement eigenstates, but with decreasing step size.

:::{margin}
Are the properties of such a random walk known?  This is probably discussed in detail in
{cite:p}`Oreshkov:2005`.
:::
### Random Walk on Bloch Sphere

:::{margin}
Some useful relations:
\begin{gather*}
  x = -\ln \tan\tfrac{\theta}{2},\\
  \tanh x = \cos \theta = \frac{1-\tan^2\tfrac{\theta}{2}}
                               {1+\tan^2\tfrac{\theta}{2}}.
\end{gather*}
:::
To make this behavior a little more explicit, it is useful to introduce some new variables:
\begin{gather*}
  x = -\ln \tan\tfrac{\theta}{2}, \qquad
  \delta = -\ln \alpha = \tanh^{-1}\epsilon.
\end{gather*}

```{code-cell} ipython3
:tags: [hide-cell]

# Check some identities
theta = np.linspace(0, np.pi)[1:-1]
x = -np.log(np.tan(theta/2))
t2 = np.tan(theta/2)**2
assert np.allclose(np.tanh(x), np.cos(theta))
assert np.allclose(np.tanh(x), (1-t2)/(1+t2))

alpha = 0.123
eps = (1-alpha**2)/(1+alpha**2)
assert np.allclose(alpha**2, (1-eps)/(1+eps))
assert np.allclose(-np.log(alpha), np.arctanh(eps))
```

In terms of these variables, the evolution under these weak measurements is exactly that
of a random walk with step size $\delta$
\begin{gather*}
  x \mapsto x + m\delta
\end{gather*}
where $m = \pm 1$ is a random variable with distribution that depends on $x$:
\begin{gather*}
  p_\delta(m|x) 
    = \tfrac{1}{2}(1 + m\,\epsilon\,\tanh x)
    = \tfrac{1}{2}(1 + m\tanh\delta\,\tanh x).
\end{gather*}

### Information and Bayes' Theorem

:::{margin}
I.e., a complete lack of knowledge would be characterized by a "flat prior":
\begin{gather*}
  p(\theta) = \frac{1}{\pi},
\end{gather*}
while complete knowledge would be characterized by a delta-function
\begin{gather*}
  p(\theta) = \delta(\theta-\theta_0).
\end{gather*}
:::
We now ask the question: What can we learn about a state by performing measurements.  To
make this definite, suppose we are given a quantum state $\ket{\theta}$ known to lie in
the $x-z$ plane, but with some prior knowledge coded in the prior distribution
$p(\theta)$.

:::{margin}
The denominator in Bayes' theorem $p(m)$, sometimes called the [marginal probability][]
can simply be regarded here as a normalization factor.  Once we compute the posterior,
we integrate and divide by the result.
:::
After making a measurement with outcome $m$, we can characterize what we have learned by
using [Bayes' theorem][] to get the posterior distribution
\begin{gather*}
  p_\epsilon(\theta|m) \propto p_\epsilon(m|\theta)p(\theta),
\end{gather*}
where $p_\epsilon(m|\theta)$ is the [likelihood][] of measuring $m$ if the state is
$\ket{\theta}$.

Now consider a sequence of measurements giving results $\vect{m} = [m_1, m_2, \cdots]$.
Our posterior changes as:
\begin{align*}
  p_{\epsilon;\vartheta_1}(\theta|m_1) 
    &\propto p_{\epsilon,\vartheta_1}(m_1|\theta)p(\theta),\\
  p_{\epsilon;\vartheta_1,\vartheta_2}(\theta|m_1,m_2) 
    &\propto p_{\epsilon,\vartheta_2}(m_2|\theta_1)p_{\epsilon;\vartheta_1}(\theta|m_1),\\
  p_{\epsilon;\vartheta_1,\vartheta_2,\vartheta_3}(\theta|m_1,m_2,m_3) 
    &\propto p_{\epsilon,\vartheta_3}(m_3|\theta_2)p_{\epsilon;\vartheta_1,\vartheta_2}(\theta|m_1,m_2),\\
  &\;\;\vdots\\
  p_{\epsilon;\vec{\vartheta}}(\theta|\vect{m}) 
    &\propto p(\theta)\prod_{i=1}^{n} p_{\epsilon,\vartheta_i}(m_i|\theta_{i-1}).
\end{align*}
:::{margin}
E.g., for fixed $\vartheta_n=\vartheta$,
\begin{gather*}
  \tan\tfrac{\theta_n}{2} = \alpha^{\sum_{i=1}^{n}m_i}\tan\tfrac{\theta}{2}.
\end{gather*}
:::
where $\theta_n$ characterizes the evolution of the state.

As a specific example, first consider a strong measurement along $\vartheta = 0$
yielding the result $m=\pm 1$ with a uniform prior $p(\theta)=1/\pi$.  The normalized
posteriors are:
\begin{gather*}
  p(\theta|m) = \frac{1 + m\cos(\theta)}{\pi}.
\end{gather*}

```{code-cell} ipython3
:tags: [hide-input]

# See how measuring successive m=0 results constraints the distribution

N = 200
thetas = np.linspace(0, np.pi, 2*N+1)[1:-1:2]
    
prior = 0*thetas + 1/np.pi
ps = [(1 + m*np.cos(thetas))/np.pi for m in [+1, -1]]

fig, ax = plt.subplots()

ax.plot(thetas, prior, ':k', label="prior")
ax.plot(thetas, ps[0], '-C0', label=r"$m=+1$")
ax.plot(thetas, ps[1], '-C1', label=r"$m=-1$")

ax.set(xlabel=fr"$\theta$", 
       ylabel=r"Posterior")
ax.legend();
```

Now, suppose we start with state $\ket{\theta=0} = \ket{0}$ (but we do
not know this... we still assume our prior is uninformed $p(\theta) = 1/\pi$).  In this
particular case, a sequence of measurements will always give $m = +1$.  We must,
however, still evolve $\theta_n$ in our likelihood function: $\tan\tfrac{\theta_{n}}{2} = \alpha^{n}\tan\tfrac{\theta}{2}$.

:::{dropdown} To Do.
The sympy code below shows that the product remains of the form $(a+\epsilon
b\cos\theta)$, but I have not found a simple analytic way of computing the partial terms
yet.  Maybe these help?

\begin{gather*}
  \cos\theta_n = \frac{1-\alpha^{2n}\tan^2\tfrac{\theta}{2}}
                      {1 + \alpha^{2n}\tan^2\tfrac{\theta}{2}}\\
               = \frac{1-\alpha^{2n} + (1+\alpha^{2n})\cos\theta}
                      {1+\alpha^{2n} + (1-\alpha^{2n})\cos\theta},\\
  p_{\epsilon;\vect{0}}(\theta|\vect{1}) 
  \propto p(\theta)\prod_{i=1}^{n} \Bigl(1 + \epsilon \tanh(x + i\delta)\Bigr)
\end{gather*}
:::

```{code-cell} ipython3
:tags: [hide-cell]

import sympy
eps, c = sympy.var('epsilon, c')
n = sympy.var('n', integer=True, positive=True)
a2 = (1-eps)/(1+eps)
res = (1-a2**n + (1+a2**n)*c)/(1+a2**n + (1-a2**n)*c)
ps = [(1+eps*res.subs(n, _n)).simplify() for _n in range(10)]
Ps = [ps[0]]
for _n in range(1, len(ps)):
    Ps.append((Ps[-1] * ps[_n]).simplify())
    display(Ps[-1])
```

Here are the results of performing a series of weak measurements in this case:

```{code-cell} ipython3
:tags: [hide-input]

# See how measuring successive m=0 results constraints the distribution

N = 200
thetas = np.linspace(0, np.pi, 2*N+1)[1:-1:2]

def likelihood(theta, epsilon, m=1):
    """Return the likelihood of measuring m, given a state |theta>"""
    if m == -1:
        epsilon *= -1
    return (1 + epsilon*np.cos(theta))/2
    
def posterior(prior, epsilon, theta=thetas, m=1):
    """Return the posterior distribution.
    
    Arguments
    ---------
    prior : array-like
        Prior distribution on theta in the initial state.
    epsilon : float
        Weakness parameter.
    theta : float or array-like
        Current values of theta.  If several measurements have been performed, then the
        final value of theta will differ from the original uniform thetas. This should
        be used in this case to update the change in state.
    """
    posterior = likelihood(theta=theta, epsilon=epsilon, m=m)*prior
    posterior /= np.trapz(posterior, thetas)
    return posterior

ps = [0*thetas + 1/np.pi]
theta = thetas

epsilon = 0.2
alpha = np.sqrt((1-epsilon)/(1+epsilon))

for n in range(10):
    ps.append(posterior(prior=ps[-1], epsilon=epsilon, theta=theta, m=0))
    theta = 2 * np.arctan(alpha * np.tan(theta/2))
    

fig, ax = plt.subplots()
for n, p in enumerate(ps):
    ax.plot(thetas, p, ':', label=f"$n={n}$")

p0 = posterior(ps[0], epsilon=1, theta=thetas, m=0)
ax.plot(thetas, p0, '-k', label=r"$\epsilon=1$")
p1 = posterior(ps[0], epsilon=1, theta=thetas, m=1)
ax.plot(thetas, p1, '-k', label=fr"$\epsilon=1$, $H(p‖p_0)={H(p1, p0):.2f}$")
ax.set(xlabel=fr"$\theta$", 
      ylabel=r"Posterior $p(\theta)$ after $n$ measurements",
      title=fr"$\epsilon={epsilon}$")
ax.legend();
```

We see that a sequence of weak measurements approaches the result of a single
strong measurement (or a sequence of strong measurements since a sequence is equivalent
to a single strong measurement).  Thus, in this case, weak measurements provide no
advantage over a strong measurement.

:::{margin}
Information theory for discrete variables is rooted in the Shannon entropy $H(X)$:
\begin{gather*}
  H(X) = -\sum_x p_x\log_2 p_x, \tag{11.1}\\
       = \braket{-\log_2 p(X)}.
\end{gather*}
Here we describe a continuous distribution $p(\theta)$, so the corresponding quantity is
the **[differential entropy][]** $h(X)$:
\begin{gather*}
  h(X) = -\int p(x)\ln p(x)\d{x},\\
       = \braket{-\ln p(X)}.
\end{gather*}
To relate these, discretize the variable $X$ into $N=2^n$ bins (sometimes called the $n$-bit
quantization) of size $\d{X} = \Delta$, then as $\Delta \rightarrow 0$:
\begin{gather*}
  H(X^\Delta) + \log \Delta \rightarrow h(X).
\end{gather*}
(Theorem 8.3.1 from {cite:p}`Cover:2006`)


:::
## Information Theory I

How might we characterize: "how much we learn" by making these measurements?  One way is
to use the von Neumann entropy of the distribution:
\begin{gather*}
    
\end{gather*}


## Incomplete...

Where might we gain an advantage?  Perhaps if the prior is not uniform: weak
measurements would allow one to check several different directions, whereas a strong
measurement would collapse the system.  To consider this, we use the same system where
the actual underlying state is $\ket{0}$, but now suppose we have two independent
particles that we can measure.  To be precise:

1. We have an underlying system of particles in the state $\ket{0}$, but with a flat
   prior.
2. We consider making different measurements with the detector at angle
   $\vartheta_{n}$, recording the results as $m_{n}$ or $s_n = 1-2m_n$.  Using
   rotational invariance, we can deduce that the likelihood of measuring $s$ with the
   detector at angle $\vartheta$ and the state at angle $\theta$ is:
   \begin{gather*}
     p_{\alpha,\vartheta}(s|\theta) = p_{\alpha}(s|\theta-\vartheta) 
     = \frac{1}{2}\left(
       1 + s\frac{1-\alpha^2}{1+\alpha^2}\cos(\theta-\vartheta)
     \right)
   \end{gather*}


The posterior after making the $n\in\{1, 2, \cdots\}$ strong measurements with results
$\vect{s} = (s_1, s_2, \dots, s_n)$ is:
\begin{gather*}
  p(\theta|\vect{s}) \propto \prod_{n}p_{\vartheta_n}(s_n|\theta)p(\theta).
\end{gather*}

Consider now a measurement of the second particle in the
direction of $\theta^{m}_2$.  The posterior now is:

```{code-cell} ipython3
:tags: [hide-input]

from scipy.special import xlogy

_TINY = np.finfo(float).tiny

def likelihood(theta, vartheta, epsilon, m):
    """Return the likelihood of measuring m, given a state |theta>"""
    if m == -1:
        epsilon *= -1
    return (1 + epsilon*np.cos(theta-vartheta))/2

def measure(theta, vartheta, epsilon, m):
    """Perform the specified measurement and return the new theta."""
    alpha = np.sqrt((1-epsilon)/(1+epsilon))
    return vartheta + 2*np.arctan(alpha ** m * np.tan((theta-vartheta)/2))

def get_posterior(theta, epsilon, ms, varthetas):
    thetas = theta
    post = 0*thetas + 1/np.pi
    for m, vartheta in zip(ms, varthetas):
        kw = dict(theta=thetas, vartheta=vartheta, epsilon=epsilon, m=m)
        post = post * likelihood(**kw)
        thetas = measure(**kw)
    return post

rng = np.random.default_rng(seed=4)

def doit(theta, epsilon, N=10):
    """Do a series of measurements."""
    ms = []
    varthetas = []
    vartheta = 0
    for n in range(N):
        p0 = likelihood(theta, vartheta=vartheta, epsilon=epsilon, m=1)
        m = 1 if rng.random() < p0 else -1
        varthetas.append(vartheta)
        ms.append(m)
        if m == -1:
            vartheta = np.pi/2 - vartheta
    return ms, varthetas
    
def H(p, p0, theta=thetas):
    """Return the relative entropy H(p‖p_0)."""
    return np.trapz(xlogy(p, p/(p0+_TINY)), theta)
    
N = 200
thetas = np.linspace(-np.pi, np.pi, 2*N+1)[1:-1:2]

# Uninformed prior for comparison.
p0 = 0*thetas + 1/2/np.pi
ps = [p0]
theta = thetas

epsilon = 0.2
alpha = np.sqrt((1-epsilon)/(1+epsilon))

P0 = get_posterior(thetas, epsilon=1, ms=[1], varthetas=[0])
P1 = get_posterior(thetas, epsilon=1, ms=[-1], varthetas=[0])

ms, varthetas = doit(theta=np.pi/2, epsilon=epsilon, N=100)
P = get_posterior(thetas, epsilon=epsilon, ms=ms, varthetas=varthetas)

sigma = 1/4
p0 = (np.exp(-thetas**2/2/sigma**2) 
      + np.exp(-(thetas-np.pi/2)**2/2/sigma**2)
      + np.exp(-(thetas-np.pi)**2/2/sigma**2)
      + np.exp(-(thetas+np.pi)**2/2/sigma**2)
     ) + 0.1
p0 /= np.trapz(p0, thetas)

P0 *= p0
P1 *= p0
P *= p0
P0 /= np.trapz(P0, thetas)
P1 /= np.trapz(P1, thetas)
P /= np.trapz(P, thetas)
plt.plot(thetas, p0, ':', label=f"$H(p‖p_0)={H(p0, p0):.2f}$")
plt.plot(thetas, P0, label=f"$H(p‖p_0)={H(P0, p0):.2f}$")
plt.plot(thetas, P1, label=f"$H(p‖p_0)={H(P1, p0):.2f}$")
plt.plot(thetas, P, '--', label=f"$H(p‖p_0)={H(P, p0):.2f}$")
plt.legend();
```

```{code-cell} ipython3
vartheta = np.pi/2
theta = np.pi/3
for n in range(10):
    print(theta)
    theta = measure(theta, vartheta=vartheta, epsilon=epsilon, m=1)
```

```{code-cell} ipython3
:tags: [hide-input]

'''
for n in range(9):
    ps.append(posterior(prior=ps[-1], epsilon=epsilon, theta=theta, m=0))
    theta = 2 * np.arctan(alpha * np.tan(theta/2))


fig, ax = plt.subplots()
for n, p in enumerate(ps):
    ax.plot(thetas, p, ':', label=f"$n={n}$, $H(p‖p_0)={H(p, p0):.2f}$")

p1 = posterior(ps[0], epsilon=1, theta=thetas, m=0)
ax.plot(thetas, p1, '-k', label=fr"$\epsilon=1$, $H(p‖p_0)={H(p1, p0):.2f}$")
ax.set(xlabel=fr"$\theta$", 
      ylabel=r"Posterior $p(\theta)$ after $n$ measurements",
      title=fr"$\epsilon={epsilon}$")
ax.legend();
'''
```

[marginal probability]: <https://en.wikipedia.org/wiki/Marginal_distribution>
[likelihood]: <https://en.wikipedia.org/wiki/Likelihood_function>
[Bayes' theorem]: <https://en.wikipedia.org/wiki/Bayes%27_theorem>
[Bloch sphere]: <https://en.wikipedia.org/wiki/Bloch_sphere> 
[qubit]: <https://en.wikipedia.org/wiki/Qubit>
[spherical coordinates]: <https://en.wikipedia.org/wiki/Spherical_coordinate_system>
[PVM]: <https://en.wikipedia.org/wiki/Projection-valued_measure>
[weak measurement]: <https://en.wikipedia.org/wiki/Weak_measurement>
[Kraus operators]: <https://en.wikipedia.org/wiki/Quantum_operation#Kraus_operators>
[section 4.5.2]: <https://ntserver1.wsulibs.wsu.edu:2171/lib/wsu/reader.action?docID=647366&ppg=225>
[differential entropy]: <https://en.wikipedia.org/wiki/Differential_entropy>