## Propagation of Error

My goal here is to understand the error propagation formulas that many scientists look up in some reference book every once in a while when they have to do the data analysis for their next paper.

Suppose you are conducting two different measurements, the results of which you label $t_1$ and $t_2$. We’ll let the variables take any value, so

$t_1, t_2 \in (-\infty , \infty)$,

and each of those results has a different chance of occurring. Write

$P(a < t_1 < b) = \int_a^b P_1(x)dx$

to define $P_1(x)$, the probability density for the variable $t_1$. Then do the same thing to define $P_2(x)$, the probability density for $t_2$.

The only requirements on $P_1$ and $P_2$ are that

$\int_{-\infty}^{\infty} P_1(x) dx = \int_{-\infty}^{\infty} P_2(x) dx = 1$

and

$P_1(x) \geq 0 \ \forall x$
$P_2(x) \geq 0 \ \forall x$

I allowed the measurements to be any number from minus infinity to plus infinity, but if you have something that’s restricted to some finite range, that’s no problem. Just set

$P_1(x) = 0 \ \forall x \notin \operatorname{range}(t_1)$.

On the other hand, it’s not okay if your measurement is discrete (how many jelly beans are in the jar?), unless you’re willing to do some mathematical tricks (step function distributions, for example).

You aren’t happy with just $t_1$ and $t_2$. Instead, you want to combine them in some way to find

$t_3 = f(t_1,t_2)$,

(add them, multiply them, find $\left(\frac{t_1^n + t_2^n}{2}\right)^\frac{1}{n}$, etc.) What is the probability distribution for this new statistic?

The new statistic is a number, so it also ranges from $-\infty$ to $\infty$ (it may have zero probability density over part of that range). What is the probability that $t_3 = \alpha$? Technically, it’s zero (or infinitesimal, if you like), but what is the probability density, $P_3(\alpha)$?

There may be many ways to get $t_3 = \alpha$. For example, if

$t_3 = t_1 + t_2$, then

$t_1 = \alpha + \Delta$
$t_2 = -\Delta$

does the trick for all $\Delta$, meaning there are infinitely many solutions. The chance of getting $t_3 = \alpha$ comes from adding up all the various possibilities thusly:

$P_3(\alpha) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}P_1(t_1)P_2(t_2) \delta(\alpha - f(t_1,t_2))dt_1 dt_2$

or, if you don’t like the Dirac delta function, think of it as a line integral along the level curve $f(t_1,t_2) = \alpha$

$P_3(\alpha) = \int_{f(t_1,t_2) = \alpha}P_1(t_1)P_2(t_2)\frac{ds}{|\nabla f|}$

with $ds$ an element of length along a curve of constant $f$ in its two-dimensional domain, so that

$ds^2 = dt_1^2 + dt_2^2$

The factor of $\frac{1}{|\nabla f|}$ (this is the coarea formula at work) accounts for how densely the level curves of $f$ are packed near $f = \alpha$. As a check: for $f = t_1 + t_2$ it is the constant $\frac{1}{\sqrt{2}}$, which exactly cancels the $\sqrt{2}$ stretch of arc length along the line $t_1 + t_2 = \alpha$, recovering the plain convolution integral worked out below.

That gives the full probability distribution for $t_3$. Scientists generally report their data in terms of a mean and standard deviation, so you would next extract those statistics from their definition and the distribution $P_3(\alpha)$.
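That recipe (draw $t_1$, draw $t_2$, combine with $f$, then read off the mean and standard deviation of $t_3$) is also easy to simulate directly. Here is a minimal Monte Carlo sketch in Python; the particular distributions, sample size, and the `propagate` helper are my own choices for illustration, not anything from the derivation above:

```python
# Monte Carlo sketch: sample t1 and t2, apply f, and estimate
# the mean and standard deviation of the new statistic t3 = f(t1, t2).
import random
import statistics

random.seed(0)

def propagate(f, draw1, draw2, n=100_000):
    """Estimate the mean and standard deviation of t3 = f(t1, t2)
    from n independent draws of t1 and t2."""
    samples = [f(draw1(), draw2()) for _ in range(n)]
    return statistics.fmean(samples), statistics.stdev(samples)

# Example: t1 ~ N(1, 0.5), t2 ~ N(2, 1.2), combined by addition.
mu3, sigma3 = propagate(lambda a, b: a + b,
                        lambda: random.gauss(1.0, 0.5),
                        lambda: random.gauss(2.0, 1.2))
```

For addition with these parameters, `mu3` comes out near $3$ and `sigma3` near $\sqrt{0.5^2 + 1.2^2} = 1.3$, in line with the Gaussian example worked below; swapping in a different lambda for $f$ handles the other combinations (products, power means, and so on) with no new math.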

Let’s work an example by adding two Gaussians. Suppose we have

$P_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}$

$P_2(x) = \frac{1}{\sigma_2 \sqrt{2\pi}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}$

where the $\mu$’s are the means and the $\sigma$’s are the standard deviations. Define

$t_3 = t_1 + t_2$.

What is the chance that, for example, $t_3 = 5$? We could have
$\begin{array}{rcccccl} t_1 & = & 2 & , & t_2 & = & 3 \\ t_1 & = & -7 & , & t_2 & = & 12 \\ t_1 & = & 3.92 & , & t_2 & = & 1.08 \\ t_1 & = & x & , & t_2 & = & 5-x \end{array}$

They all contribute some probability. To find the total probability density, integrate over all possibilities, remembering that (assuming the measurements are independent)

$P(A \cap B) = P(A)P(B)$.

Also, instead of looking at the special case $t_3 = 5$, we’ll keep things general and let $t_3 = \alpha$.

$\begin{array}{rcl}P_3(\alpha) & = & \int_{-\infty}^{\infty}P_1(x)P_2(\alpha-x)dx \\ & = & \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{\infty}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}e^{-\frac{((\alpha-x)-\mu_2)^2}{2\sigma_2^2}}dx \end{array}$

Before continuing, note the identity

$ax^2 + bx + c = \left( \sqrt{a} x + \frac{b}{2 \sqrt{a}} \right)^2 + c - \frac{b^2}{4a}$

meaning that any quadratic differs from a perfect square by only a constant. This is good because it means the exponential of any quadratic is a (not necessarily normalized) Gaussian. (In our integrand the coefficient of $x^2$ is negative, so the exponential decays and the integral converges; the identity itself is pure algebra and holds for any $a \neq 0$.)

$\begin{array}{rcl} e^{ax^2+bx+c} & = & e^{\left(\sqrt{a}x + \frac{b}{2\sqrt{a}}\right)^2 + c - \frac{b^2}{4a}} \\ & = & C\,e^{\left(\sqrt{a}x + \frac{b}{2\sqrt{a}}\right)^2} \end{array}$

with $C = e^{c - \frac{b^2}{4a}}$
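If you want to convince yourself of the completed-square identity without pencil work, here is a quick numerical spot check (my own illustration; shown for $a > 0$, though the algebra goes through for $a < 0$ as well):

```python
# Spot-check: a*x^2 + b*x + c == (sqrt(a)*x + b/(2*sqrt(a)))^2 + c - b^2/(4a)
import math

def quadratic(a, b, c, x):
    return a * x**2 + b * x + c

def completed_square(a, b, c, x):
    # Same quadratic, rewritten as a perfect square plus a constant.
    return (math.sqrt(a) * x + b / (2 * math.sqrt(a)))**2 + c - b**2 / (4 * a)

a, b, c = 2.0, -3.0, 0.7
for x in (-5.0, -1.0, 0.0, 2.5, 10.0):
    assert math.isclose(quadratic(a, b, c, x), completed_square(a, b, c, x))
```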

Take the equation for $P_3(\alpha)$ and combine the exponents of the two exponentials using $e^a e^b = e^{a+b}$; the integrand is then the exponential of a quadratic in $x$. All we have to do is read off the appropriate values of $a, b, c$. $C$ is a constant with respect to the integration over $x$, so it comes outside the integral (note that $C$ does depend on $\alpha$). What’s left inside the integral is a new Gaussian, and since we’re integrating it from $-\infty$ to $\infty$ it turns into a number via the standard result $\int_{-\infty}^{\infty} e^{-u^2} du = \sqrt{\pi}$.

At last you can just feel all that remaining pesky algebra stuff crunchy crunch crunching under the implacable fury of your pencil, and what comes out is simply

$P_3(\alpha) = \frac{1}{\sigma_3\sqrt{2\pi}}e^{-\frac{(\alpha - \mu_3)^2}{2\sigma_3^2}}$

with

$\sigma_3^2 = \sigma_1^2 + \sigma_2^2$

$\mu_3 = \mu_1 + \mu_2$

So when you add two Gaussians, you get a new Gaussian, and the means and variances add. Isn’t that convenient?
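As a final sanity check, the closed-form answer can be compared against a direct numerical evaluation of the convolution integral $\int_{-\infty}^{\infty} P_1(x)P_2(\alpha - x)\,dx$. A sketch in Python (parameter values and helper names are arbitrary choices of mine):

```python
import math

def gauss(x, mu, sigma):
    # Normal density with mean mu and standard deviation sigma.
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def p3_numeric(alpha, mu1, s1, mu2, s2, lo=-40.0, hi=40.0, n=4000):
    # Trapezoidal estimate of the convolution integral for P3(alpha).
    h = (hi - lo) / n
    total = 0.5 * (gauss(lo, mu1, s1) * gauss(alpha - lo, mu2, s2)
                   + gauss(hi, mu1, s1) * gauss(alpha - hi, mu2, s2))
    for i in range(1, n):
        x = lo + i * h
        total += gauss(x, mu1, s1) * gauss(alpha - x, mu2, s2)
    return total * h

mu1, s1, mu2, s2 = 1.0, 0.5, 2.0, 1.2
mu3 = mu1 + mu2                    # means add
s3 = math.sqrt(s1**2 + s2**2)      # variances add
for alpha in (0.0, 2.0, 3.0, 5.0):
    assert math.isclose(p3_numeric(alpha, mu1, s1, mu2, s2),
                        gauss(alpha, mu3, s3), rel_tol=1e-6)
```

The brute-force integral and the closed-form Gaussian with $\mu_3 = \mu_1 + \mu_2$ and $\sigma_3^2 = \sigma_1^2 + \sigma_2^2$ agree at every test point, which is a nice way to catch dropped factors before they end up in a paper.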