Experimental Uncertainties (“Errors”)
Preface on Probability Distribution Functions
Binomial Distribution
Often useful when the outcome of a “trial” is a “yes” or a “no”. Example: what is the probability that when you flip a coin, it will come up “heads”? It is 50%, and given an infinite number of flips, or “trials”, that’s how often you should get it. Now ask “If I flip a coin 1000 times, what is the probability that I will get exactly 435 heads?” The Binomial Distribution is the one that applies to such a question.
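The coin-flip question above can be answered directly from the binomial formula. This is a minimal sketch; the function name `binomial_pmf` and the variable names are illustrative choices, not something from the textbook.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent yes/no trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

n_flips = 1000   # number of coin flips ("trials")
p_heads = 0.5    # probability of heads on any single flip

prob_435 = binomial_pmf(435, n_flips, p_heads)
prob_500 = binomial_pmf(500, n_flips, p_heads)

print(f"P(exactly 435 heads) = {prob_435:.3e}")
print(f"P(exactly 500 heads) = {prob_500:.3e}")
```

Even the single most likely outcome (500 heads) has only a small probability, because the total probability is spread over many possible counts.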
Poisson Distribution
“An approximation to the binomial distribution when the
average number of successes is very much smaller than the possible number.” Bevington, Data Reduction and
Analysis for the Physical Sciences. Often
used when counting the number of events occurring in a given time interval. As
an example, suppose you wanted to record the rate of radioactive decay of a
large sample of $^{14}$C.
Your sample might contain $10^{14}$ atoms, half of which decay in 5730
years (its “half-life”), but you only count the decay rate over the course of 1
day or 1 week. The number you get per unit time will generally be distributed
according to the Poisson Distribution. Likewise, detecting neutrinos is really
difficult, since they interact with matter so infrequently. Although billions
of neutrinos from the sun pass through you every second, the rate at which
they “hit” something might be only about 1 event per year. Poisson statistics
would describe the rate of detection of neutrinos per unit time.
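As a rough numerical sketch of the two examples above: the expected number of decays per day for the carbon-14 sample, and the Poisson probabilities for a detector that averages 1 event per year. The function name `poisson_pmf` and the variable names are illustrative, and the decay estimate assumes the counting time is tiny compared with the half-life.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of counting exactly k events when the average count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

# Carbon-14 example: 10^14 atoms, half-life 5730 years, counted for 1 day.
n_atoms = 1e14
half_life_days = 5730 * 365.25
decay_prob_per_day = math.log(2) / half_life_days  # valid because 1 day << half-life
expected_decays = n_atoms * decay_prob_per_day
poisson_sigma = math.sqrt(expected_decays)         # Poisson scatter is sqrt(mean)

print(f"Expected decays per day: {expected_decays:.3e} +/- {poisson_sigma:.1e}")

# Neutrino example: an average of 1 detected event per year.
for k in range(4):
    print(f"P({k} events in a year) = {poisson_pmf(k, 1.0):.3f}")
```

The number of successes (decays in a day) is tiny compared with the number of atoms that could have decayed, which is exactly the regime in which the Poisson approximation applies.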
Normal (“Gaussian”) Distribution
An approximation to the binomial distribution when the number of trials becomes very large and the expected number of successes is also large (sort of a large-N Poisson distribution). This is the one we usually use, under the assumption that these conditions are being met, and it is the one we will be using. For more background information, see below:
Poisson: http://www.capdm.com/demos/software/html/capdm/qm/poissondist/usage.html
Normal: http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
Glossary: http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html
Comparisons: http://www.rfbarrow.btinternet.co.uk/htmasa2/Binomial1.htm
WARNING: “For a hundred years or so, mathematical statisticians have been in love with the fact that the probability distribution of a very large number of very small random deviations almost always converges to a normal distribution. This infatuation tended to focus away from the fact that, for real data, the normal distribution is often rather poorly realized, if it is realized at all.” Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes.
2 RULES:
1. No measurement is perfect
2. Some measurements are less perfect than others
As mentioned by your textbook, there can be many sources of uncertainty in a measurement:
Measurement Precision: high reproducibility of a given measurement result
Measurement Accuracy: result is close to the true value
Note that a highly precise result may be systematically off the true value (i.e. have low accuracy)
Note that a highly accurate result may have a high degree of scatter (i.e. have low precision)
Your book (Norton) has some very nice examples, so in this (rare) instance I will follow along with much of the discussion as it appears there.
In real life, measurements will have some “scatter” due to a variety of phenomena, often beyond the experimenter’s control.
Precision is defined as the degree to which measurements are repeatable. Of the two sets of measurements shown here, Set (b) has much less scatter than Set (a), and so Set (b) is said to be more precise than Set (a).
Based on a visual inspection, both seem to scatter around the “true” value, but if you were to base a conclusion on these 2 sets of measurements, Set (b) would be the one you would want to choose.
By contrast, here we have 2 sets of data, both of which have similar precision, but one is closer to the “true” value, and is said to be more accurate.
In some areas of physics, I have seen both the random scatter (like the standard deviation, which we will get to later) and an estimate of the possible systematic errors (if they are known) quoted for a measurement.
Suppose you were to measure the brightness of a star. Due to variations in the arrival rate of photons, a trial of 10 measurements might result in the values shown here. Based on what is seen here, it would not be easy to determine a good mean value.
After 100 measurements, one begins to see a more uniform distribution, although some measurements far from the “middle” can be seen.
After 1000 measurements, we can see a smooth symmetric distribution beginning to emerge.
In principle, if we were to obtain a very large number of measurements over very small intervals in magnitude, our distribution would look like this, and be describable using a “Normal” (“Gaussian”) Distribution.
The 2 most common parameters used to describe such a distribution are the mean and the standard deviation.
Let the measured values of $N$ observations be described by $x_i$, where $i = 1, 2, \ldots, N$ (i.e. $x_1$ is the value of the first measurement, $x_2$ is the value of the second measurement, and so on).
Mean: $$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$ (Note, many people use $\mu$ in place of $\bar{x}$ to denote the mean).
Deviation of measurement $i$ from $\bar{x}$ is simply $d_i = x_i - \bar{x}$.
The standard deviation is obtained by summing up the squares of the deviations (note that this eliminates any negative signs), taking the average of that sum, and then taking the square root:
Standard Deviation: $$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
This is often referred to as root-mean-squared statistics, or rms for short. Note that (as your book does), one often corrects for the fact that this population standard deviation tends to underestimate the actual uncertainty, and a similar quantity, the sample standard deviation
$$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
is used. (This is because we do not actually know the true mean value, but just our own approximation of it).
Note that one often hears the term variance used. The variance is just $\sigma^2$, i.e. the same quantity without the square-root.
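The definitions above map directly onto Python's standard `statistics` module. This is only a sketch: the five measurement values are made up for illustration.

```python
import statistics

# Hypothetical repeated measurements of the same quantity.
values = [10.2, 9.8, 10.5, 9.9, 10.1]

mean = statistics.fmean(values)          # sum of the values divided by N
pop_std = statistics.pstdev(values)      # population form: divides by N
samp_std = statistics.stdev(values)      # sample form: divides by N - 1
variance = statistics.pvariance(values)  # square of the population standard deviation

print(f"mean                = {mean:.3f}")
print(f"population std dev  = {pop_std:.3f}")
print(f"sample std dev      = {samp_std:.3f}")
print(f"variance            = {variance:.4f}")
```

The sample standard deviation always comes out slightly larger than the population one, and the difference shrinks as the number of measurements grows.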
For a real Gaussian distribution, 68% of the measurements will lie within 1 standard deviation of the mean, 2 standard deviations will include roughly 95% of all the measurements, and extending this to 3 standard deviations will include 99.7% of the measurements.
Looking at the glass as half empty rather than half full, this also means that 32% of the measurements fall outside 1 standard deviation, 5% outside 2 standard deviations, and 0.3% outside 3 standard deviations.
Let us suppose we had a second superbly precise and accurate value for a quantity, and it fell 1 standard deviation from the mean. Could we say that it was truly different from that mean? According to what is shown above, possibly not, since random scatter will cause 32% of the measurements to fall that far away. What if it were 2 standard deviations away? Well, that’s better, since there is only a 5% (1 in 20) chance that this would happen by chance. If it were 3 standard deviations away, the likelihood that this is just chance is 0.3%, i.e. 3 chances in 1000. For this reason, most scientists will not claim that a difference is real unless it is at least 3 standard deviations.
Once in a while, though, people will consider a difference of 2.0-2.5 standard deviations as “suggestive” and worthy of further investigation, even though it is not truly “statistically significant”.
In reality, both things being compared will have their own standard deviations, and when comparing them we often compare their means, and the standard deviations of those means.
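The 68%, 95%, and 99.7% figures quoted above follow from the Gaussian integral, which can be evaluated with the error function. A short sketch (the function name `fraction_within` is just an illustrative choice):

```python
import math

def fraction_within(k_sigma: float) -> float:
    """Fraction of a Gaussian distribution lying within k_sigma standard deviations of the mean."""
    return math.erf(k_sigma / math.sqrt(2.0))

for k in (1, 2, 3):
    inside = fraction_within(k)
    print(f"{k} sigma: {100 * inside:.2f}% inside, {100 * (1 - inside):.2f}% outside")
```

The same function can be used for in-between cases such as 2.5 standard deviations.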
Some Basic “Rules of Thumb”
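Because 68% is about equal to 2/3, roughly 2/3 of the measurements should fall within 1 standard deviation of the mean.
Now suppose you were counting the number of photons coming from a light source, such as a star, in a fixed interval of time. As you repeated this experiment, the values would cluster around some mean value of $N$, with a standard deviation of $\sigma = \sqrt{N}$ (“root-n statistics”).
In many situations we like to quote a signal-to-noise ratio, variously abbreviated as SNR or S/N. This would simply be
$$\mathrm{S/N} = \frac{N}{\sqrt{N}} = \sqrt{N}$$
We could also define the fractional uncertainty, which is just the reciprocal of the SNR:
$$\frac{\sigma}{N} = \frac{1}{\sqrt{N}}$$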
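To get a feel for how the signal-to-noise ratio grows with the number of counted photons, here is a small sketch assuming pure root-n (photon-counting) noise and no other noise source. The function names are illustrative.

```python
import math

def snr_from_counts(n_counts: float) -> float:
    """Signal-to-noise ratio when the only noise is sqrt(N) counting noise."""
    return n_counts / math.sqrt(n_counts)

def fractional_uncertainty(n_counts: float) -> float:
    """Reciprocal of the SNR, i.e. sigma / N."""
    return 1.0 / snr_from_counts(n_counts)

for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,d}: S/N = {snr_from_counts(n):7.1f}, "
          f"fractional uncertainty = {100 * fractional_uncertainty(n):.2f}%")
```

Getting a result that is 10 times better takes 100 times as many photons.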
In the case of “small-n statistics” the standard deviation itself is not well-determined, and in principle there is an “uncertainty in the uncertainty”!
Before we leave this portion of the subject, it is useful to remember that actual distributions may deviate from a Gaussian one. There are 2 general ways that this occurs with large samples:
Skewness: the distribution is asymmetric about the mean. This is almost guaranteed to happen when dealing with small-n statistics, or in cases where the tail of the distribution is close to zero and “negative” values are “unphysical”. (The distribution may be more “Poissonian”.)
Kurtosis: The distribution is either flatter or more “pointy” than a true Gaussian.
Combining Measurements
First, let the arrival rate of photons from a star be governed by a Gaussian distribution.
Suppose you made 5 measurements of the flux from the star, took the mean of the 5 measurements, and then repeated this act 9 more times. What would you find?
1. For each set of 5 measurements, you could calculate a mean value.
2. These mean values would not be the same, but would themselves scatter about the “true” value of the flux.
Conclusion: there is an uncertainty associated with the determination of the mean itself.
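This is often referred to as the uncertainty of the mean, the standard error of the mean, or the standard deviation of the mean. For a set of $N$ measurements with standard deviation $\sigma$, it is given by:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$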
Taking more measurements should make your calculated mean be closer to the true value.
For example, 10 mean values made from 5 measurements each have a much larger scatter (i.e. higher uncertainty) than a similar trial of 10 sets of 20 measurements each.
You can see that the scatter in the second case is about half that in the first case, because 4 times as many actual measurements went into determining each of the 10 mean values.
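The factor-of-two claim above can be checked with a quick simulation. This is a sketch under assumed numbers: the “true” flux of 1000 and per-measurement scatter of 50 are made up, and 2000 sets are used instead of 10 so that the estimated scatter is itself well determined.

```python
import random
import statistics

def scatter_of_means(n_per_set, n_sets, rng, true_flux=1000.0, sigma=50.0):
    """Standard deviation of the means of n_sets sets of n_per_set Gaussian measurements."""
    means = []
    for _ in range(n_sets):
        sample = [rng.gauss(true_flux, sigma) for _ in range(n_per_set)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed so the run is repeatable
scatter_5 = scatter_of_means(5, 2000, rng)
scatter_20 = scatter_of_means(20, 2000, rng)
ratio = scatter_20 / scatter_5

print(f"scatter of means, 5 per set : {scatter_5:.2f}")
print(f"scatter of means, 20 per set: {scatter_20:.2f}")
print(f"ratio                       : {ratio:.2f}")
```

With only 10 sets, as in the example above, the measured ratio would bounce around noticeably from run to run, which is the “uncertainty in the uncertainty” at work.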
For some “real” applications, using the mean is a poor way to determine the most representative estimate of a “true” value, since real measurements often have “outlier” values that deviate significantly from a Normal Distribution. They will have undue influence on the mean. And since the standard deviation uses the square of the deviations, they will make the estimated uncertainty too big as well.
A more “robust” estimate of the “true” value is the median. The median is that value where half of the measurements have larger values, and half have smaller values. If the number of values being sampled is even, one uses the average of the 2 middle ones.
Suppose you took 5 flat field images, and this is what you saw in one small sector:

Look at the central pixel. Let us take the mean of the five values for this pixel, and then the median.
Mean = 2776.8 Median = 1004
Which one of these gives the most “representative” value for this pixel? The Median. It is not unusual for measurements to contain these “outliers” that simply do not fit a Gaussian distribution, so the use of Normal Gaussian statistics is simply inappropriate. This is the sort of effect that can occur when a cosmic ray hits a CCD while it is exposing. Because of this, most astronomers construct a Median frame for actual use.
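If one needs to quote an uncertainty associated with the median, usually the mean absolute deviation is used:
$$\mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \left| x_i - x_{\rm med} \right|$$
where $x_{\rm med}$ is the median and $|\ldots|$ is the “absolute value”. Using the median minimizes the mean absolute deviation.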
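Here is a sketch of the mean-versus-median comparison and of the mean absolute deviation. The five pixel values are hypothetical: they were chosen only so that they reproduce the mean (2776.8) and median (1004) quoted above, with one value playing the role of a cosmic-ray hit.

```python
import statistics

# Hypothetical values of one pixel in 5 flat-field frames (one is a cosmic-ray hit).
pixel_values = [1001, 1003, 1004, 1006, 9870]

mean = statistics.mean(pixel_values)
median = statistics.median(pixel_values)

def mean_abs_deviation(values, center):
    """Average of |x_i - center| over all values."""
    return sum(abs(x - center) for x in values) / len(values)

mad_about_median = mean_abs_deviation(pixel_values, median)
mad_about_mean = mean_abs_deviation(pixel_values, mean)

print(f"mean = {mean}, median = {median}")
print(f"mean absolute deviation about the median: {mad_about_median:.1f}")
print(f"mean absolute deviation about the mean  : {mad_about_mean:.1f}")
```

The deviation measured about the median is the smaller of the two, as expected, although a single large outlier in only 5 values still dominates it.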
Combining Uncertainties
Suppose one had different sources of uncertainty related to a measurement, a random one $\sigma_r$ and a systematic one $\sigma_s$, for example. If these were independent of one another, the combined uncertainty would be:
$$\sigma = \sqrt{\sigma_r^2 + \sigma_s^2}$$
They do not simply “add”, because sometimes when $\sigma_r$ leads to a measurement that is too small, $\sigma_s$ will partially compensate by “trying” to make the measured value too large. This can be generalized to any number of independent errors:
$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$$
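Adding independent uncertainties “in quadrature”, as described above, is a one-line calculation. In this sketch the function name `combine_independent` and the example numbers are illustrative.

```python
import math

def combine_independent(*sigmas: float) -> float:
    """Combine any number of independent uncertainties in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))

sigma_random = 3.0      # hypothetical random uncertainty
sigma_systematic = 4.0  # hypothetical systematic uncertainty

combined = combine_independent(sigma_random, sigma_systematic)
print(f"combined uncertainty = {combined:.1f} (a simple sum would give "
      f"{sigma_random + sigma_systematic:.1f})")
```

Note that the quadrature sum is always smaller than the straight sum, and that it is dominated by the largest term.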
Sums, Differences, Products, Ratios, and Powers
What is the net uncertainty in the sum, difference, product, or ratio of 2 quantities, if each has an associated uncertainty? How does one handle exponents?
Scaling: Simply multiplying a single number $x$ with an uncertainty $\sigma_x$ by a constant $a$:
If $y = a x$ then $\sigma_y = |a| \, \sigma_x$, i.e. it scales directly. This can also be expressed by saying that
$$\frac{\sigma_y}{y} = \frac{\sigma_x}{x}$$
That is, the fractional error is unchanged!
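Addition & Subtraction of 2 quantities with associated uncertainties: if $z = a x \pm b y$, where $x$ and $y$ have independent uncertainties $\sigma_x$ and $\sigma_y$, then
$$\sigma_z = \sqrt{a^2 \sigma_x^2 + b^2 \sigma_y^2}$$
In many cases, no scaling has been done, and $a = b = 1$ so that
$$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}$$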
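As a sketch of the sum-and-difference rule, consider subtracting a sky background from a star-plus-sky measurement. The function name and the numbers below are made up for illustration, and the two uncertainties are assumed to be independent.

```python
import math

def sigma_sum_or_difference(a: float, sigma_x: float, b: float, sigma_y: float) -> float:
    """Uncertainty of z = a*x + b*y or z = a*x - b*y for independent sigma_x, sigma_y."""
    return math.sqrt((a * sigma_x) ** 2 + (b * sigma_y) ** 2)

star_plus_sky, sigma_star_plus_sky = 1500.0, 40.0  # hypothetical counts
sky, sigma_sky = 500.0, 30.0                       # hypothetical counts

star = star_plus_sky - sky
sigma_star = sigma_sum_or_difference(1.0, sigma_star_plus_sky, 1.0, sigma_sky)
print(f"star = {star:.0f} +/- {sigma_star:.0f}")
```

The uncertainty of the difference is larger than either input uncertainty, even though the value itself got smaller.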
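Multiplication and Division: If you guessed that this would be similar to scaling which, after all, is just multiplying by a quantity with no associated uncertainty, you would be right! For $z = x y$ or $z = x / y$, with independent uncertainties $\sigma_x$ and $\sigma_y$, the fractional errors add in quadrature:
$$\left(\frac{\sigma_z}{z}\right)^2 = \left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2$$
i.e.
$$\frac{\sigma_z}{z} = \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2}$$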
Powers:
If $z = x^n$ then $$\frac{\sigma_z}{z} = |n| \, \frac{\sigma_x}{x}$$
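The product, ratio, and power rules all work with fractional uncertainties, which the following sketch makes explicit. The function names and example numbers are illustrative, the uncertainties are assumed to be independent, and the formulas are the usual small-error approximations.

```python
import math

def fractional_sigma_product_or_ratio(x, sigma_x, y, sigma_y):
    """Fractional uncertainty of z = x*y or z = x/y."""
    return math.hypot(sigma_x / x, sigma_y / y)

def fractional_sigma_power(x, sigma_x, n):
    """Fractional uncertainty of z = x**n."""
    return abs(n) * sigma_x / abs(x)

# Hypothetical example: a ratio of two measured fluxes.
frac_ratio = fractional_sigma_product_or_ratio(100.0, 3.0, 50.0, 2.0)
print(f"ratio 100/50 has a fractional uncertainty of {100 * frac_ratio:.1f}%")

# Hypothetical example: volume of a cube from a measured side length.
frac_volume = fractional_sigma_power(10.0, 0.1, 3)
print(f"side**3 has a fractional uncertainty of {100 * frac_volume:.1f}%")
```

A 1% uncertainty in the side length becomes a 3% uncertainty in the volume, because the exponent multiplies the fractional error.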
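Logs (need this for Magnitudes):
To do this right, we need to remember that magnitude measurements are made by counting photons (or equivalently measuring the flux), with $m = -2.5 \log_{10} F + \mathrm{constant}$.
Suppose we changed the flux $F$ by an amount $\Delta F$. How would that affect the magnitude $m$? The change in the magnitude would be:
$$\Delta m = -2.5 \log_{10}\left(\frac{F + \Delta F}{F}\right) = -2.5 \log_{10}\left(1 + \frac{\Delta F}{F}\right)$$
Dropping the minus sign (since we are only dealing with the size of the uncertainty):
$$\Delta m = 2.5 \log_{10}\left(1 + \frac{\Delta F}{F}\right)$$
Now, if we were to let $\Delta F$ represent the error in $F$, such as having it be $\sigma_F$, then
$$\sigma_m = 2.5 \log_{10}\left(1 + \frac{\sigma_F}{F}\right)$$
If the measurement of $F$ is governed by root-n statistics, so that $\sigma_F / F = 1/\sqrt{N}$ for $N$ counted photons, then:
$$\sigma_m = 2.5 \log_{10}\left(1 + \frac{1}{\sqrt{N}}\right)$$
For Small Uncertainties in Flux:
$$\sigma_m \approx \frac{2.5}{\ln 10} \, \frac{\sigma_F}{F} \approx 1.086 \, \frac{\sigma_F}{F}$$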
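A short sketch comparing the full logarithmic expression for the magnitude uncertainty with the small-uncertainty approximation, using the standard definition $m = -2.5 \log_{10} F + \mathrm{constant}$. The function names are illustrative.

```python
import math

def mag_error_exact(frac_flux_error: float) -> float:
    """Magnitude uncertainty 2.5 * log10(1 + sigma_F / F)."""
    return 2.5 * math.log10(1.0 + frac_flux_error)

def mag_error_small(frac_flux_error: float) -> float:
    """Small-error approximation (2.5 / ln 10) * sigma_F / F."""
    return 2.5 / math.log(10.0) * frac_flux_error

for frac in (0.01, 0.05, 0.10, 0.30):
    print(f"sigma_F/F = {frac:4.2f}: exact = {mag_error_exact(frac):.4f} mag, "
          f"approx = {mag_error_small(frac):.4f} mag")

# Root-n statistics: N counted photons give sigma_F / F = 1 / sqrt(N).
n_photons = 10_000
sigma_m = mag_error_exact(1.0 / math.sqrt(n_photons))
print(f"N = {n_photons}: sigma_m = {sigma_m:.4f} mag")
```

The handy rule that a 1% flux error is about 0.01 mag comes from the factor $2.5/\ln 10 \approx 1.086$ being close to 1.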
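Now, when the uncertainties get above 10% or so, they can actually become “asymmetric”! One can see this in the following way. For a flux $F$ changed by an amount $\Delta F$, we could just as well have defined the magnitude change $\Delta m$ using $F - \Delta F$ instead of $F + \Delta F$. In that case we would have gotten
$$\Delta m = -2.5 \log_{10}\left(1 - \frac{\Delta F}{F}\right)$$
instead of
$$\Delta m = -2.5 \log_{10}\left(1 + \frac{\Delta F}{F}\right)$$
where I have kept the minus sign to be completely correct. Suppose that $\Delta F / F = 0.3$. Then using $F + \Delta F$, $\Delta m = -0.28$, whereas using $F - \Delta F$, $\Delta m = +0.39$!!
Because of this, one might see a magnitude written down as $m^{+0.39}_{-0.28}$. This sort of thing is a common occurrence when you have “large” errors in logarithmic quantities. Of course, errors of 0.3 mag mean that the measurement is barely a $3\sigma$ one.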
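The asymmetry of large magnitude errors is easy to see numerically. This sketch assumes a fractional flux uncertainty of 30%, chosen to be in the “large error” regime discussed above; the function name `mag_shift` is illustrative.

```python
import math

def mag_shift(frac_flux_change: float) -> float:
    """Change in magnitude when the flux changes by the given fraction (sign kept)."""
    return -2.5 * math.log10(1.0 + frac_flux_change)

frac = 0.3  # assumed fractional flux uncertainty

brighter = mag_shift(+frac)  # more flux means a smaller (more negative) magnitude
fainter = mag_shift(-frac)   # less flux means a larger magnitude

print(f"flux up by 30%  : delta m = {brighter:+.2f}")
print(f"flux down by 30%: delta m = {fainter:+.2f}")
```

The upper and lower error bars differ by about 0.1 mag here, while for a 1% flux error the two values agree to better than 0.001 mag.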
Graphing
I won’t say too much here except to re-iterate these 2 points:
1. Choose a graph that is appropriate for the data you have
2. When “fitting” data with uncertainties in an XY graph, your curve should “hit” about 2/3 of the error bars.
Graph Types
Example: when graphing something that is a power law, like $y = a x^n$, don’t graph $y$ versus $x$! The eye-brain combination does not do a good job fitting complex non-linear curves. Do one of the following:
Take the Log of both quantities: $\log y = \log a + n \log x$. You have now transformed a power-law shape into a linear one!!
Plot $y$ versus $x$ on Log-Log paper (or generate the same on a computer). This accomplishes the same thing. You see a straight line, which is easier to fit by eye.
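To show that taking logs really does turn a power law into a straight line, the sketch below generates points from an assumed power law and recovers the exponent as the slope of the line in log-log space. A simple straight-line least-squares fit stands in here for fitting by eye; the function name and the numbers are illustrative.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**n by fitting a straight line to (log x, log y)."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    count = len(xs)
    mean_lx = sum(log_x) / count
    mean_ly = sum(log_y) / count
    slope = (sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
             / sum((lx - mean_lx) ** 2 for lx in log_x))
    intercept = mean_ly - slope * mean_lx
    return 10.0 ** intercept, slope  # (a, n)

# Hypothetical noise-free data following y = 3 * x**1.5.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 1.5 for x in xs]

a_fit, n_fit = fit_power_law(xs, ys)
print(f"a = {a_fit:.3f}, n = {n_fit:.3f}")
```

The slope of the straight line is the exponent $n$, and the intercept gives $\log a$.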
Fitting
Taking an example from your book:
These two “fits” suggest that there is something “wrong” with the data. In the left-hand case, the curve hits far fewer than 2/3 of the error bars, suggesting that the uncertainties have been grossly underestimated.
In the right-hand case the opposite is true. The curve hits more than 2/3 of the error bars (in fact, it hits 100% of them). This suggests that the uncertainties are probably over-estimated.
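The “2/3 of the error bars” check can be turned into a simple count. The sketch below simulates data scattered about an assumed straight-line model and counts how often the model passes through the error bars when the quoted uncertainties are correct, 5 times too small, and 5 times too large. All names and numbers are illustrative.

```python
import random

def fraction_hit(ys, sigmas, model_ys):
    """Fraction of points whose 1-sigma error bar overlaps the model curve."""
    hits = sum(1 for y, s, m in zip(ys, sigmas, model_ys) if abs(y - m) <= s)
    return hits / len(ys)

rng = random.Random(1)  # fixed seed so the run is repeatable
true_sigma = 2.0        # assumed true scatter of each point
xs = [0.01 * i for i in range(3000)]
model = [3.0 * x + 1.0 for x in xs]                      # assumed "true" curve
data = [m + rng.gauss(0.0, true_sigma) for m in model]   # simulated measurements

frac_correct = fraction_hit(data, [true_sigma] * len(data), model)
frac_under = fraction_hit(data, [true_sigma / 5] * len(data), model)
frac_over = fraction_hit(data, [true_sigma * 5] * len(data), model)

print(f"correct error bars      : {frac_correct:.2f} of them are hit")
print(f"underestimated (x 1/5)  : {frac_under:.2f}")
print(f"overestimated (x 5)     : {frac_over:.2f}")
```

A fraction far below 2/3 points to underestimated uncertainties, and a fraction near 100% points to overestimated ones, matching the two cases described above.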
Of course there are more precise ways of doing fitting than “by eye”, but I won’t go into least-squares and other things in this course. But let us look at a practical example of understanding the errors based on the fitting discussion above.
Spitzer Spectroscopy of Comet Schwassmann-Wachmann 3
Observing comets with the Spitzer Space Telescope has some challenges. Here is an example of how it is done.
Spectra are obtained by placing the comet on the slit of the
spectrograph, and obtaining an exposure. One slit sends the light to one
grating that provides a spectrum over some interval in wavelength $\lambda$.
By moving the comet onto the other slit, the light goes to a different grating,
and a different wavelength interval is covered.

In the example shown here (from Kelley et al. 2006, ApJ, 651, 1256) the comet is placed on the slit for the low spectral resolution n=2 order of the Infrared Spectrograph (IRS) in the upper panel (5.2-8.7 μm = “SL2”), and on the slit for the n=1 order in the lower panel (7.4-14.5 μm = “SL1”).
Often, we also nod the telescope (move the target to the other side of the middle of the slit) for each exposure, so that we get 2 exposures in the same slit, one right after the other.
We observed 2 pieces of the “crumbling” comet SW3, Components B and C. When we combined 2 nods that were supposed to be the same, this is what we got for SL2 and SL1:
We are not trying to “fit” anything here, but the line is the mean spectrum based on the 2 nods, and the error bars are the standard deviations. Note that the point-to-point scatter is way smaller than the error bars show, suggesting that something is amiss here.
As it turns out, the nods were probably not well-done, so that less light passed through the slit on one nod than on the other, causing its spectrum to be systematically lower. Both spectra might have tiny random errors, but there was a larger systematic error that had crept in!
Things were even worse at long wavelengths!!
Here are the spectra of the 2 individual nods in the long wavelength arrays (LL2 and LL1). You can see that there is a 30% difference!!
Just blindly using these 2 spectra to calculate a mean spectrum and standard deviation at each data point would have produced a result even worse than that of the short-wavelength spectrum. And the problem was much worse for SW3-B than for SW3-C.
The reason for this huge difference is probably not poor nods, but the way that the spectrograph is oriented with respect to the comet.
Here is how the comets were oriented on the sky.
[Images: SW3-B and SW3-C]
When placed on the slit and nodded:
[Images: SW3-B and SW3-C]
Doing a “cut” along the comet’s tail for SW3-C:

Extracting the Spectrum:

Comparing the “cuts” along the direction of the comets’ tails:
[Plots: profile for one nod on SW3-B; profile for one nod on SW3-C]
The Nodding caused part of the tail of B to fall outside the slit more on one nod than another. The effect was smaller for C.
Two nods on C in LL2:
2 Nods on C in LL1: