Experimental Uncertainties (“Errors”)

Preface on Probability Distribution Functions

 

Binomial Distribution

 

Often useful when the outcome of a “trial” is a “yes” or a “no”. Example: what is the probability that when you flip a coin, it will come up “heads”? It is 50%: given an infinite number of flips, or “trials”, that is the fraction of the time it should come up heads. Now ask, “If I flip a coin 1000 times, what is the probability that I will get exactly 435 heads?” The binomial distribution is the one that applies to such a trial.
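A minimal Python sketch of the binomial probability for the coin example, assuming independent flips of a fair coin (p = 0.5). The function name `binomial_pmf` is my own; it evaluates C(n, k) p^k (1-p)^(n-k) in logarithms so that large n does not overflow or underflow.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (assumes 0 < p < 1)."""
    # Work with logarithms so the huge binomial coefficient and the tiny
    # power p**k * (1-p)**(n-k) stay finite for large n.
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_prob)

# Probability of exactly 435 heads in 1000 flips of a fair coin
print(binomial_pmf(435, 1000, 0.5))
# Probability of exactly 500 heads (the single most likely outcome)
print(binomial_pmf(500, 1000, 0.5))
```

For 500 heads the result is roughly 0.025 (about 1 chance in 40); 435 heads is far less likely, only a few chances in a million.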

 

Poisson Distribution

 

“An approximation to the binomial distribution when the average number of successes is very much smaller than the possible number.” Bevington, Data Reduction and Error Analysis for the Physical Sciences. It is often used when counting the number of events occurring in a given time interval. As an example, suppose you wanted to record the rate of radioactive decay of a large sample of ¹⁴C. Your sample might contain $10^{14}$ atoms, half of which decay in 5730 years (its “half-life”), but you only count decays over the course of 1 day or 1 week. The number of decays you count per unit time will generally be distributed according to the Poisson distribution. Likewise, detecting neutrinos is really difficult, since they interact with matter so infrequently. Although billions of neutrinos from the Sun pass through you every second, the rate at which they “hit” something in a detector might be of order 1 event per year. Poisson statistics would describe the number of neutrinos detected per unit time.
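To see the Poisson distribution emerge as an approximation to the binomial, here is a hedged Python sketch. The detector numbers (n = 100,000 chances per interval, each with probability p = 2 × 10⁻⁵) are made up for illustration, chosen so that the expected count μ = np = 2 is far smaller than n. The helper names are my own.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """Probability of observing exactly k events when mu are expected on average."""
    # mu**k * exp(-mu) / k!, evaluated in logs to stay finite for large k or mu
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability, for comparison (fine for moderate k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical detector: many chances per interval, each very unlikely to fire
n, p = 100_000, 2e-5
mu = n * p   # expected number of counts per interval (= 2)
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, mu), 5))
```

The two columns agree closely, but the Poisson form needs only the mean μ, not n and p separately, which is why it is so convenient for counting experiments.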

 

Normal (“Gaussian”) Distribution

 

An approximation to the binomial distribution when the number of trials becomes very large and the probability of success is not vanishingly small, so that the expected number of successes is also very large (sort of a large-N Poisson distribution). This is the distribution we usually use, assuming these conditions are met, and it is the one we will be using here. For more background information, see below:

 

Poisson: http://www.capdm.com/demos/software/html/capdm/qm/poissondist/usage.html

 

Normal: http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html

 

Glossary: http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html

 

Comparisons: http://www.rfbarrow.btinternet.co.uk/htmasa2/Binomial1.htm
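The normal approximation to the binomial described above can be checked numerically. This sketch (my own, assuming a fair coin flipped n = 1000 times) compares exact binomial probabilities with a Gaussian having the same mean, np, and the same standard deviation, √(np(1-p)).

```python
import math

def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """Normal (Gaussian) probability density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n, p = 1000, 0.5                        # e.g. 1000 flips of a fair coin
mean = n * p                            # expected number of heads
sigma = math.sqrt(n * p * (1 - p))      # binomial standard deviation
for k in (450, 480, 500, 520, 550):
    exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, f"{exact:.5f}", f"{gaussian_pdf(k, mean, sigma):.5f}")
```

Near the peak the two columns agree closely, which is why the Gaussian is a good stand-in when the number of trials is large.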

 

WARNING: “For a hundred years or so, mathematical statisticians have been in love with the fact that the probability distribution of a very large number of very small random deviations almost always converges to a normal distribution. This infatuation tended to focus away from the fact that, for real data, the normal distribution is often rather poorly realized, if it is realized at all.” Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes.

 

2 RULES:

 

1. No measurement is perfect

2. Some measurements are less perfect than others

 

 

As mentioned by your textbook, there can be many sources of uncertainty in a measurement:

 

  1. Lack of skill in measuring (the “personal equation”)

  2. Instrumental limitations (instrument noise or systematic error)

  3. Extraneous influences (usually lack of control of the “experiment”, which may or may not be under the experimenter’s control)

  4. Real variations in the quantity being measured (a source with real fluctuations will produce “scatter” in the data)

  5. Random fluctuations (real noise; with light, the uneven arrival rate of photons)

 

 

Measurement Precision: high reproducibility of a given measurement result

 

Measurement Accuracy: result is close to the true value

 

Note that a highly precise result may be systematically off the true value (i.e. have low accuracy)

 

Note that a highly accurate result may have a high degree of scatter (i.e. have low precision)

 

 

 


Your book (Norton) has some very nice examples, so in this (rare) instance I will follow much of the discussion as it appears there.

 

In real life, measurements will have some “scatter” due to a variety of phenomena, often beyond the experimenter’s control.

 

 

Precision is defined as the degree to which measurements are repeatable. Of the two sets of measurements shown here, Set (b) has much less scatter than Set (a), and so Set (b) is said to be more precise than Set (a).

 

Based on a visual inspection, both seem to scatter around the “true” value, but if you were to base a conclusion on these 2 sets of measurements, Set (b) would be the one you would want to choose.

 

 

By contrast, here we have 2 sets of data, both of which have similar precision, but one is closer to the “true” value, and is said to be more accurate.

 

In some areas of physics, I have seen results quoted with both the random scatter (like the standard deviation, which we will get to later) and an estimate of the possible systematic errors (if they are known).

 

 

 

 

Suppose you were to measure the brightness of a star. Due to variations in the arrival rate of photons, a trial of 10 measurements might result in the values shown here. Based on what is seen here, it would not be easy to determine a good mean value.

 

 

 

After 100 measurements, one begins to see a better-defined distribution, although some measurements far from the “middle” can still be seen.

 

 

 

After 1000 measurements, we can see a smooth symmetric distribution beginning to emerge.

 

 

 

In principle, if we were to obtain a very large number of measurements and sort them into very small intervals in magnitude, our distribution would look like this, and be describable using a “Normal” (“Gaussian”) Distribution.

 

 

 

The 2 most common parameters used to describe such a distribution are the mean and the standard deviation.

 

Let the measured values of $N$ observations be described by $x_i$, where $i = 1, 2, \ldots, N$ (i.e. $x_1$ is the value of the first measurement, $x_2$ is the value of the second measurement, and so on).

Mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ (Note: many people use $\mu$ in place of $\bar{x}$ to denote the mean.)

The deviation of measurement $x_i$ from $\bar{x}$ is simply $d_i = x_i - \bar{x}$. The standard deviation is obtained by summing the squares of the deviations (which eliminates the negative signs), taking the average of that sum, and then taking the square root:

Standard Deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$. This is often referred to as root-mean-square statistics, or rms for short. Note that (as your book does) one often corrects for the fact that this population standard deviation tends to underestimate the actual uncertainty, and uses instead a similar quantity, the sample standard deviation $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$. (This is because we do not actually know the true mean value, only our own estimate $\bar{x}$ of it, and the deviations from $\bar{x}$ are on average slightly smaller than the deviations from the true mean.) One also often hears the term variance. The variance is just $\sigma^2$, without the square root.
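The definitions above translate directly into code. This is a minimal sketch with made-up measurement values; `describe` is a hypothetical helper name, and Python's standard `statistics` module (`pstdev` for the population form, `stdev` for the sample form) is used only as a cross-check.

```python
import math
import statistics

def describe(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, population std dev, sample std dev) as defined above."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    sigma = math.sqrt(sum(squared_devs) / n)      # divide by N: population (rms)
    s = math.sqrt(sum(squared_devs) / (n - 1))    # divide by N-1: sample
    return mean, sigma, s

# Hypothetical repeated measurements of one quantity
data = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7]
mean, sigma, s = describe(data)
print(mean, sigma, s, sigma**2)                       # last value is the variance
print(statistics.pstdev(data), statistics.stdev(data))  # should match sigma and s
```

The sample standard deviation is always a little larger than the population one, and the difference shrinks as N grows.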

 

 

 

For a true Gaussian distribution, 68% of the measurements will lie within 1 standard deviation of the mean. Extending this to 2 standard deviations includes roughly 95% of all the measurements, while 3 standard deviations includes 99.7% of them.

Looking at the glass as half empty rather than half full, this also means that 32% of the measurements fall outside 1 standard deviation, 5% outside 2 standard deviations, and 0.3% outside 3 standard deviations.
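The 68/95/99.7 percentages come from integrating the Gaussian; in Python the error function `math.erf` gives the fraction within k standard deviations directly. A short sketch (the helper name is my own):

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a Gaussian population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = fraction_within(k)
    print(f"{k} sigma: inside {inside:.4f}, outside {1 - inside:.4f}")
```

This gives inside fractions of about 0.6827, 0.9545 and 0.9973, so the 95% and 5% figures are rounded (about 4.6% of measurements actually fall outside 2 standard deviations).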

 

 

Let us suppose we had a second, superbly precise and accurate value for a quantity, and it fell 1 standard deviation from the mean. Could we say that it was truly different from that mean? According to what is shown above, possibly not, since random scatter will cause 32% of the measurements to fall that far away. What if it were 2 standard deviations away? Well, that is better, since there is only a 5% (1 in 20) probability that random scatter alone would put it there. If it were 3 standard deviations away, the probability that this is just chance is 0.3%, i.e. 3 chances in 1000. For this reason, most scientists will not claim that a difference is real unless it amounts to at least 3 standard deviations.

Once in a while, though, people will consider a 2.0-2.5 standard deviation difference as “suggestive” and worthy of further investigation, even though it is not truly “statistically significant”.

 

In reality, both things being compared will have their own standard deviations, and when comparing them we often compare their means, and the standard deviations of those means.
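When both quantities carry uncertainties, one common approach is to divide their difference by the two uncertainties combined in quadrature (the rule for independent errors given later under Combining Uncertainties) and ask how many standard deviations apart they are. A sketch with hypothetical numbers, assuming independent Gaussian errors; `significance` is an illustrative name, not a standard function.

```python
import math

def significance(mean1, err1, mean2, err2):
    """Separation of two independent measurements in units of their combined
    uncertainty, and the two-sided probability of a separation at least that
    large arising from Gaussian scatter alone."""
    combined = math.sqrt(err1**2 + err2**2)   # independent errors add in quadrature
    n_sigma = abs(mean1 - mean2) / combined
    p_chance = math.erfc(n_sigma / math.sqrt(2))
    return n_sigma, p_chance

# Hypothetical example: 10.0 +/- 0.3 versus 11.5 +/- 0.4
n_sigma, p_chance = significance(10.0, 0.3, 11.5, 0.4)
print(f"{n_sigma:.1f} sigma apart, chance probability {p_chance:.4f}")
```

Here the two values are 3.0 standard deviations apart, with about a 0.3% probability of that happening by chance, matching the 3σ criterion above.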


 

Some Basic “Rules of Thumb”

 

Because 68% is about equal to 2/3, roughly two-thirds of all your measurements will be within 1 standard deviation of the mean.

 

Now suppose you were counting the number of photons arriving in a fixed interval of time from a light source, such as a star. As you repeated this experiment, the counts would cluster around some mean value $n$. For large $n$ the standard deviation approaches $\sqrt{n}$. This is the origin of the phrase root-n statistics.

In many situations we like to quote a signal-to-noise ratio, variously abbreviated SNR or S/N. This would simply be $S/N = n/\sqrt{n} = \sqrt{n}$. Thus if we counted 10,000 photons, SNR=100, but if we only counted 100 photons, SNR=10.

We could also define the fractional uncertainty, which is just the reciprocal of the SNR: $\sigma_n/n = 1/\sqrt{n}$.
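The root-n rules above amount to a few lines of arithmetic. A minimal sketch, assuming the large-n (Gaussian-like Poisson) limit; `counting_stats` is a hypothetical helper.

```python
import math

def counting_stats(n_counts: float) -> tuple[float, float, float]:
    """Root-n statistics for n_counts detected photons (large-n limit)."""
    sigma = math.sqrt(n_counts)      # standard deviation ~ sqrt(n)
    snr = n_counts / sigma           # signal-to-noise = sqrt(n)
    fractional = 1.0 / snr           # fractional uncertainty = 1 / sqrt(n)
    return sigma, snr, fractional

for n in (100, 10_000):
    sigma, snr, frac = counting_stats(n)
    print(n, sigma, snr, frac)
```

Counting 100 times as many photons improves the SNR only by a factor of 10.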

In the case of “small-n statistics” the standard deviation itself is not well-determined, and in principle there is an “uncertainty in the uncertainty”!

 

Before we leave this portion of the subject, it is useful to remember that actual distributions may deviate from a Gaussian one. There are 2 general ways that this occurs with large samples:

 

Skewness: the distribution is asymmetric about the mean. This is almost guaranteed to happen when dealing with small-n statistics, or in cases where the tail of the distribution is close to zero and “negative” values are “unphysical”. (The distribution may be more Poissonian.)

 

Kurtosis: The distribution is either flatter or more “pointy” than a true Gaussian.
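Skewness and kurtosis can be quantified with the third and fourth moments about the mean. This sketch uses the simple moment-based definitions (bias-corrected versions also exist) applied to made-up random samples; a Gaussian gives values near zero for both, and the helper name is my own.

```python
import random

def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a perfect Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2**2 - 3.0   # > 0: "pointier" with heavier tails; < 0: flatter
    return skew, excess_kurt

rng = random.Random(1)
n = 100_000
samples = {
    "gaussian": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "exponential (skewed)": [rng.expovariate(1.0) for _ in range(n)],
    "uniform (flat)": [rng.uniform(0.0, 1.0) for _ in range(n)],
}
for name, values in samples.items():
    skew, kurt = skewness_and_excess_kurtosis(values)
    print(f"{name:22s} skewness {skew:+.2f}  excess kurtosis {kurt:+.2f}")
```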


Combining Measurements

 

 

 

 

 

First, let the arrival rate of photons from a star be governed by a Gaussian distribution.

 

Suppose you made 5 measurements of the flux from the star, took the mean of the 5 measurements, and then repeated this act 9 more times. What would you find?

 

1. For each set of 5 measurements, you could calculate a mean value.

 

2. These mean values would not be the same, but would themselves scatter about the “true” value of the flux.

 

Conclusion: there is an uncertainty associated with the determination of the mean itself.

 

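This is often referred to as the uncertainty of the mean, the standard error of the mean, or the standard deviation of the mean. For $N$ measurements with standard deviation $\sigma$, it is given by:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$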
Taking more measurements should bring your calculated mean closer to the true value.

For example, the 10 mean values made from 5 measurements each have a much larger scatter (i.e. higher uncertainty) than a similar trial of 10 sets of 20 measurements each.

You can see that the scatter in the second case is about half that in the first case: 4 times as many measurements went into each of the 10 mean values, and $\sqrt{4} = 2$.
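The experiment described above is easy to simulate. In this sketch the true flux (100) and per-measurement scatter (σ = 10) are arbitrary made-up values, and `scatter_of_means` is a hypothetical helper. With only 10 sets the computed scatter is itself noisy, so the sketch also repeats the calculation with 10,000 sets, where the result settles close to σ/√N.

```python
import math
import random
import statistics

def scatter_of_means(n_per_set, n_sets=10, true_flux=100.0, sigma=10.0, seed=None):
    """Standard deviation of n_sets sample means, each the average of
    n_per_set Gaussian measurements (true_flux and sigma are made-up values)."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(true_flux, sigma) for _ in range(n_per_set)])
             for _ in range(n_sets)]
    return statistics.stdev(means)

for n in (5, 20):
    few = scatter_of_means(n, seed=1)                    # 10 sets, as in the text
    many = scatter_of_means(n, n_sets=10_000, seed=1)    # many sets: steadier estimate
    print(n, round(few, 2), round(many, 2), round(10.0 / math.sqrt(n), 2))
```

The last column is the prediction σ/√N; going from 5 to 20 measurements per set roughly halves the scatter of the means.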

 


For some “real” applications, using the mean is a poor way to determine the most representative estimate of a “true” value, since real measurements often have “outlier” values that deviate significantly from a Normal Distribution. These outliers have undue influence on the mean, and since the standard deviation uses the squares of the deviations, they inflate the error estimate as well.

 

A more “robust” estimate of the “true” value is the median. The median is that value where half of the measurements have larger values, and half have smaller values. If the number of values being sampled is even, one uses the average of the 2 middle ones.
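The median rule, including the even-count case, as a short sketch. The pixel values are hypothetical (not the ones in the flat-field example), with one value struck by something like a cosmic ray.

```python
import statistics

def median(values):
    """Middle value; for an even count, the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical pixel values from five exposures, one with a large outlier
pixel = [1000, 1004, 998, 1010, 9000]
print(statistics.fmean(pixel), median(pixel))   # the mean is dragged far upward
```

Python's `statistics.median` implements the same rule and can be used directly in practice.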

 

Suppose you took 5 flat field images, and this is what you saw in one small sector:

 

 

 

 

 

Look at the central pixel. Let us take the mean of the five values for this pixel, and then the median.

 

Mean = 2776.8   Median = 1004

 

Which one of these gives the most “representative” value for this pixel? The median. It is not unusual for measurements to contain these “outliers” that simply do not fit a Gaussian distribution, so the use of Gaussian statistics is simply inappropriate. This is the sort of effect that can occur when a cosmic ray hits a CCD while it is exposing. Because of this, most astronomers construct a median-combined frame for actual use.

 

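If one needs to quote an uncertainty associated with the median, usually the mean absolute deviation is used:

$$\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N}\left|x_i - x_{\mathrm{med}}\right|$$

where $|\ldots|$ denotes the absolute value. Using the median as the reference point minimizes the MAD: no other choice of center gives a smaller mean absolute deviation.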
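A sketch of the mean absolute deviation, using made-up data with one high outlier. It also checks numerically that the median is the center that minimizes the MAD (the helper name is my own).

```python
import statistics

def mean_absolute_deviation(values, center):
    """(1/N) * sum of |x_i - center|."""
    return sum(abs(x - center) for x in values) / len(values)

# Hypothetical measurements with one high outlier
data = [9.8, 10.1, 10.0, 10.3, 12.0]
med = statistics.median(data)
mean = statistics.fmean(data)
print(med, mean_absolute_deviation(data, med))     # 10.1 and 0.5
print(mean, mean_absolute_deviation(data, mean))   # larger MAD about the mean
```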

 

Combining Uncertainties

 

Suppose one had different sources of uncertainty related to a measurement $x$: a random one, $\sigma_r$, and a systematic one, $\sigma_s$, for example. If these were independent of one another, the combined uncertainty would be $\sigma = \sqrt{\sigma_r^2 + \sigma_s^2}$. They do not simply “add”, because when $\sigma_r$ leads to a measurement that is too small, $\sigma_s$ will sometimes partially compensate by “trying” to make the measured value too large. This can be generalized to any number of independent errors: $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2}$.
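Adding in quadrature, plus a quick Monte Carlo check of the partial-cancellation argument. The sizes 0.3 and 0.4 are made up, and for the simulation I treat the “systematic” error as a random Gaussian draw per measurement, which is an assumption that only makes sense across an ensemble of experiments.

```python
import math
import random

def combine_independent(*sigmas):
    """Combined uncertainty of independent error sources: add in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))

print(combine_independent(0.3, 0.4))   # about 0.5, not 0.3 + 0.4 = 0.7

# Monte Carlo: each simulated measurement gets an independent random error
# (sigma 0.3) and an independent "systematic" error (sigma 0.4).
rng = random.Random(0)
totals = [rng.gauss(0, 0.3) + rng.gauss(0, 0.4) for _ in range(200_000)]
mean = sum(totals) / len(totals)
rms = math.sqrt(sum((t - mean) ** 2 for t in totals) / len(totals))
print(rms)   # close to 0.5
```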

 

Sums, Differences, Products, Ratios, and Powers

 

What is the net uncertainty in the sum, difference, product, or ratio of 2 quantities, if each has an associated uncertainty? How does one handle exponents?

 

Scaling: Simply multiplying a single number $x$ with an uncertainty $\sigma_x$ by a constant $a$:

If $q = ax$ then $\sigma_q = |a|\,\sigma_x$, i.e. it scales directly. This can also be expressed by saying that $\sigma_q/|q| = \sigma_x/|x|$. That is, the fractional error is unchanged!

           

Addition & Subtraction of 2 quantities with associated uncertainties:

 

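If $q = ax \pm by$, with $x$ and $y$ independent, then $\sigma_q = \sqrt{a^2\sigma_x^2 + b^2\sigma_y^2}$. In many cases no scaling has been done, and $a = b = 1$, so that $\sigma_q = \sqrt{\sigma_x^2 + \sigma_y^2}$.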

 

Multiplication and Division: If you guessed that this would be similar to scaling which, after all, is just multiplying by a quantity with no associated uncertainty, you would be right!!

 

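If $q = xy$ or $q = x/y$, then $\left(\frac{\sigma_q}{q}\right)^2 = \left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2$, i.e. $\frac{\sigma_q}{|q|} = \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2}$: the fractional errors add in quadrature.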

Powers:

If $q = x^n$ then $\frac{\sigma_q}{|q|} = |n|\,\frac{\sigma_x}{|x|}$.
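The four propagation rules above, written as small Python helpers (names are my own) and applied to hypothetical inputs x = 20 ± 1 and y = 5.0 ± 0.2, assuming x and y are independent.

```python
import math

def sigma_scaled(a, sigma_x):
    """q = a*x: the uncertainty scales by |a|."""
    return abs(a) * sigma_x

def sigma_sum(sigma_x, sigma_y, a=1.0, b=1.0):
    """q = a*x + b*y or q = a*x - b*y, with x and y independent."""
    return math.sqrt((a * sigma_x) ** 2 + (b * sigma_y) ** 2)

def sigma_product(q, x, sigma_x, y, sigma_y):
    """q = x*y or q = x/y: fractional uncertainties add in quadrature."""
    return abs(q) * math.sqrt((sigma_x / x) ** 2 + (sigma_y / y) ** 2)

def sigma_power(q, x, sigma_x, n):
    """q = x**n: the fractional uncertainty is multiplied by |n|."""
    return abs(q) * abs(n) * sigma_x / abs(x)

x, sx, y, sy = 20.0, 1.0, 5.0, 0.2   # hypothetical values
print("3x   ", 3 * x, "+/-", sigma_scaled(3, sx))
print("x + y", x + y, "+/-", round(sigma_sum(sx, sy), 4))
print("x * y", x * y, "+/-", round(sigma_product(x * y, x, sx, y, sy), 3))
print("x / y", x / y, "+/-", round(sigma_product(x / y, x, sx, y, sy), 4))
print("x**3 ", x**3, "+/-", sigma_power(x**3, x, sx, 3))
```

Note how cubing x triples its 5% fractional error to 15%.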

 

Logs (Needed for Magnitudes):

 

To do this right, we need to realize that magnitude measurements are made by counting photons (or, equivalently, measuring the flux). Suppose we changed the flux $F$ by an amount $dF$. How would that affect the magnitude $m$?

 

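Since $m = -2.5\log_{10}F + \mathrm{constant}$, the corresponding change in the magnitude would be:

$$dm = -2.5\,\frac{d(\log_{10}F)}{dF}\,dF = -\frac{2.5}{\ln 10}\,\frac{dF}{F}$$

Dropping the minus sign (since we are only dealing with the size of the uncertainty):

$$|dm| = \frac{2.5}{\ln 10}\,\frac{dF}{F} \approx 1.0857\,\frac{dF}{F}.$$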

 

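Now, if we let $dF$ represent the error in $F$, i.e. set $dF = \sigma_F$, then

$$\sigma_m = 1.0857\,\frac{\sigma_F}{F}.$$

If the measurement of $F$ is governed by root-n statistics (counting $n$ photons, so that $\sigma_F/F = 1/\sqrt{n} = 1/(S/N)$), then:

$$\sigma_m = \frac{1.0857}{\sqrt{n}} = \frac{1.0857}{S/N}.$$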
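A minimal sketch of the small-error magnitude formulas, assuming pure photon-counting (root-n) noise; the function names are hypothetical.

```python
import math

MAG_FACTOR = 2.5 / math.log(10)   # = 1.0857..., from d(log10 F)/dF = 1 / (F ln 10)

def mag_error_from_flux(flux, flux_err):
    """Small-error magnitude uncertainty: sigma_m = 1.0857 * sigma_F / F."""
    return MAG_FACTOR * flux_err / flux

def mag_error_from_counts(n_photons):
    """Same, when the flux is a photon count obeying root-n statistics."""
    return mag_error_from_flux(n_photons, math.sqrt(n_photons))

print(round(MAG_FACTOR, 4))
for n in (100, 10_000):
    print(n, round(mag_error_from_counts(n), 4))
```

Counting 100 photons (S/N = 10) gives an uncertainty of roughly 0.11 mag; 10,000 photons (S/N = 100) gives roughly 0.011 mag.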

 

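For small uncertainties in flux, $1.0857 \approx 1$, so $\sigma_m \approx \sigma_F/F$: a 1% uncertainty in flux corresponds to roughly 0.01 mag.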

 

 

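Now, when the uncertainties get above 10% or so, they can actually become “asymmetric”! One can see this in the following way. For a finite change in flux, $\Delta m = -2.5\log_{10}\left(\frac{F + \sigma_F}{F}\right) = -2.5\log_{10}(1 + \sigma_F/F)$. We could just as well have defined $\Delta m$ using $F - \sigma_F$ instead of $F + \sigma_F$. In that case we would have gotten

$$\Delta m = -2.5\log_{10}(1 - \sigma_F/F) \quad \text{instead of} \quad \Delta m = -2.5\log_{10}(1 + \sigma_F/F),$$

where I have kept the minus sign to be completely correct. When $\sigma_F/F$ is large, these two values of $\Delta m$ have opposite signs and noticeably different sizes: the error toward fainter (larger) magnitudes is bigger than the error toward brighter ones. Because of this, one might see a magnitude written down with unequal upper and lower errors, as $m^{+\sigma_+}_{-\sigma_-}$ with $\sigma_+ > \sigma_-$. This sort of thing is a common occurrence when you have “large” errors in logarithmic quantities. Of course, errors of 0.3 mag mean that the measurement is barely a $3\sigma$ one (by the relation above, $\sigma_m = 0.3$ mag corresponds to $\sigma_F/F \approx 0.28$, i.e. $S/N \approx 3.6$).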
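To put numbers on the asymmetry, this sketch evaluates both finite-difference expressions for a few fractional flux errors. The value σ_F/F = 0.3 is my own illustrative choice, not necessarily the one used in class; the helper name is hypothetical.

```python
import math

def asymmetric_mag_errors(frac_err):
    """Magnitude changes for flux F + sigma_F and F - sigma_F,
    where frac_err = sigma_F / F (must be < 1)."""
    dm_bright = -2.5 * math.log10(1 + frac_err)   # brighter: smaller magnitude
    dm_faint = -2.5 * math.log10(1 - frac_err)    # fainter: larger magnitude
    return dm_bright, dm_faint

for f in (0.01, 0.1, 0.3):
    bright, faint = asymmetric_mag_errors(f)
    linear = 2.5 / math.log(10) * f               # small-error approximation
    print(f, round(bright, 3), round(faint, 3), round(linear, 3))
```

At 1% the three numbers are essentially equal in size; at 30% the brighter-side change is about -0.28 mag while the fainter-side change is about +0.39 mag, so such a magnitude might be quoted as $m^{+0.39}_{-0.28}$.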

 

Graphing

 

I won’t say too much here except to reiterate these 2 points:

 

1. Choose a graph that is appropriate for the data you have

 

2. When “fitting” data with uncertainties in an XY graph, your curve should “hit” about 2/3 of the error bars.

 

 

Graph Types

 

Example: when graphing something that is a power law, like $y = Ax^{\alpha}$, don’t graph $y$ versus $x$! The eye-brain combination does not do a good job fitting complex non-linear curves. Do one of the following:

Take the log of both quantities: $\log y = \log A + \alpha\log x$. You have now transformed a power-law shape into a linear one, with slope $\alpha$ and intercept $\log A$!!

Plot $y$ versus $x$ on log-log paper (or generate the same on a computer). This accomplishes the same thing. You see a straight line, which is easier to fit by eye.

 

Fitting

 

Taking an example from your book:

 

 

 

These two “fits” suggest that there is something “wrong” with the data. In the left-hand case, the curve hits far fewer than 2/3 of the error bars, suggesting that the uncertainties have been grossly underestimated.

 

 

In the right-hand case the opposite is true. The curve hits more than 2/3 of the error bars (in fact, it hits 100% of them). This suggests that the uncertainties are probably overestimated.
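The 2/3 rule can be checked by counting. In this sketch the data are simulated (a made-up straight line plus Gaussian noise of known size), and `fraction_hit` is a hypothetical helper that counts how many error bars reach the model curve when the quoted errors are about right, too small, or too large.

```python
import random

def fraction_hit(y_obs, y_err, y_model):
    """Fraction of points whose error bar (y_obs +/- y_err) reaches the model curve."""
    hits = sum(1 for y, e, m in zip(y_obs, y_err, y_model) if abs(y - m) <= e)
    return hits / len(y_obs)

# Simulated data (made up): a straight-line model plus Gaussian noise of known size
rng = random.Random(7)
true_sigma = 0.5
xs = [0.1 * i for i in range(1000)]
model = [2.0 * x + 1.0 for x in xs]
data = [m + rng.gauss(0.0, true_sigma) for m in model]

for label, quoted in (("about right", true_sigma),
                      ("underestimated", 0.3 * true_sigma),
                      ("overestimated", 3.0 * true_sigma)):
    print(label, fraction_hit(data, [quoted] * len(data), model))
```

Correct error bars give roughly 2/3; error bars that are too small give far fewer hits, and error bars that are too large give nearly 100%, just like the two cases above.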

 

Of course there are more precise ways of fitting than “by eye”, but I won’t go into least-squares and other methods in this course. But let us look at a practical example of understanding the errors based on the fitting discussion above.

Spitzer Spectroscopy of Comet Schwassmann-Wachmann 3

 

Observing comets with the Spitzer Space Telescope has some challenges. Here is an example of how it is done.

 

Spectra are obtained by placing the comet on the slit of the spectrograph, and obtaining an exposure. One slit sends the light to one grating that provides a spectrum over some interval in wavelength $\lambda$. By moving the comet onto the other slit, the light goes to a different grating, and a different wavelength interval is covered.

 

 

In the example shown here (from Kelley et al. 2006, ApJ, 651, 1256) the comet is placed on the slit for the low-spectral-resolution n=2 order of the Infrared Spectrograph (IRS) in the upper panel (5.2-8.7 μm = “SL2”), and on the slit for the n=1 order in the lower panel (7.4-14.5 μm = “SL1”).

Often, we also nod the telescope (move the target to the other side of the middle of the slit) for each exposure, so that we get 2 exposures in the same slit, one right after the other.

 

We observed 2 pieces of the “crumbling” comet SW3: Components B and C. When we combined 2 nods that were supposed to be identical, here is what we got for SL2 and SL1:

 

 

 

 

 

 

 

 

We are not trying to “fit” anything here, but the line is the mean spectrum based on the 2 nods, and the error bars are the standard deviations. Note that the point-to-point scatter is much smaller than the error bars would imply, suggesting that something is amiss here.

 

As it turns out, the nods were probably not well executed, so that less light passed through the slit on one nod than on the other, causing its spectrum to be systematically lower. Both spectra might have tiny random errors, but a larger systematic error had crept in!
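A simulation with entirely made-up numbers (not the Spitzer data) shows how this can happen: a flat spectrum with small random noise in each nod, where one nod is assumed to lose 10% of the light. The per-pixel standard deviation of the 2 nods is then dominated by the systematic offset, while the mean spectrum itself is smooth.

```python
import random
import statistics

rng = random.Random(42)
n_pix = 200
true_flux = 100.0   # hypothetical flat spectrum, arbitrary units
noise = 0.5         # small random error per pixel per nod
nod1 = [true_flux + rng.gauss(0, noise) for _ in range(n_pix)]
nod2 = [0.9 * true_flux + rng.gauss(0, noise) for _ in range(n_pix)]  # 10% light lost

mean_spec = [(a + b) / 2 for a, b in zip(nod1, nod2)]
error_bars = [statistics.stdev([a, b]) for a, b in zip(nod1, nod2)]
# Point-to-point scatter: spread of differences between neighbouring pixels, / sqrt(2)
diffs = [b - a for a, b in zip(mean_spec, mean_spec[1:])]
p2p = statistics.stdev(diffs) / 2 ** 0.5

print(statistics.fmean(error_bars), p2p)   # error bars far exceed the actual scatter
```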

 

Things were even worse at long wavelengths!!

 

 

 

 

Here are the spectra of the 2 individual nods in the long wavelength arrays (LL2 and LL1). You can see that there is a 30% difference!!

 

Just blindly using these 2 spectra to calculate a mean spectrum and standard deviation at each data point would have produced a result even worse than that of the short-wavelength spectrum. And the problem was much worse for SW3-B than for SW3-C.

 

The reason for this huge difference is probably not poor nodding, but the way that the spectrograph was oriented with respect to the comet.

 

Here is how the comets were oriented on the sky:

 

 

SW3-B

SW3-C

 

When placed on the slit and nodded:

 

 

SW3-B

SW3-C

Doing a “cut” along the comet’s tail for SW3-C:

 

 

 

 

 

 

Extracting the Spectrum:

 

 

 

Comparing the “cuts” along the direction of the comets’ tails:

 

Profile for one nod on SW3-B

Profile for one nod on SW3-C

 

Nodding caused more of the tail of B to fall outside the slit on one nod than on the other. The effect was smaller for C.

 


Two nods on C in LL2:

 

Two nods on C in LL1: