An Overview of Error Analysis

When you report a measurement of something, you must also report the error in your measurement.  For example, say you measured the surface gravity of the earth, at your location, to be 9.816 meters per second squared.  Does that mean that you would get exactly the same value if you reproduced the experiment?   Does it mean that it could be 9.817 meters per second squared?   Does it mean that it could be 9.836 meters per second squared?  As you see, this is completely unclear.   On the other hand, had you reported a value of, say, 9.816±0.005 meters per second squared, your reader would know that there is about a 2/3 chance of the real value being between 9.811 and 9.821 meters per second squared.

In order to be able to report an error in your quantity of interest, you must also know the errors in each of your measurements.  For example, say you had dropped an object and measured the time it took to fall a measured distance.  You would then need to know the error in both the distance and the time to estimate the error in your result.  If you dropped your object from a set height of 2 meters, you would need to report how well you knew this height.  You may report something like:  “The ball took 0.65±0.02 seconds to fall from a height of 2.000±0.002 meters.”   That way the reader not only knows what you measured but also how well you measured it.

There are two ways to estimate your error in a measurement, and you should make a habit of always doing the first, and doing the second when at all possible.  These are:  (1) common sense at the time of the measurement, and (2) statistics, by taking multiple independent measurements.  In the second method, the error in each measurement can be estimated by finding the standard deviation of the measurements.  If you estimated your errors correctly, the two estimates should be similar.

For example, say you dropped the ball from two meters above the floor with a ball dropper.  You may notice that the height could be off by a couple of millimeters, either way, so you report two millimeters as your error.  That would be an example of method (1).  On the other hand, say you did this 5 times and measured the fall times to be:  0.59, 0.61, 0.66, 0.70, and 0.68 seconds.  Then you would surmise that the error in each measurement was about 0.05 seconds, since the standard deviation is 0.0466 seconds.  That would be an example of method (2).
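The standard deviation quoted above can be checked with a minimal Python sketch using only the standard library.  The variable names are our own, and the sketch assumes the sample standard deviation (dividing by N-1), which is the convention that reproduces the 0.0466 seconds in the text.

```python
import statistics

# Fall times from the five drops, in seconds
fall_times = [0.59, 0.61, 0.66, 0.70, 0.68]

mean_time = statistics.mean(fall_times)

# statistics.stdev is the sample standard deviation (divides by N - 1);
# statistics.pstdev would divide by N and give a slightly smaller number.
sigma_single = statistics.stdev(fall_times)

print(f"mean fall time     = {mean_time:.3f} s")
print(f"standard deviation = {sigma_single:.4f} s")
```

With only five measurements the difference between the two conventions is noticeable, so it is worth stating which one you used.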

So, how long did it take for the ball to fall the two meters?   If each of the errors were random, with no mistakes, then you would want to average the measurements.  If, on the other hand, you had reason to believe one was a mistake, you would throw out that datum.  For example, say the measurements were instead:  0.59, 0.95, 0.66, 0.70, and 0.68 seconds.  You could assume that you made a mistake taking the 0.95 datum, as it is 0.29 seconds greater than the average of the other four measurements, which is about six times their standard deviation of 0.05 seconds.  That said, you must be careful, as you would not want to bias your measurements.

Now, if your error were random, and the error in each measurement is known, what is the error in the average?  Clearly it is less than it would have been had you only taken one measurement.  But how much less?  As it turns out, the error in a sum is the square root of the sum of the squares of the errors, or in math notation:  \sigma_{sum}=\sqrt{\sum\limits_{i=1}^{N}\sigma_{i}^{2}} .  If all of the individual errors are the same, then this simply becomes:  \sigma_{sum}=\sqrt{N\sigma^{2}}=\sigma\sqrt{N} .   Since the average is simply the sum divided by the number of samples, the error in the average is given by:
{{\sigma }_{avg}}=\frac{{{\sigma }_{sum}}}{N}=\frac{\sigma }{\sqrt{N}} .
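As an illustration, the steps from the error in one measurement to the error in the average can be sketched in Python.  This is only a sketch with names of our own choosing, and it assumes every measurement carries the same error, estimated by the sample standard deviation.

```python
import math
import statistics

fall_times = [0.59, 0.61, 0.66, 0.70, 0.68]   # seconds

n = len(fall_times)
sigma_single = statistics.stdev(fall_times)    # error in one measurement
sigma_sum = sigma_single * math.sqrt(n)        # error in the sum of n measurements
sigma_avg = sigma_sum / n                      # error in the average = sigma / sqrt(n)

print(f"t = {statistics.mean(fall_times):.2f} +/- {sigma_avg:.2f} s")
```

Note that taking four times as many measurements only halves the error in the average.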

Thus, in our example, you would report that it took 0.65±0.02 seconds for the ball to fall a distance of 2.000±0.002 meters.  However, you would keep all the insignificant digits on your spreadsheet or other data analysis software.

Now that you know this, what is your value of, and error in, g?    Finding the best value is simple.  We use the equation:  g=\frac{2h}{{{t}^{2}}}.   Plugging in the best values gives us 9.53 meters per second squared for the surface gravity.   But, how well do we know this number?

In order to calculate the error in g, you must test how sensitive your equation is to changes in your measured values.  This is called propagation of errors, and you can do it by simply calculating what your quantity of interest (g) would be if each of your independent variables were at its maximum and minimum expected values.  Once you have done this for each of the independent variables, holding the others constant, you must add up the resulting errors in quadrature, that is, using the Pythagorean theorem.

In our example, we begin by calculating the error in g due only to the error in the height.  Keeping the time fixed at its best value, 0.648 s, and calculating g with h=1.998 m and 2.002 m, we get corresponding surface gravities of 9.52 and 9.54 meters per second squared respectively.  Thus, the error in g, only due to the error in the height, is half the difference, or 0.01 meters per second squared.  But what about the error in the time?  Doing the same calculations, but keeping h=2.000 meters and allowing the time to be 0.627 and 0.669 seconds, gives us 10.17 and 8.94 meters per second squared respectively for g.  Thus, the error, only due to the error in the time, is again half the difference, which comes to 0.61 meters per second squared when the unrounded time error is carried through.   As this is much bigger than the other, it clearly dominates.

To find the error due to both, we add them in quadrature, like the way we found the error in the sum above.  Thus, in this case: \sigma_{g}=\sqrt{\sum\limits_{i=1}^{N}\sigma_{i}^{2}}=\sqrt{\left(0.61\ \mathrm{m/s^{2}}\right)^{2}+\left(0.01\ \mathrm{m/s^{2}}\right)^{2}}=0.61\ \mathrm{m/s^{2}} .
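The whole procedure (vary one variable at a time, take half the spread, then add in quadrature) can be sketched in a few lines of Python.  The function name g_from_drop is hypothetical, and the time error of 0.0208 seconds is the unrounded error in the average that you would keep on your spreadsheet.

```python
import math

def g_from_drop(h, t):
    """Surface gravity from a drop of height h (m) taking time t (s)."""
    return 2 * h / t**2

h, sigma_h = 2.000, 0.002      # meters
t, sigma_t = 0.648, 0.0208     # seconds (unrounded error in the average)

g_best = g_from_drop(h, t)

# Vary one variable at a time across its error range, holding the other
# fixed; the error contribution is half the spread in g.
sigma_g_h = abs(g_from_drop(h + sigma_h, t) - g_from_drop(h - sigma_h, t)) / 2
sigma_g_t = abs(g_from_drop(h, t + sigma_t) - g_from_drop(h, t - sigma_t)) / 2

# Combine the independent contributions in quadrature
sigma_g = math.hypot(sigma_g_h, sigma_g_t)

print(f"error from height: {sigma_g_h:.2f} m/s^2")
print(f"error from time:   {sigma_g_t:.2f} m/s^2")
print(f"g = {g_best:.1f} +/- {sigma_g:.1f} m/s^2")
```

Because the function is called rather than differentiated, the same sketch works for any formula: only g_from_drop needs to change.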

Finally, you would report the following as your conclusion:   “The gravitational field of the Earth, at my location, is 9.5±0.6 meters per second squared.”  Notice that we really only care about the error to one significant figure, and we round off the numbers accordingly so as not to distract the reader.

But, what is the REAL value of g?  Is it not 9.81 meters per second squared, or whatever my physics book says?   The answer is that you do not know, and you should not pretend that you do.  It is a good idea to compare your value of g to someone else’s measurement to see if they are consistent, but make sure you cite your source.  This is very important.  By the way, in physics, it is customary to cite the paper of whoever actually made the measurement, even if you actually found the value in a secondary source like an encyclopedia.

If you are interested in the best measurements of g to date, you can read this review article about how geologists measure it to about 10 digits of accuracy in order to model the internal mass distribution of the earth.

If you have studied differential calculus, it may be easier to use it to propagate the errors.  However, you need to understand that this is no more correct than the way described above.  So, if you have not studied calculus yet, there is no need to worry or to read further.

Recall that in order to calculate the error in what you want, you had to test how changes in each of your measured values (independent variables) affect the quantity of interest (dependent variable).  We advised you to accomplish this by numerically calculating the quantity of interest in the maximum and minimum of each error domain (holding the other independent variables constant), giving you a range of resulting values.

For relatively small error bars, you can assume that the average slope over your error range is simply the derivative of your function at the measured point.  Mathematically, let the dependent variable g be written in function notation as a function of two variables, g = g\left(h,t\right).  We can then rewrite the error due only to the error in h as:

\sigma_{g_h} = \frac{1}{2} \left( g\left( h+\sigma_h,t \right) - g \left( h - \sigma_h,t \right) \right)  = \left( \frac{ g\left( h+\sigma_h,t \right) - g \left( h - \sigma_h,t \right)}{2 \sigma_h}  \right) \sigma_h  = \frac{\Delta g}{\Delta h} \sigma_h

This simplifies to: \sigma_{g_h}   \approx   \frac{\partial g}{\partial h} \sigma_h .

The symbol \partial is just like a d in calculus, but warns you that there is at least one other independent variable, which we are holding constant.  This is called a partial derivative.  Thus, in our example, the error due to both independent variables is given by:
{{\sigma }_{g}}=\sqrt{{{\left( \frac{\partial g}{\partial h}{{\sigma }_{h}} \right)}^{2}}+{{\left( \frac{\partial g}{\partial t}{{\sigma }_{t}} \right)}^{2}}}=\sqrt{{{\left( \frac{2{{\sigma }_{h}}}{{{t}^{2}}} \right)}^{2}}+{{\left( -\frac{4h{{\sigma }_{t}}}{{{t}^{3}}} \right)}^{2}}}.

Now, factoring out a factor of g, we get:
{{\sigma }_{g}}=\sqrt{{{\left( \frac{{{\sigma }_{h}}}{h}g \right)}^{2}}+{{\left( -\frac{2{{\sigma }_{t}}}{t}g \right)}^{2}}}=g\sqrt{{{\left( \frac{{{\sigma }_{h}}}{h} \right)}^{2}}+{{\left( \frac{2{{\sigma }_{t}}}{t} \right)}^{2}}}.
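As a check, this closed-form result can be evaluated directly.  The sketch below is in Python, with a hypothetical function name and the same unrounded time error of 0.0208 seconds assumed from the spreadsheet; it should agree with the 0.61 meters per second squared found by the maximum and minimum method.

```python
import math

def sigma_g_analytic(h, sigma_h, t, sigma_t):
    """Error in g = 2h/t^2 from the partial-derivative formula."""
    g = 2 * h / t**2
    # Relative errors add in quadrature; the factor of 2 comes from t being squared.
    return g * math.sqrt((sigma_h / h) ** 2 + (2 * sigma_t / t) ** 2)

h, sigma_h = 2.000, 0.002     # meters
t, sigma_t = 0.648, 0.0208    # seconds

print(f"sigma_g = {sigma_g_analytic(h, sigma_h, t, sigma_t):.2f} m/s^2")
```

The formula also shows at a glance why the time dominates: its relative error is about 3 percent and is doubled, while the relative error in the height is only 0.1 percent.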

Why is this easier?  It is easier because you can calculate this once, and then have a single formula.  Plus, it is very powerful.  For example, as an exercise, use this formula to calculate the error in the sum of a bunch of quantities, and compare that to the error in the sum that we have above.

Similarly, you can use this to calculate the error in the average of a bunch of data each with its own error bar.
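For instance, here is a minimal Python sketch for that case.  It assumes a plain (unweighted) average, for which each partial derivative is 1/N, and the function name is our own.

```python
import math

def error_in_average(sigmas):
    """Error in the plain (unweighted) average of values with errors `sigmas`.

    Each partial derivative of the average with respect to one value is 1/N,
    so the quadrature sum is sqrt(sum(sigma_i^2)) / N.
    """
    n = len(sigmas)
    return math.sqrt(sum(s**2 for s in sigmas)) / n

# Three measurements with different error bars
print(error_in_average([0.02, 0.05, 0.03]))

# With equal error bars this reduces to sigma / sqrt(N)
print(error_in_average([0.05] * 4), 0.05 / math.sqrt(4))
```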