Statistical significance: Are you sure that you're sure?

All scientific measurements are based on incomplete access to information, such as this small sample of marbles from a larger pool.

Read any news article about the search for new physics and you'll quickly see the word "certainty." It describes the probability that an observation means a discovery, rather than just a fluke. But what does that really mean?

To understand scientific certainty, there are a couple of ideas to discuss. The first one is quite shocking: almost every measurement ever made is incorrect. We have to be careful about what that means, because it doesn't mean what it seems to imply. For any quantity, there's a hypothetical measurement that's exactly perfect. A real measured value will almost never match that perfect value exactly, but it should agree with it within the uncertainties you quote.

This is hard to imagine abstractly, so let's talk about an example. Suppose you have an Olympic-sized swimming pool that should be filled with an equal number of blue and red marbles. Suppose further that the marbles are mixed well. You don't have the time to count every marble in the pool to verify that the mix is really half and half, so you sample the marbles by grabbing a few. From that sample, you try to estimate the percentage of blue marbles in the entire pool.

If you grabbed a single marble for your sample, either it'd be red or it'd be blue. Your measurement would say that the pool consisted of zero percent blue marbles or 100 percent blue marbles. Either conclusion is wrong. Luckily, it is easy to see that this approach is insufficient to estimate the mix of red and blue marbles.

Toss the marble back in and take two other marbles. There's a chance you'll get lucky and grab one red and one blue marble, concluding that the pool's mix was half and half. That happens 50 percent of the time. But the other 50 percent of the time, you'd pick two red or two blue marbles. Half of the time you'd get the right answer, but half of the time you'd get the wrong answer.
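If you'd like to check that 50 percent figure, here is a minimal Python sketch that simply lists every possible pair of draws. It assumes the pool is so large that taking one marble out doesn't change the odds for the next one, and the function name is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

def prob_one_of_each():
    """Chance that two draws from a 50/50 pool give one red and one blue.

    Assumes each draw is independent, with red and blue equally likely.
    """
    # The four equally likely outcomes: RR, RB, BR, BB.
    outcomes = list(product("RB", repeat=2))
    mixed = [pair for pair in outcomes if set(pair) == {"R", "B"}]
    return Fraction(len(mixed), len(outcomes))

print(prob_one_of_each())  # 1/2
```

Two of the four outcomes are mixed, which is where the "half of the time" comes from.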

To get a more accurate estimate of the percentage of blue marbles, you need a larger sample. Grab a five-gallon pail of marbles from the pool and count those. This sample would be more accurate, but it probably still wouldn't be completely accurate. It is a fundamental fact of measurement that if you look at only a sample of the data, it is unlikely that you'll get a perfectly accurate answer.
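To put rough numbers on how a bigger sample helps, the sketch below works out the chance that a sample from a truly 50/50 pool lands close to half blue. The "within 5 percentage points" window, the sample sizes and the function name are all choices made for this illustration, and the draws are again assumed to be independent.

```python
from fractions import Fraction
from math import comb

def prob_sample_close(n, tolerance=Fraction(5, 100)):
    """Chance that n marbles from a 50/50 pool have a blue fraction
    within `tolerance` of one half (an exact binomial sum)."""
    half = Fraction(1, 2)
    # Count the equally likely draw sequences whose blue fraction is close enough.
    favorable = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - half) <= tolerance
    )
    return favorable / 2**n

for n in (10, 100, 1000):
    print(f"{n} marbles: {prob_sample_close(n):.3f}")
```

The probability climbs toward 1 as the sample grows, but under these assumptions it never quite gets there.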

So, what's the chance that a larger sample will still yield an inaccurate result? Starting small, with two marbles, there's a 25 percent chance that they'd both be blue. If you took 10 marbles and they all happened to be blue, that would start to look unlikely: it happens only about one time in a thousand. Up the ante to 20 marbles that all turn out to be blue, and the likelihood drops to about one time in a million. If that happened, you'd have pretty much ruled out the idea that the pool had an equal number of blue and red marbles. Note I said "pretty much." In science, nothing is 100 percent certain, just more or less likely. Eventually it's likely enough to believe.
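The arithmetic behind those numbers is just one half multiplied by itself once per marble. A short sketch, assuming a 50/50 pool and independent draws (the function name is only for illustration):

```python
from fractions import Fraction

def prob_all_blue(n, p_blue=Fraction(1, 2)):
    """Chance that n independent draws are all blue."""
    return p_blue ** n

for n in (2, 10, 20):
    p = prob_all_blue(n)
    print(f"{n} marbles: 1 in {p.denominator}")
# 2 marbles: 1 in 4
# 10 marbles: 1 in 1024
# 20 marbles: 1 in 1048576
```

So "one in a thousand" and "one in a million" are rounded versions of 1 in 1,024 and 1 in 1,048,576.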

It's similar in the search for new physics: we look for more of a particular kind of collision than we expect. We want to figure out how often what we measure can occur, even if there isn't any new physics. The situation is complicated in physics, because we look at thousands of data plots. With so many plots involved, it isn't surprising that at least one of them shows something weird. Just as you wouldn't want to conclude the mix wasn't equal on the basis of two blue marbles from the pool, scientists don't want to announce a discovery on the basis of something that could happen just by chance. In particle physics, we say that we have "evidence for" something if there is only about one chance in a thousand of it occurring by accident. To claim we have discovered something, the chances that it occurred by accident have to be less than one in a million.
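To see why looking at many plots matters, here is a rough sketch of the chance that at least one plot shows a fluke. It treats the plots as independent, which real analyses generally are not, and the figure of 1,000 plots is a made-up round number rather than a count from any experiment.

```python
def prob_at_least_one_fluke(n_plots, p_fluke):
    """Chance that at least one of n_plots independent plots shows a
    fluctuation that has probability p_fluke in any single plot."""
    # Easier to compute the chance that no plot shows a fluke, then subtract.
    return 1.0 - (1.0 - p_fluke) ** n_plots

n_plots = 1000  # hypothetical number of plots examined
for label, p in (("one in a thousand", 1e-3), ("one in a million", 1e-6)):
    chance = prob_at_least_one_fluke(n_plots, p)
    print(f"{label}: {chance:.4f}")
```

Under these assumptions, a one-in-a-thousand fluke turns up somewhere in the stack more often than not, while a one-in-a-million fluke stays rare. That is one way to see why the bar for claiming a discovery is set so much higher.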

The most important thing to remember is that an unusual measurement can happen and mislead the unwary. Understanding the interplay between rarity and probability is an invaluable skill in science and in life.

Want a phrase defined? Have a question? Email today@fnal.gov.

—Don Lincoln