The year is divisible by four, isn’t it. I can tell by the noises coming out of my TV. They’re more bullshitty than normal.

It’s election season and that means that the news networks, newspapers, magazines, and blogs are awash with stories like this:

Mitt Romney leads President Obama by 3 percentage points in the deeply red state of Arizona, according to a survey conducted by Public Policy Polling on behalf of Democratic think tank Project New America.

Romney leads 49 to 46 in Arizona, which is within the poll’s 3.5 percentage point margin of error. Arizona has only gone for the Democratic presidential candidate once since 1952.

So what does that mean?

As a biologist, I’ve never used the phrase “margin of error” before. I bet most of the scientists reading this haven’t either, because when we use summary statistics we’re generally interested in means, medians, confidence intervals, and standard deviations*. In fact, those are basically what’s taught in introductory statistics classes from high school to college. But if you’ve worked with statistics, you’ve probably used a margin of error, even if you’ve never called it that.

Let’s take a step back. The fundamental problem with taking a poll is that, barring a very strange accident of chance, you’re unlikely to get a sample that perfectly reflects your population. For example, if you poll 100 people from a population of 100 million in which 46 million will vote for Mitt Romney and 54 million will vote for President Obama, it would be very odd indeed to find exactly 46 Romney supporters and 54 Obama supporters. You’re more likely to get a sample of 48:52 or 43:57, or maybe even something like 51:49, just by chance. For the purposes of this example, I’m going to say we got a result where Romney trails Obama 47% to 53%.
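If you’d rather see that in code, here’s a toy simulation in Python (my own illustration, not anything a real pollster runs) that draws 100 voters at random from a population where 46% back Romney:

```python
# A toy simulation: draw 100 voters at random from a population in which
# 46% support Romney and 54% support Obama.
import random

random.seed(1)  # fixed seed so the run is reproducible

true_romney_share = 0.46
sample_size = 100

# Each draw is one respondent: True if they back Romney, False if Obama.
sample = [random.random() < true_romney_share for _ in range(sample_size)]
romney_in_sample = sum(sample) / sample_size

print(f"True population share for Romney: {true_romney_share:.0%}")
print(f"Romney's share in this sample:    {romney_in_sample:.0%}")
# Run it a few times with different seeds and you'll see 43%, 48%, 51%...
# almost never exactly 46%.
```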

So, to account for this chance variation, we use measures of variance. In biology, standard error is usually the statistic of choice. In other fields of science, one might use the standard deviation, a confidence interval, or the raw variance. For reasons I’ll expound on in a later post, I much prefer standard error, followed by confidence intervals.

A confidence interval (CI) is the range within which we can be reasonably certain that the true population value (in our example above, 0.46 for Romney, 0.54 for Obama) falls. What we mean by “reasonably certain” varies. In physics, it’s not uncommon to say that anything less than 99.9% is not reasonably certain. Most sciences, and particularly the messy biological and social sciences, are a bit more forgiving of error, and a standard of 95% is usual. So a 95% CI means that if we took 100 similar random samples and built a confidence interval from each, about 95 of them would include the true population value.
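That “95 out of 100 samples” interpretation is easy to check with a simulation. Here’s a rough sketch (again my own illustration, using the standard normal-approximation interval for a proportion, which isn’t necessarily what any particular pollster uses):

```python
# Simulate many polls of 100 voters, build a 95% CI from each one, and count
# how often the interval actually contains the true population value.
import math
import random

random.seed(2)
true_p = 0.46      # true Romney share in the population
n = 100            # respondents per poll
z = 1.96           # z-value for a 95% confidence level
num_polls = 1000

covered = 0
for _ in range(num_polls):
    hits = sum(random.random() < true_p for _ in range(n))
    p_hat = hits / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of a proportion
    if p_hat - z * se <= true_p <= p_hat + z * se:
        covered += 1

print(f"{covered} of {num_polls} intervals covered the true value")
# You should get something in the neighborhood of 950, i.e. about 95%.
```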

In scientific papers, it’s common to report a CI. In the example above, our 95% CI for Romney’s support is approximately 0.37–0.57. Since there are no “undecided” or third-party candidates in our example, any gain for Romney is a loss for Obama and vice versa, so the CI for Obama’s support is approximately 0.43–0.63. In both cases, the CI is 0.20, or 20 percentage points, wide. (If we increased our sample size it would be much narrower.)
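Here’s where those numbers come from, using the usual textbook normal-approximation formula for a proportion (a sketch of the standard calculation; real pollsters may use fancier methods):

```python
# 95% CI for Romney's support from our hypothetical sample: 47 of 100 respondents.
import math

p_hat = 0.47   # Romney's share in the sample
n = 100        # sample size
z = 1.96       # z-value for a 95% confidence level

se = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error, about 0.05
lower, upper = p_hat - z * se, p_hat + z * se

print(f"Romney: {lower:.2f} to {upper:.2f}")          # roughly 0.37 to 0.57
print(f"Obama:  {1 - upper:.2f} to {1 - lower:.2f}")  # roughly 0.43 to 0.63
```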

Reporting results this way can be clumsy, so pollsters instead report the results as a mean ± a margin of error. The margin of error is simply one half the width of the confidence interval, so in our example, pollsters would report that Obama leads Romney 53% to 47%, with a margin of error of ±10%.
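In code, the conversion is one line (continuing the numbers from the example above):

```python
# The margin of error is just half the width of the 95% confidence interval.
lower, upper = 0.37, 0.57          # Romney's CI from the example above
margin_of_error = (upper - lower) / 2
print(f"Margin of error: ±{margin_of_error:.0%}")   # ±10%
```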

And yes, for those of you who know how to calculate a confidence interval or standard error, it is that simple. It’s not some new summary statistic. It’s just a new name.

Coming up: some caveats you need to know about interpreting margins of error in polls.

*Sometime soon I’ll get a post up about these, since they’re important topics.
