There are a few caveats you need to know about margins of error. A lot of people will say (and I have in the past been guilty of this lapse) that a confidence interval or margin of error represents the range in which there is a 95% chance that the population mean falls. This is not true. For example, let’s imagine that we’ve taken seven polls in a week, and we don’t expect that anything has significantly changed between them (Romney has not shot a dog; Obama has not worn a turban), and we get these results:

Romney   Obama   Margin of Error
  45%     55%        ±2%
  48%     52%        ±3%
  50%     50%        ±2%
  48%     52%        ±4%
  46%     54%        ±2%
  47%     53%        ±1%
  59%     41%        ±3%

Would we conclude, based on the last poll, that there is a 95% chance of Romney having between 56% and 62% of the vote? Of course not. We conclude instead that the last poll was one of those 5% of polls whose confidence intervals do not include the population mean.
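If you want to see what the 95% actually refers to, here's a quick simulation sketch in Python. The "true" support level of 47% and the poll size of 1,000 are made-up numbers for illustration. It draws thousands of polls from a fixed population, builds the usual 95% confidence interval around each result, and counts how many intervals actually contain the true value. Roughly 95% of them do, and roughly 5%, like our last poll above, miss entirely.

```python
import random

TRUE_SUPPORT = 0.47   # hypothetical "true" population support (an assumption)
POLL_SIZE = 1000      # respondents per poll (also an assumption)
NUM_POLLS = 5000      # number of simulated polls

def run_poll():
    """Simulate one poll: each respondent supports the candidate with probability TRUE_SUPPORT."""
    votes = sum(1 for _ in range(POLL_SIZE) if random.random() < TRUE_SUPPORT)
    return votes / POLL_SIZE

hits = 0
for _ in range(NUM_POLLS):
    p = run_poll()
    # Standard 95% margin of error for a proportion: 1.96 * sqrt(p(1-p)/n)
    moe = 1.96 * (p * (1 - p) / POLL_SIZE) ** 0.5
    if p - moe <= TRUE_SUPPORT <= p + moe:
        hits += 1

print(f"{hits / NUM_POLLS:.1%} of the intervals contained the true support")
# Typically prints something very close to 95%.
```

The 95% is a statement about the procedure (how often intervals built this way capture the truth), not about any one poll's interval.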

Another issue is that many pundits will say that, if two candidates are within the margin of error of each other, it’s a “statistical dead heat,” or, conversely, that if the difference between the two candidates is more than the margin of error there’s no question about the result. This is not necessarily wrong, in the sense that it doesn’t always lead to an incorrect conclusion, but it is wrong mathematically. If you want to know who will win, you want to know the difference between the two candidates’ support, not their individual support. In statistics, this makes a big difference. If there are only two options (“Romney” and “Obama”), the margin of error of the difference is twice the margin of error of the votes. (So in our example above, there’s a whopping ±6% margin of error on the difference.) If there are more than two options (“Romney,” “Obama,” and “undecided”), the calculation is more complex, but the result is usually still very close to twice the margin of error of the votes. This means that when a poll reports Obama at 49% and Romney at 43% with a margin of error of ±3.5%, while you can be reasonably certain that Obama has between 45.5% and 52.5% of the vote and Romney has between 39.5% and 46.5% of the vote, you can’t be reasonably certain that Obama has more votes than Romney, since the 6% difference between them is less than the roughly ±7% margin of error of the difference.
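Here's a rough sketch of the arithmetic, again in Python. The two-option case just doubles the individual margin of error; for the more-than-two-options case I've used the standard formula for the variance of a difference of two multinomial proportions. The sample size is back-solved from the ±3.5% margin under the usual worst-case p = 0.5 assumption, since the hypothetical poll doesn't tell us n.

```python
import math

# Hypothetical numbers from the example: Obama 49%, Romney 43%, MOE ±3.5%.
p_obama, p_romney, moe = 0.49, 0.43, 0.035

# Back out an implied sample size from moe = 1.96 * sqrt(0.25 / n).
# (An assumption; real polls report n directly.)
n = 0.25 * (1.96 / moe) ** 2   # roughly 784

# Two options only: the difference is 2p - 1, so its margin of error
# is simply twice the individual margin of error.
moe_diff_two_options = 2 * moe

# More than two options: the variance of (p_obama - p_romney) for
# multinomial proportions is [p1(1-p1) + p2(1-p2) + 2*p1*p2] / n.
var_diff = (p_obama * (1 - p_obama) + p_romney * (1 - p_romney)
            + 2 * p_obama * p_romney) / n
moe_diff_multinomial = 1.96 * math.sqrt(var_diff)

print(f"lead: {p_obama - p_romney:.1%}")
print(f"MOE of the difference (two options only):  ±{moe_diff_two_options:.1%}")
print(f"MOE of the difference (with undecideds):   ±{moe_diff_multinomial:.1%}")
# Both come out near ±7%, so the 6% lead falls inside the margin of error
# of the difference and the poll alone can't settle who's ahead.
```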

Finally, the margin of error doesn’t include all sources of error. It only covers sampling error, and other sources of error can seriously mess with a poll. For example, pollsters’ models try to correct for things like how likely a given demographic is to vote and the fact that some demographics are less likely to be included in polls (like people without land-line phones). If they get any of these guesses wrong, it introduces a source of error the margin of error doesn’t capture. Similarly, if people lie to the pollsters (as they frequently do on issues such as race), or the pollsters ask a leading, misleading, or poorly framed question (as they do basically all the time), the margin of error can’t describe the error that introduces.

What this all means is that any single poll is essentially useless. Given how evenly split the public is on most issues we want poll data about, the size of the margins of error, and the sources of error the margin of error doesn’t even include, the chance of any one poll giving you useful information is close to nil.

Before you lose all hope of predicting the winner of an election, however, consider this: if you pile a bunch of polls together, you get a single poll with a much larger sample size, and thus a much smaller margin of error. By taking results from multiple different pollsters, you also reduce the effect of individual polls’ and pollsters’ shortcomings. This is exactly what FiveThirtyEight does, and it’s why they’re so awesome. Their predictions have been remarkably accurate, for both federal and state elections. I can’t recommend them enough.
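To get a rough sense of how much pooling helps, here's a small sketch using the usual worst-case formula for the margin of error and some made-up poll sizes. Real aggregation is more careful than just lumping samples together (polls differ in timing, methodology, and house effects), but the square-root effect is the basic idea.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: seven polls of 1,000 respondents each, naively pooled.
single_poll = 1000
pooled = 7 * single_poll

print(f"one poll of {single_poll}:   ±{margin_of_error(single_poll):.1%}")
print(f"pooled {pooled}:           ±{margin_of_error(pooled):.1%}")
# The margin of error shrinks with the square root of the sample size,
# so pooling seven polls cuts it from about ±3.1% to about ±1.2%.
```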

Seriously, go check them out.
