Friday, April 27, 2007

Sample Sizes

The best way to figure this one out is to think about it backwards. Let's say you picked a specific number of people in the United States at random. What then is the chance that the people you picked do not accurately represent the U.S. population as a whole? For example, what is the chance that the percentage of those people you picked who said their favorite color was blue does not match the percentage of people in the entire U.S. who like blue best?

(Of course, our little mental exercise here assumes you didn't do anything sneaky like phrase your question in a way to make people more or less likely to pick blue as their favorite color. Like, say, telling people "You know, the color blue has been linked to cancer. Now that I've told you that, what is your favorite color?" That's called a leading question, and it's a big no-no in surveying.)

Common sense will tell you (if you listen...) that the chance that your sample is off the mark will decrease as you add more people to your sample. In other words, the more people you ask, the more likely you are to get a representative sample. This is easy so far, right?
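If you'd like to see that common sense in action, here's a quick simulation. It's a hypothetical setup, not real survey data: it assumes 40 percent of the population truly likes blue best (a number I made up), draws lots of random samples of each size, and counts how often a sample misses that true share by more than 5 percentage points. The function name `chance_sample_is_off` and all the parameter values are just for illustration.

```python
import random

# Hypothetical "true" share of people whose favorite color is blue.
# Made up for illustration; this is not real survey data.
TRUE_SHARE_BLUE = 0.40

def chance_sample_is_off(sample_size, tolerance=0.05, trials=500, seed=1):
    """Estimate how often a random sample misses the true share by more than `tolerance`."""
    rng = random.Random(seed)  # fixed seed so the results are repeatable
    misses = 0
    for _ in range(trials):
        # Each person in the sample independently picks blue with probability TRUE_SHARE_BLUE.
        blue = sum(1 for _ in range(sample_size) if rng.random() < TRUE_SHARE_BLUE)
        if abs(blue / sample_size - TRUE_SHARE_BLUE) > tolerance:
            misses += 1
    return misses / trials

for n in (25, 100, 400, 1600):
    print(n, chance_sample_is_off(n))
```

Under these assumptions the miss rate should drop sharply as the sample grows, from roughly half the time with 25 people to almost never with 1,600.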

Okay, enough with the common sense. It's time for some math. (insert smirk here) The formula that describes the relationship I just mentioned is basically this:

The margin of error in a sample = 1 divided by the square root of the number of people in the sample

How did someone come up with that formula, you ask? Like most formulas in statistics, this one can trace its roots back to pathetic gamblers who were so desperate to hit the jackpot that they'd even stoop to mathematics for an "edge." If you really want to know the gory details, the formula comes from the standard deviation of the percentages you'd get if you drew a whole bunch of samples of the same size. A margin of error at the usual 95 percent confidence level is about two of those standard deviations, and in the worst case (a 50/50 split) that works out to roughly 1 divided by the square root of the sample size.

Which is mathematical jargon for..."Trust me. It works, okay?"
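For anyone who'd rather not just trust me, here's a rough sketch of the math, using the standard textbook reasoning for a simple random sample. The symbols are my own shorthand, not part of the formula above: p is the true share of people who give an answer (say, "blue"), n is the number of people in the sample, and 1.96 is the multiplier for the 95 percent confidence level most polls report. The product p(1 - p) is never bigger than 0.25, its value at a 50/50 split:

```latex
\text{MOE} = 1.96\sqrt{\frac{p(1-p)}{n}}
\;\le\; 1.96\sqrt{\frac{0.25}{n}}
\;=\; \frac{0.98}{\sqrt{n}}
\;\approx\; \frac{1}{\sqrt{n}}
```

So "1 divided by the square root of the sample size" is the worst case at 95 percent confidence, rounded up just a hair.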

So a sample of 1,600 people gives you a margin of error of 2.5 percent (the square root of 1,600 is 40, and 1 divided by 40 is 0.025), which is pretty darn good for a poll. (See Margin of Error for more details on that term, and on polls in general.) Now, remember that the size of the entire population doesn't matter here, as long as the population is much bigger than the sample. You could have a nation of 250,000 people or 250 million and that won't affect how big your sample needs to be to come within your desired margin of error. The Math Gods just don't care.
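Here's a small sketch of that arithmetic. The second function adds the "finite population correction," a standard textbook adjustment that isn't part of the rule of thumb above; I've included it only to show how little the population size changes things. Both function names are made up for this example.

```python
import math

def margin_of_error(sample_size):
    """Rule of thumb: 1 divided by the square root of the sample size."""
    return 1 / math.sqrt(sample_size)

def margin_with_population(sample_size, population):
    """Same rule, scaled by the finite population correction sqrt((N - n) / (N - 1))."""
    correction = math.sqrt((population - sample_size) / (population - 1))
    return margin_of_error(sample_size) * correction

print(margin_of_error(1600))                      # 0.025, i.e. 2.5 percent
print(margin_with_population(1600, 250_000))      # about 0.0249
print(margin_with_population(1600, 250_000_000))  # about 0.0250
```

The correction only starts to bite when the sample is a sizable chunk of the whole population, which almost never happens with national polls.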

Of course, sometimes you'll see polls with anywhere from 600 to 1,800 people, all promising the same margin of error. That's because pollsters often want to break down their results by the gender, age, race or income of the people in the sample. To do that, the pollster needs enough women, for example, in the overall sample to ensure a reasonable margin of error among just the women. The same goes for young adults, retirees, rich people, poor people, etc. So while 400 people is the minimum for a five percent margin of error overall (1 divided by the square root of 400 is 0.05), a poll that wants a five percent margin of error within many different subgroups will need many more than 400 people in the overall sample.
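To see how quickly subgroups drive up the total, you can run the rule of thumb backwards. This is a minimal sketch that assumes each subgroup shows up in the sample in proportion to its share; the function names and the example shares (half, a quarter) are hypothetical.

```python
import math

def required_sample_size(margin_pct):
    """Invert the rule of thumb: n = (100 / margin in percent)^2, rounded up to a whole person."""
    return math.ceil((100 / margin_pct) ** 2)

def overall_sample_for_subgroup(margin_pct, subgroup_share):
    """Overall sample needed so a subgroup making up `subgroup_share` of it hits the margin."""
    return math.ceil(required_sample_size(margin_pct) / subgroup_share)

print(required_sample_size(5))               # 400
print(overall_sample_for_subgroup(5, 0.5))   # 800, for a group that is about half the sample
print(overall_sample_for_subgroup(5, 0.25))  # 1600, for a group that is a quarter of it
```

The smaller the group you want to say something about, the bigger the whole poll has to be.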
