polling-1000

The news always seems full of surveys and polls about this or that, trying to predict trends or outcomes and explain society. Nowhere is polling more prevalent than in the political arena. One popular place to go for polling data is Rasmussen Reports, which says of itself, “If it’s in the News, it’s in our polls.” They do many surveys too, but a survey is just a poll of another kind. Here are some seasonal examples I looked up on their website (as of 11/27/2015):

1.) Nearly 3-out-of-4 American Adults (72%) think stores start the Christmas season too early.
2.) 43% of American Adults say they have started their gift shopping. 54% have not.

We are told that for these polls 1,000 American adults were surveyed and that “The margin of sampling error is +/- 3 percentage points with a 95% level of confidence.” Hmmm, what does that mean? To understand this, some definitions are in order, specifically ‘margin of error’ and ‘level of confidence’.

Margin of Error (MoE) – Measure of the accuracy of the results, which indicates the largest likely difference between an estimate of something and its true value.
Level of Confidence (LoC) – Measure of the reliability of a result, which tells how confident we are that the true value falls within the margin of error.

Polls and surveys work by asking a random sample of the total population a series of questions. Obviously they cannot ask the total population (perhaps hundreds of millions), so they sample in a random way (it’s cheaper and quicker) and use that data to state something relevant. The numbers themselves can be thrown around, but how accurate are they? That’s where MoE and LoC come into play. It’s important to remember that the MoE and LoC depend on the sample size, not the total population size, as long as the total population is large. For a 95% LoC, the MoE turns out to be at most 0.98/√n, where n is the sample size (here n = 1000). The 0.98 is 1.96 × 0.5: 1.96 standard deviations cover 95% of a normal distribution, and 0.5 is the largest possible standard deviation of a single yes/no answer, which occurs when opinion is split 50/50. Do the math and it is 0.98/√1000 ≈ 0.031 (or about +/- 3 percentage points). In simple terms, this means that the survey/poll is 95% confident that the difference between the sample’s answer and the total population’s answer is no more than 3 percentage points. Said another way, if you keep polling in the same way, then 95% of the time the answer you get will be within 3 percentage points of the correct answer.

The mathematics reveals that (contrary to popular belief) the relative sample size matters less than the absolute sample size. That is, the results are essentially independent of the total population size, as long as the population is much bigger than the sample, and it is just the sample size itself that matters. How is it possible that a sample size as small as 1000 out of a total population in the millions or hundreds of millions has an MoE as small as +/- 3%? Welcome to the nature of the so-called ‘Bell Curve’. It’s also called the ‘normal distribution’ and is a tool statisticians use to tell how far the sample is likely to be off from the overall population, that is, how big an MoE there is likely to be in a survey/poll.
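The "keep polling in the same way" claim can be checked with a small simulation. This is only a sketch under stated assumptions: the function names, the 50/50 true split, the 2,000 repeated polls, and the fixed random seed are all my own choices for illustration, and each respondent is drawn independently (which is what a very large population looks like).

```python
import math
import random

def worst_case_moe(n, z=1.96):
    """95% margin of error for a yes/no question, worst case 50/50 split: z * 0.5 / sqrt(n)."""
    return z * 0.5 / math.sqrt(n)

def coverage(n=1000, true_share=0.5, polls=2000, seed=1):
    """Fraction of simulated polls whose result lands within the MoE of the true share.

    true_share, polls and seed are illustrative assumptions, not values from any real poll.
    """
    rng = random.Random(seed)
    moe = worst_case_moe(n)
    hits = 0
    for _ in range(polls):
        # Ask n independent respondents; each says "yes" with probability true_share.
        yes = sum(1 for _ in range(n) if rng.random() < true_share)
        if abs(yes / n - true_share) <= moe:
            hits += 1
    return hits / polls

print(round(worst_case_moe(1000), 3))  # 0.031
print(coverage())                      # should come out close to 0.95
```

Notice that the total population size never appears anywhere in the code. Only the sample size n does.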

[Figure: bell curve and margin of error]

Under the most ideal conditions, the above is generally true. A more cautious approach does not assume the bell curve at all and only guarantees that LoC ≥ 1 – 1/(4n·MoE²), a bound that comes from Chebyshev’s inequality. Setting LoC = 95% and n = 1000 and solving for the MoE gives MoE = √[1/(4 × 1000 × 0.05)] ≈ 0.07 (or 7%). This turns out to be a more realistic number for mathematical reasons relating to the sampling itself and randomness (see Small samples, and the margin of error). Further, even this scenario is somewhat idealized, and questions can come up about the nature of the population sampled, questions refused, respondents who are undecided, whether the questions were understood and answered truthfully, and other intangibles which can play a role. Surveys and polls can be wildly off depending on the nature of the questions and how they are answered or not answered. Treat them all with skepticism, but bear in mind they CAN be accurate even with a sample size as small as 1000. This seems to be the magic number (n=1000) most survey/poll people use to get the 95% LoC with 3-7% MoE, and usually they quote the ideal case of 3% MoE.
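To see where the 7% comes from, the bound can be solved for the MoE and compared with the ideal bell-curve figure. This is a minimal sketch; the function names are my own, and the 0.98 coefficient is the 95% value used earlier.

```python
import math

def chebyshev_moe(n, loc):
    """MoE from the bound LoC >= 1 - 1/(4 * n * MoE^2), solved for MoE."""
    return math.sqrt(1.0 / (4.0 * n * (1.0 - loc)))

def ideal_moe(n, coeff=0.98):
    """Ideal bell-curve MoE at 95% LoC: 0.98 / sqrt(n)."""
    return coeff / math.sqrt(n)

print(round(chebyshev_moe(1000, 0.95), 3))  # 0.071
print(round(ideal_moe(1000), 3))            # 0.031
```

Both numbers shrink like 1/√n, so quadrupling the sample size halves either one.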

The truth of political polling is that if 3% MoE is acceptable 95% of the time, then that is what they go with. People who poll and survey seem to have settled on this and the sample size is usually 1000 people. It sounds unbelievable, but it’s true from a mathematical perspective. In all human endeavors there are always intangibles to be considered (some of which I’ve mentioned) and these can make surveys/polls quite unreliable. In addition, they can become irrelevant soon after they are taken when events or circumstances change. My best advice is to treat them as you might the daily Horoscope, realizing they encompass a multitude of possibilities, but the reality is in the outcome itself. The mathematics does not lie and can be a predictor of trends and outcomes, even with a small sample. The greatest variable is not the behavior of human beings, which can reasonably be predicted under certain conditions, but the human beings themselves, who are both the predictor and predicted simultaneously. We tend to change with the wind. I think of it as weather, which changes from day to day, week to week, month to month, while climate is the long term average of weather, which can be predicted. Polls/surveys are like the weather and change daily, weekly, monthly, but over the long term they can perhaps be averaged to predict human behavior. This is somewhat the basis of Isaac Asimov’s Foundation Series, where the science of psychohistory can predict the track of humanity into the far future, but the random element always plays a role, which can throw predictions off.

[Figure: Foundation Trilogy]

Remember always, mathematics doesn’t lie, but people do, though not always intentionally. We live in a very partisan and biased culture where so-called ‘news’ media conduct their own polls and present the results without even understanding the mathematics of what they mean. These media personalities of today are mostly sensationalist and/or just want to promote their conservative and/or liberal cause, whatever those labels mean anymore. I still remember the words of Dr. Fitz, as we called him, my Advanced Civics teacher in high school back in the late 1970s, who told us to read, listen and watch, then read between the lines. That advice has stuck with me my whole life and never has it been a more valuable lesson than in our culture today.

Note: In general, for Margin of Error (MoE) at various Levels of Confidence (LoC), use these formulas, where n=sample size:

MoE at 99% LoC ~ 1.29/√n
MoE at 95% LoC ~ 0.98/√n
MoE at 90% LoC ~ 0.82/√n
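These coefficients are not arbitrary. Each one is half of the bell curve’s two-sided z-value for that LoC, because 0.5 is the worst-case standard deviation of a single yes/no answer. A short sketch using Python’s standard library reproduces them (the function name is my own):

```python
from statistics import NormalDist

def moe_coefficient(loc):
    """Coefficient c in MoE ~ c / sqrt(n) for a given level of confidence."""
    # Two-sided z-value: leave (1 - loc) / 2 of the bell curve in each tail.
    z = NormalDist().inv_cdf((1 + loc) / 2)
    return 0.5 * z  # 0.5 is the worst-case standard deviation (50/50 split)

for loc in (0.99, 0.95, 0.90):
    print(f"{loc:.0%}: {moe_coefficient(loc):.2f}/sqrt(n)")
# 99%: 1.29/sqrt(n)
# 95%: 0.98/sqrt(n)
# 90%: 0.82/sqrt(n)
```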

If the sample fraction is > 5% of the total population, then also multiply the results by the factor √[(N – n)/(N – 1)], where n = sample size, N = total population size. This is the ‘finite population correction’. Usually N >> n, so this correction is negligible.
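A quick sketch shows how little the correction matters for a national poll and how much more it matters for a small population. The population sizes here (10,000 and 250 million) are hypothetical values chosen only for illustration, and the function names are my own.

```python
import math

def finite_population_correction(N, n):
    """Factor sqrt((N - n) / (N - 1)) for a sample of n out of a population of N."""
    return math.sqrt((N - n) / (N - 1))

def corrected_moe(N, n, coeff=0.98):
    """MoE at 95% LoC (coeff = 0.98) with the finite population correction applied."""
    return coeff / math.sqrt(n) * finite_population_correction(N, n)

# Hypothetical small town: the sample is 10% of the population, so the correction counts.
print(round(finite_population_correction(10_000, 1000), 4))  # 0.9487

# Hypothetical national population: the factor is almost exactly 1.
print(finite_population_correction(250_000_000, 1000))
print(round(corrected_moe(250_000_000, 1000), 3))  # 0.031
```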

There are also Margin of Error calculators you can use, such as:

http://www.americanresearchgroup.com/moe.html

Statistics and mathematics aside, it’s really the quality of the questions, and how they are asked and answered, that perhaps matters more. That is, how sound was the methodology of a survey or poll, and was there any ‘built-in’ (intentional or unintentional) bias? Statistics alone cannot answer that, as it’s a more subjective question. Non-sampling errors can always creep in, even in the best designed survey/poll. These include a lack of true randomness in the sample, poorly designed questions, poor interviewers, and a host of other factors. These non-sampling errors can, in fact, often exceed the sampling errors themselves. It’s always best to treat surveys/polls with some skepticism, and the statistics behind them are only one indicator of their reliability.