The news always seems full of surveys/polls about this or that, trying to predict trends or outcomes and explain society. Nowhere is polling more prevalent than in the political arena. One popular place to go for polling data is Rasmussen Reports, which says of itself, “If it’s in the News, it’s in our polls.” They run many surveys as well, which are really just polls of another kind. Here are some seasonal examples I looked up on their website (as of 11/27/2015):

1.) Nearly 3-out-of-4 American Adults (72%) think stores start the Christmas season too early.
2.) 43% of American Adults say they have started their gift shopping. 54% have not.

For these polls, Rasmussen states that 1,000 American adults were surveyed and that “The margin of sampling error is +/- 3 percentage points with a 95% level of confidence.” Hmmm, what does that mean? To understand this, some definitions are in order, specifically ‘margin of error’ and ‘level of confidence’.

Margin of Error (MoE) – Measure of the accuracy of the results, which indicates how far an estimate is likely to be from the true value.
Level of Confidence (LoC) – Measure of the reliability of a result, which tells how confident we are in the margin of error.

Polls and surveys work by asking a random sample of the total population a series of questions. Obviously they cannot ask the total population (perhaps hundreds of millions of people), so they sample randomly (it’s cheaper and quicker) and use that data to say something relevant. The numbers themselves can be thrown around, but how accurate are they? That’s where MoE and LoC come into play. It’s important to remember that, for a given LoC, the MoE depends on the sample size, not the total population size, as long as the total population is large. For a 95% LoC, the MoE works out to 0.98/√n, where n is the sample size (here n = 1000). Do the math and it is 0.98/√1000 ≈ 0.031, or about +/- 3 percentage points. In simple terms, if you kept polling the same way, then about 95% of the time the result you get would be within 3 points of the true value for the whole population.

The mathematics reveals that (contrary to popular belief) the absolute sample size matters far more than the relative sample size. That is, as long as the total population is much larger than the sample, the MoE is essentially independent of how big that population is; only the size of the sample itself matters. How is it possible that a sample as small as 1000 out of a total population in the millions or hundreds of millions has an MoE as small as +/- 3%? Welcome to the nature of the so-called ‘Bell Curve’. Also called the ‘normal distribution’, it is a tool statisticians use to tell how far the sample is likely to be off from the overall population, that is, how big an MoE a survey/poll is likely to have.
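To see the “population size doesn’t matter” claim in action, here is a small Python simulation. It is a sketch under idealized assumptions of my own: a perfectly random sample, truthful answers, and a true 50/50 split (the worst case behind the 0.98/√n formula, since 0.98 is 1.96 × 0.5). The function names `moe_95` and `coverage` are hypothetical, chosen just for illustration.

```python
import math
import random


def moe_95(n: int) -> float:
    """Worst-case 95% margin of error for a proportion: 1.96 * 0.5 / sqrt(n) = 0.98 / sqrt(n)."""
    return 0.98 / math.sqrt(n)


def coverage(population_size: int, true_share: float, n: int,
             trials: int, rng: random.Random) -> float:
    """Fraction of simulated polls whose estimate lands within moe_95(n) of the truth.

    People 0 .. yes_count-1 answer "yes"; everyone else answers "no".
    Each poll draws n distinct people at random (sampling without replacement).
    """
    yes_count = round(true_share * population_size)
    truth = yes_count / population_size
    margin = moe_95(n)
    hits = 0
    for _ in range(trials):
        # Sampling from a range object avoids building a huge list in memory.
        sample = rng.sample(range(population_size), n)
        estimate = sum(1 for person in sample if person < yes_count) / n
        if abs(estimate - truth) <= margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(2015)  # fixed seed so the run is repeatable
    print(f"MoE for n=1000: +/- {moe_95(1000):.3f}")
    for size in (100_000, 10_000_000, 300_000_000):
        rate = coverage(size, true_share=0.5, n=1000, trials=1000, rng=rng)
        print(f"population {size:>11,}: {rate:.1%} of polls within the MoE")
```

Each population size, from 100 thousand to 300 million, should report roughly 95% of polls landing within about +/- 3.1 points, even though a sample of 1000 is a vanishingly small fraction of the larger populations.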


Under the most ideal conditions, the above is generally true. A more conservative approach, which makes no assumption about the shape of the sampling distribution, uses the bound LoC ≥ 1 - 1/(4n*MoE^2), which follows from Chebyshev’s inequality. Requiring an LoC of at least 95% with n = 1000 gives MoE ≥ √(5/n) ≈ 0.07 (or 7%) (see Small samples, and the margin of error). Further, even this is somewhat idealized, and questions can arise about who was sampled, which questions were refused, how many respondents were undecided, whether the questions were understood and answered truthfully, and other intangibles that can play a role. Surveys and polls can be widely off depending on the nature of the questions and how they are answered or not answered. Treat them all with skepticism, but bear in mind they CAN be accurate even with a sample size as small as 1000. This seems to be the magic number (n = 1000) most survey/poll people use to get a 95% LoC with a 3-7% MoE, usually quoting the ideal case of 3%.
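The roughly 7% figure can be checked numerically. The bound LoC ≥ 1 - 1/(4n*MoE^2) is what Chebyshev’s inequality gives for a sample proportion, because the variance p(1 - p)/n can never exceed 1/(4n). The sketch below solves that bound for MoE and compares it with the bell-curve figure; the function names are hypothetical.

```python
import math


def chebyshev_moe(n: int, loc: float) -> float:
    """Conservative margin of error from Chebyshev's inequality.

    For a sample proportion, Var = p(1 - p)/n <= 1/(4n), so
    P(|estimate - truth| < MoE) >= 1 - 1/(4 n MoE^2).
    Setting that lower bound equal to loc and solving for MoE gives
    MoE = sqrt(1 / (4 n (1 - loc))).
    """
    return math.sqrt(1.0 / (4 * n * (1.0 - loc)))


def normal_moe(n: int, z: float = 1.96) -> float:
    """Bell-curve (normal approximation) worst case: z * 0.5 / sqrt(n)."""
    return z * 0.5 / math.sqrt(n)


if __name__ == "__main__":
    n = 1000
    print(f"normal approximation: +/- {normal_moe(n):.3f}")           # about 0.031
    print(f"Chebyshev bound:      +/- {chebyshev_moe(n, 0.95):.3f}")  # about 0.071
```

The Chebyshev figure is larger because it has to hold for any distribution, while the 3% figure leans on the bell-curve approximation.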

The truth of political polling is that if a 3% MoE at a 95% LoC is acceptable, then that is what pollsters go with. People who poll and survey seem to have settled on this, and the sample size is usually around 1000 people. It sounds unbelievable, but it’s true from a mathematical perspective. In all human endeavors there are intangibles to consider (some of which I’ve mentioned), and these can make surveys/polls quite unreliable. In addition, they can become irrelevant soon after they are taken when events or circumstances change. My best advice is to treat them as you might the daily horoscope, realizing they encompass a multitude of possibilities, but the reality is in the outcome itself. The mathematics does not lie and can be a predictor of trends and outcomes, even with a small sample. The greatest variable is not the behavior of human beings, which can reasonably be predicted under certain conditions, but the human beings themselves, who are simultaneously the predictors and the predicted. We tend to change with the wind. I think of it as weather, which changes from day to day, week to week and month to month, while climate is the long-term average of weather, which can be predicted. Polls/surveys are like the weather, changing daily, weekly and monthly, but over the long term they can perhaps be averaged to predict human behavior. This is somewhat the basis of Isaac Asimov’s Foundation series, in which the science of psychohistory can predict the course of humanity into the far future, but the random element always plays a role and can throw predictions off.


Remember always, mathematics doesn’t lie, but people do, though not always intentionally. We live in a very partisan and biased culture where so-called ‘news’ media conduct their own polls and present the results without even understanding what the mathematics means. These media personalities of today are mostly sensationalist and/or just want to promote their conservative and/or liberal cause, whatever those labels mean anymore. I still remember the words of Dr. Fitz, as we called him, my Advanced Civics teacher in high school back in the late 1970s, who told us to read, listen and watch, then read between the lines. That advice has stuck with me my whole life, and it has never been more valuable than in our culture today.

Note: In general, for Margin of Error (MoE) at various Levels of Confidence (LoC), use these formulas, where n = sample size. They assume the worst case of a 50/50 split in responses, which gives the largest MoE:

MoE at 99% LoC ~ 1.29/√n
MoE at 95% LoC ~ 0.98/√n
MoE at 90% LoC ~ 0.82/√n

If the sample is more than 5% of the total population, also multiply the results by the factor √[(N – n)/(N – 1)], where n = sample size and N = total population size. This is the ‘finite population correction’. Usually N >> n, so this correction is negligible.
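The formulas above can be folded into one small Python function. This is a sketch, not a standard tool: the name `margin_of_error` is hypothetical, and it derives the critical value z from Python’s standard-library `statistics.NormalDist` rather than hard-coding the 0.82/0.98/1.29 constants. It applies the finite population correction using the same 5% rule of thumb.

```python
import math
from statistics import NormalDist
from typing import Optional


def margin_of_error(n: int, loc: float = 0.95, population: Optional[int] = None) -> float:
    """Worst-case (50/50 split) margin of error for a sampled proportion.

    z is the two-sided critical value of the standard normal distribution,
    about 1.645, 1.960 and 2.576 for 90%, 95% and 99% confidence, so
    z * 0.5 reproduces the 0.82, 0.98 and 1.29 constants in the note above.
    """
    z = NormalDist().inv_cdf((1 + loc) / 2)
    moe = z * 0.5 / math.sqrt(n)
    # Apply the finite population correction only when the sample is
    # more than 5% of the population (the rule of thumb in the note).
    if population is not None and n > 0.05 * population:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


if __name__ == "__main__":
    for loc in (0.90, 0.95, 0.99):
        print(f"{loc:.0%} LoC, n=1000: +/- {margin_of_error(1000, loc):.3f}")
    # A poll of 1000 people in a town of 10,000 samples 10%, so the correction applies.
    print(f"95% LoC, n=1000, N=10,000: +/- {margin_of_error(1000, 0.95, population=10_000):.3f}")
```

For a national poll the `population` argument can simply be left out, since the correction factor is then essentially 1.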

There are also Margin of Error calculators you can use, such as:


Statistics and mathematics aside, it’s perhaps the quality of the questions, and how they are asked and answered, that matters most. That is, how sound was the methodology of a survey or poll, and was there any ‘built-in’ (intentional or unintentional) bias? Statistics alone cannot answer that, as it’s a more subjective question. Non-sampling errors can always creep in, even in the best-designed survey/poll. These include failures of true randomness, poorly designed questions, poor interviewers and a host of other factors. These non-sampling errors can, in fact, often exceed the sampling errors themselves. It’s always best to treat surveys/polls with some skepticism, and to remember that the statistics behind them are not, on their own, a guarantee of reliability.



This may be a touchy issue, but I thought I would weigh in on the nearly ubiquitous news regarding Angelina Jolie and her prophylactic mastectomy. She wrote an article in the NY Times about it entitled My Medical Choice. It’s a personal choice, and Jamie Lee Curtis seemed to praise her brave steps and quiet dignity in a Huffington Post article entitled Freedom of Choice, Freedom of Privacy. I can respect that: Angelina’s choice, and the opinions on her bravery and dignity. I do, however, worry about what kind of message this sends. Yes, it is good to have a choice for health and longevity, but it’s not just a matter of statistics (though I will touch on the statistics here too). Being ‘at risk’ is not a disease. Even genetic or hereditary indicators do not account for exceptional cases where a gene is present but causes no disease. The science on this has emerged only in the last decade or two, and I worry that decisions are being made based on incomplete science and misinterpreted statistics. There are social psychology issues involved in such so-called risk reduction surgery too. My big concern with a high-profile story like this is that it starts a wave of people acting without fully thinking it through, just following a celebrity, who is really just a person like me, you or anybody, making personal decisions based on her own perspective and private reasons. That’s something to think about.


I’d like to discuss statistics now. Do you know the difference between a single-event probability and a conditional probability? Is there a difference between the chance of something happening and the frequency with which that same thing occurs? If you don’t know the answers to these questions, you are not alone. The medical community uses statistics to inform patients, but your doctor probably does not really understand the statistics. He or she is a physician, not a mathematician, right? I’ll trust my doctor any day to prescribe an antibiotic for me, but to compute my odds of survival given a serious disease? No way! Doctors get these statistics from consensus in the literature. I will take the doctor’s numbers and then go investigate them. Let’s take the case of Angelina Jolie. In her article she says (because she tested positive for a BRCA1 gene mutation):

“My doctors estimated that I had an 87 percent risk of breast cancer and a 50 percent risk of ovarian cancer, although the risk is different in the case of each woman. Only a fraction of breast cancers result from an inherited gene mutation. Those with a defect in BRCA1 have a 65 percent risk of getting it, on average. Once I knew that this was my reality, I decided to be proactive and to minimize the risk as much I could. I made a decision to have a preventative double mastectomy. I started with the breasts, as my risk of breast cancer is higher than my risk of ovarian cancer, and the surgery is more complex.”

She says later in the article:

“I wanted to write this to tell other women that the decision to have a mastectomy was not easy. But it is one I am very happy that I made. My chances of developing breast cancer have dropped from 87 percent to under 5 percent. I can tell my children that they don’t need to fear they will lose me to breast cancer.”


Was the choice Angelina Jolie made the correct one? Personal feelings aside (and that’s a self-analyzing choice), it is hard to say, and it depends on how you look at the statistics. Let’s look at absolute and relative probability, taking the case where 5 in 100 high-risk women die of breast cancer without surgery and 1 in 100 die of it after surgery. The absolute risk reduction is from 5 to 1 in 100, a reduction of 4 in 100, or 4%. On the other hand, the relative risk reduction says 4 saved out of 5, or an 80% reduction in risk. That is, the relative risk reduction is the absolute risk reduction (4/100) divided by the proportion of patients who die without treatment (5/100). Do the math: 4/5 = 0.80. Another way of expressing all this is the Number Needed to Treat (NNT), which is 1 divided by the absolute risk reduction. The number of women who must undergo prophylactic mastectomy to save one life is 25, because the surgery prevents 4 deaths in 100 (1 in 25).
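The three measures can be computed together from the two risks. This is a minimal Python sketch; the function name `risk_summary` is hypothetical, and it uses `fractions.Fraction` so that 4/100, 4/5 and 25 come out exact instead of as rounded floating-point values.

```python
from fractions import Fraction


def risk_summary(risk_untreated: Fraction, risk_treated: Fraction) -> dict:
    """Absolute risk reduction, relative risk reduction and number needed to treat.

    The NNT is undefined (division by zero) if treatment makes no difference.
    """
    arr = risk_untreated - risk_treated   # absolute risk reduction
    rrr = arr / risk_untreated            # relative risk reduction
    nnt = 1 / arr                         # people treated per person helped
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


if __name__ == "__main__":
    # Figures from the paragraph above: 5 in 100 deaths without surgery, 1 in 100 with it.
    summary = risk_summary(Fraction(5, 100), Fraction(1, 100))
    print(f"ARR = {float(summary['ARR']):.0%}")   # 4%
    print(f"RRR = {float(summary['RRR']):.0%}")   # 80%
    print(f"NNT = {summary['NNT']}")              # 25
```

The same 4-in-100 change reads as “4%” in absolute terms and “80%” in relative terms, which is exactly why the framing matters.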

What can we really say about the statistical numbers presented by Angelina Jolie? She appears to be quoting her own chance of developing breast cancer, going from 87% to under 5%: a drop of about 82 percentage points, or roughly a 94% reduction in relative terms. In terms of the absolute probability of dying of breast cancer, though, the reduction in the example above is still only 4-5% at best. The Number Needed to Treat is important because it expresses the result in terms of people: 1 life saved for every 25 women treated. What does that mean? It means that for every 25 women who have the surgery, one woman’s life is saved, but the other 24 get no survival benefit from the mastectomy. Most high-risk women don’t die of breast cancer even if they keep their breasts, and few die of breast cancer after having their breasts removed either. I have used the high-risk category that Angelina announced as a point for discussion, not judgement. I’d like to take the opportunity to extend good wishes to Angelina, Brad, their children and extended family during this time. Take care of each other!
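Her own figures can be read two ways, and a few lines of Python make the difference concrete. This is a sketch under one stated assumption: “under 5 percent” is treated as exactly 5%, which slightly understates the benefit. The 82 is a drop in percentage points (absolute), while the same change measured against the 87% starting risk is about 94% (relative).

```python
from fractions import Fraction

# Jolie's stated lifetime risk of developing breast cancer, before and after surgery.
# Assumption: "under 5 percent" is treated as exactly 5% here.
before = Fraction(87, 100)
after = Fraction(5, 100)

drop_in_points = (before - after) * 100      # absolute change, in percentage points
relative_drop = (before - after) / before    # change as a share of the starting risk

print(f"absolute drop: {float(drop_in_points):.0f} percentage points")   # 82
print(f"relative drop: {float(relative_drop):.0%}")                      # 94%
```

Note that both numbers describe the chance of developing breast cancer, not of dying from it, which is a different outcome from the 5-in-100 versus 1-in-100 mortality example.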


My personal advice: make YOUR own choice, and know the numbers so that it is an informed one on risk reduction. Don’t just know the numbers, but know what they mean too! Remember there is absolute risk reduction, relative risk reduction and the number needed to treat. There are also single-event and conditional probabilities. It’s a headful to be sure, but not such an egghead thing when your life is on the line and body parts are involved. This is kind of a serious post from me, with a message. It’s not personal; it’s just something I wanted to say in a logical way, but I can’t help thinking (after writing this) that it has affected me in an emotional way too.