Good morning, y'all. I have a special treat for all fourteen of you today. For this post, I'm turning over the keyboard to fellow Ace of Spades HQ Moron Dave in Florida wherein he'll explain what the poll numbers should be.
I'm sure that everyone sees all the poll results that show President Obama with a slight lead nationally despite all evidence that should show him trailing Mitt Romney. There's more to this than meets the eye, as is common with political matters.
Dave is an engineer and works with numbers for a living as I once did. He's gone through the figures in these polls and will show that this presidential race is far from over and Obama is not in the lead as the MFM would have you believe.
Fair Warning: this post contains maths and it's quite comprehensive in showing how the actual numbers were reached. I concur with Dave's conclusion: Mitt is actually ahead.
Got your coffee ready? Good.
Take it away, Dave...
There has been considerable discussion among us all regarding the validity of
the national polls that are being published. Obviously, when polls are being
put out that use samples more Democratic than the 2008 election turnout,
many of us become suspicious. With this latest round of Eeyorism (Erick
and Ace, I'm looking at you), I decided that I was going to take a real look at
the polls and see if I could make sense of them.
As I examined the top line numbers, I realized what has been bugging me for
months about the polls. There isn't enough noise in the samples to make them
credible. Let me give an example I am used to in the real world. If you sample
a signal at a specific frequency, you expect to see a lot of noise from signals
in other bands interfering with your sample, due to harmonics. The usual fix is
to run the sampled band through a filter first, removing the spurious signals
and cleaning up the result.
Look at the results of 7 polls currently being used in the RCP average:
Tipp - O+2
CBS/NYT - O+3
Fox - O+5
Ipsos - O+3
Dem Corps - O+5
ABC/WaPo - O+1
WSJ/NBC - O+5
Those are all in a nice tight grouping with an average of about 3.4, with not a
single result more than 2.5 away from the average. It looks like nothing is
outside of a single standard deviation. That alone is a HIGHLY unlikely event.
I'm not expert enough of a statistician to figure out the odds, but the
probability of getting a grouping this tight from truly random samples is very
low. In the signal processing world, we would look at this result and ask what
type of filter was used on the source signal, because it doesn't occur in nature.
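To put a rough number on how tight that grouping is, here is a minimal Python sketch. It computes the mean and spread of the seven margins listed above and compares the spread with what pure sampling noise would produce. Two things in it are assumptions for illustration, not figures from the polls: a sample size of 800 per poll (`ASSUMED_N`) and a race that is close to even.

```python
import math
import statistics

# Obama's lead, in points, for the seven polls listed above.
obama_leads = {
    "Tipp": 2, "CBS/NYT": 3, "Fox": 5, "Ipsos": 3,
    "Dem Corps": 5, "ABC/WaPo": 1, "WSJ/NBC": 5,
}
leads = list(obama_leads.values())

ASSUMED_N = 800  # assumption: a typical sample size, not taken from the polls

mean_lead = statistics.mean(leads)
observed_sd = statistics.pstdev(leads)  # spread actually seen across the polls

# In a near 50/50 race each candidate's share has a standard error of
# sqrt(0.25 / n); the lead is the difference of the two shares, so its
# standard error is about twice that.
expected_sd = 2 * math.sqrt(0.25 / ASSUMED_N) * 100

print(f"mean lead:   O+{mean_lead:.2f}")
print(f"observed SD: {observed_sd:.2f} points")
print(f"expected SD: {expected_sd:.2f} points (sampling noise alone)")
```

Under those assumptions, the spread across the seven polls comes out well below what random sampling alone would be expected to produce.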
So if I want to be paranoid, I can think of an alternate sampling method that
would produce these results. Suppose I start with a very large population with
an even distribution, such as 100,000 marbles which are half red and half blue,
and I decide in advance that I want my sample to come out 52% blue. What I can
do is keep picking marbles until I have selected enough to look like a random
sample (e.g. at least 800 selections), but not stop until I reach the 52%
number. I might get there at 800, or at 900, or even at 1000. But it won't take
me long to get a result that is the desired small deviation away from the
actual distribution of the entire population.
If I then report those results as an actual probabilistic outcome and not report
that I had a specific result desired, then it is not discernible from examining
the results. Hence the old saying, "lies, damned lies, and statistics".
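The marble scheme is easy to simulate. The sketch below is only an illustration of the idea, with a few assumptions of my own: each draw is treated as an independent 50/50 coin flip (a fair approximation for a 100,000-marble bag), the stopping window runs from 800 to 1000 draws, and the function names are hypothetical. It compares an honest fixed-size sample of 800 with a sample where I am allowed to stop whenever I hit 52% blue.

```python
import random

def hits_target(rng, target=0.52, min_draws=800, max_draws=1000):
    """Run one sampling session and return (hit_at_min, hit_by_max).

    hit_at_min: blue share >= target after exactly min_draws draws
    hit_by_max: blue share >= target at some stopping point in
                min_draws..max_draws (the "keep picking" scheme)
    """
    blue = 0
    for n in range(1, max_draws + 1):
        # Assumption: the bag is large enough that each draw is an
        # independent 50/50 chance of blue.
        blue += rng.random() < 0.5
        if n >= min_draws and blue / n >= target:
            return (n == min_draws, True)
    return (False, False)

def simulate(trials=1000, seed=0):
    """Return the fraction of sessions reaching the target each way."""
    rng = random.Random(seed)
    fixed = stopped = 0
    for _ in range(trials):
        at_min, by_max = hits_target(rng)
        fixed += at_min
        stopped += by_max
    return fixed / trials, stopped / trials

fixed_rate, stopped_rate = simulate()
print(f"hit 52% with a fixed sample of 800: {fixed_rate:.1%}")
print(f"hit 52% when free to stop by 1000:  {stopped_rate:.1%}")
```

The second rate can never be lower than the first, since every session that hits the target at 800 also counts under the stopping scheme. The gap between the two is the effect of choosing when to stop.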
Over the last few days there has been some discussion of a few web postings
about this oversampling, and the Rasmussen Party ID poll. If you aren't
familiar with the latter, Scott Rasmussen conducts a monthly sample of partisan
identification. It is one question, asking are you a Republican, Democrat, or
Independent? The sample size is 15,000, which in polling terms is an enormous
sample. Most credible polls need only about 800 respondents to get under a 4%
margin of error.
The August result for this poll is as follows:
This R+4 result is dramatically different from the party ID mix used by all
of the polls above. These polls are using samples that range from D+4
(ABC/WaPo) to D+13 (CBS/NYT). The basis for such dramatic oversampling of
Democrats is never given, but the Rasmussen poll (and its inherent accuracy
given sample size and simplicity) shows these samples to be inappropriate.
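The point about sample size can be made concrete with the textbook margin-of-error formula. This is a sketch under the usual simplifying assumptions (a simple random sample, the worst case of a 50/50 split, and a 95% confidence level), none of which come from Rasmussen's published methodology.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, as a fraction, for a simple random sample.

    n: sample size
    p: assumed true proportion (0.5 is the worst case)
    z: normal quantile for the confidence level (1.96 for 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (800, 15_000):
    print(f"n = {n:>6,}: +/- {margin_of_error(n) * 100:.2f} points")
```

With these assumptions, 800 respondents give a margin of roughly 3.5 points, while 15,000 respondents bring it under 1 point.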
Again, being paranoid, it looks a lot like sampling to produce an intended
result, rather than justifying the sampling due to demographic expectations.
The problems with these polls are then further compounded by the RCP average.
This is commonly reported by the media, even on Fox. However, the value of an
average is questionable when the demographic samples are different. A more
useful average would come from first putting every poll on a single demographic
baseline, then averaging them.
I decided to see if it was possible to reweight these polls to do exactly that,
using the Rasmussen party ID as a baseline.
1) I'm starting with the Rasmussen party ID results. Since August showed a
dramatic rise in Republican identification, I am normalizing the results and
using the average in this poll over the last three months. The partisan ID mix
that I use for normalizing all of the media polls is D/R/I of 33.8/36.0/30.3. I
am confident that this represents a conservative view of the actual electorate,
without considering voter enthusiasm or any other demographic breakdown like
ethnicity or gender.
2) I then go into the internals of each of the polls and adjust their results to
account for partisan shift. I am not going to try to equalize based on factors
such as number of Democrats supporting Romney or Republicans supporting Obama.
I assume that a Republican carries the same "weight" of Romney support as a
Democrat does of Obama support. I also assume that Independents and Undecideds
are neutral: each group splits 50/50 between Obama and Romney (or stays home,
which is also a neutral effect).
3) If a poll identifies a level of Independent support for a candidate, then I
violate the above rule and assign a weighting factor to Independents of a
corresponding value. For example, if the poll shows Independents support Romney
55 to 45, then a .55 weighting factor is used for Independent voters, rather
than .50.
4) Finally, I add or subtract the appropriate weights corresponding to the
correct demographics. If Democrats are oversampled by 4 points (38% used by
the poll), then 4% worth of Obama support is subtracted. If Independents were
undersampled by 4 points (26% used by the poll), then 2% worth of Obama support
is added (the 50% Obama support with Independents rule).
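The four steps above can be sketched in a few lines of Python. This is one reading of the procedure, not the exact spreadsheet behind the numbers that follow. The function name `reweight`, its parameters, and the example poll (49/46 with a D/R/I mix of 38/32/30) are all hypothetical and chosen only for illustration; the example is not one of the seven polls discussed here.

```python
# Step 1: the three-month Rasmussen party ID average, in percent.
BASELINE = {"D": 33.8, "R": 36.0, "I": 30.3}

def reweight(obama, romney, poll_mix, romney_share_of_ind=0.5):
    """Apply steps 2-4 to one poll. All values are percentage points.

    poll_mix: the poll's own D/R/I sample mix
    romney_share_of_ind: step 3 weighting factor; 0.5 unless the poll
        reports how Independents split
    Returns (adjusted_obama, adjusted_romney).
    """
    d_shift = BASELINE["D"] - poll_mix["D"]  # negative if Democrats oversampled
    r_shift = BASELINE["R"] - poll_mix["R"]
    i_shift = BASELINE["I"] - poll_mix["I"]  # positive if Independents undersampled

    # Step 2: a point of party ID counts as a full point of support for
    # that party's candidate. Steps 3-4: Independents are split between
    # the candidates according to the weighting factor.
    adj_obama = obama + d_shift + (1 - romney_share_of_ind) * i_shift
    adj_romney = romney + r_shift + romney_share_of_ind * i_shift
    return adj_obama, adj_romney

# Hypothetical poll, NOT one of the seven above.
example_mix = {"D": 38.0, "R": 32.0, "I": 30.0}
adj_o, adj_r = reweight(49.0, 46.0, example_mix)
romney_lead = adj_r - adj_o
print(f"reported:   O+{49.0 - 46.0:.1f}")
print(f"reweighted: Obama {adj_o:.2f}, Romney {adj_r:.2f}, R+{romney_lead:.1f}")
```

In this made-up case a reported 3 point Obama lead turns into a Romney lead once the sample is moved from D+6 to the R+2.2 baseline, which is the kind of swing the method produces when a poll's party mix is far from the baseline.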
Reweighting all of these polls results in the following:
Tipp - R+1.37
CBS/NYT - R+6.52
Fox - O+4.48
Ipsos - O+4.35
Dem Corps - O+0.79
ABC/WaPo - R+5.54
WSJ/NBC - R+2.43
The average of these reweighted polls is then a Romney lead of 0.89%, rather
than an Obama lead of about 3.4%. That is a shift of more than 4 points. We
also see a much noisier result, with polls ranging from Obama +4.48 to Romney
+6.52. This is a much more believable result, since we are seeing variation
over more standard deviations.
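For anyone who wants to check the arithmetic, here is a short sketch that averages the seven reweighted margins listed above and measures their spread. Signing the margins so that positive means a Romney lead, and using the population standard deviation as the measure of spread, are my choices for the illustration.

```python
import statistics

# Reweighted margins from the list above; positive = Romney lead,
# negative = Obama lead.
reweighted = [1.37, 6.52, -4.48, -4.35, -0.79, 5.54, 2.43]

avg = statistics.mean(reweighted)
spread = statistics.pstdev(reweighted)

print(f"average:            R+{avg:.2f}")
print(f"range:              O+{-min(reweighted):.2f} to R+{max(reweighted):.2f}")
print(f"standard deviation: {spread:.2f} points")
```

The spread of the reweighted set comes out at about 4 points, noticeably wider than the tight grouping in the original top line numbers.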
Note a couple of items here. First of all, it is tempting to use Rasmussen's
latest results, which would give Romney a 2 point lead in the average. But I
took a more conservative approach and used the average. So the result is a
45,000 sample partisan ID poll conducted over 3 months. Second, for half of
these polls, I am using a very conservative value of 50% for Independent
support. All of the polls that report this value report Romney with a
significant advantage among Independents (with the exception of the Fox poll).
This analysis also assumes that undecideds will break 50/50, which is
historically untrue. Finally, this analysis does not account for partisan
enthusiasm. If the reported measurements of that are accurate, then Republicans
will turn out at higher rates than Democrats, increasing Romney's final results.
As a final piece of data, let's go ahead and average in the Rasmussen tracking
poll (Romney +3) as of Sep 18 when I did this analysis:
Current reported average: Obama +2.63%
Actual average of normalized polls: Romney +1.16%
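The two final figures can be reproduced by adding the Rasmussen tracking poll as an eighth entry in each set. A minimal sketch, assuming each poll gets equal weight in the average:

```python
# Obama's reported lead in the seven media polls (first list in the post).
reported = [2, 3, 5, 3, 5, 1, 5]
# Reweighted margins for the same polls; positive = Romney lead.
reweighted = [1.37, 6.52, -4.48, -4.35, -0.79, 5.54, 2.43]

RASMUSSEN_ROMNEY_LEAD = 3.0  # tracking poll as of Sep 18

# Equal-weight averages over eight polls. Rasmussen counts against
# Obama in the first average and for Romney in the second.
reported_avg = (sum(reported) - RASMUSSEN_ROMNEY_LEAD) / (len(reported) + 1)
normalized_avg = (sum(reweighted) + RASMUSSEN_ROMNEY_LEAD) / (len(reweighted) + 1)

print(f"reported average:   O+{reported_avg:.3f}")
print(f"normalized average: R+{normalized_avg:.3f}")
```

These work out to O+2.625 and R+1.155, which round to the two figures given above.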
I think this is a much better view of where the race currently stands than is
being reported.
Dave in Florida