
Taking the pulse of ‘the pulse of democracy’

The promise and peril of poll aggregation

G. Elliott Morris

Mar 22, 2023 | Charlottesville, VA

1 / 46

2 / 46
  • here to talk about my book Strength in Numbers, about how polls work and why we need them for a healthy democracy
  • but before we get there: a quick recent history of polls in 2016, 2018, 2020, and now 2022
  • 2016 was in many ways the catalyst for writing this book
  • Back then I was still in college at the University of Texas, but I had reverse-engineered the 538 forecasting model in my spare time as a way to learn computer programming

3 / 46
  • You may remember, however, that polls that year were not very accurate
  • Had Clinton winning in Michigan, Wisconsin, Pennsylvania, Florida, and North Carolina
  • All of which she lost

4 / 46
  • Overall, polls missed by 5 points in the states
  • One of the biggest misses ever

5 / 46
  • Looking at polls that were released only in the final 21 days of the campaign and only in competitive states, the average state poll missed Hillary Clinton’s margin over Donald Trump by about 3 points
  • But even worse, most of those polls systematically underestimated Trump — they had high bias, not just high error
  • In fact, the average poll was more biased than at any point since 1998

The fundamental problem with polling

6 / 46
  • To understand the reason for that miss you must understand the fundamental problem that pollsters have to solve: nonresponse.
  • That is: the people who answer a poll may be different — demographically, politically, culturally — than the people who don’t.
  • And it turns out that correcting for nonresponse is very hard, in part because there is a lot of randomness in polls due to low response rates.

Pollsters use statistical algorithms to ensure their samples match the population on different demographic targets

  • Race, age, gender, and region are most common

  • Political variables (sometimes)

  • Can use weighting (raking) or modeling (MRP), with various tradeoffs

7 / 46
  • Pollsters get around nonresponse by adjusting (or “weighting”) their samples to be demographically representative of the electorate.
  • If their poll should have 70% white voters, for example, but only 35% of respondents are white, then every white respondent gets a weight of 2 and everyone else is weighted down proportionally. They repeat that process for a bunch of traits: race, age, education, gender, region, etc.
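The mechanics of that one-variable case can be sketched in a few lines. This is a toy illustration with invented numbers, not any pollster's actual code; real raking iterates over many variables at once.

```python
from collections import Counter

def cell_weights(sample, population_shares):
    """Weight each respondent so the sample matches population shares
    on a single variable (here: race). This is the one-variable
    special case of raking."""
    n = len(sample)
    counts = Counter(r["race"] for r in sample)
    return [population_shares[r["race"]] / (counts[r["race"]] / n)
            for r in sample]

# Toy sample: 35% white respondents in an electorate that is 70% white.
sample = ([{"race": "white", "dem": 0}] * 35 +
          [{"race": "nonwhite", "dem": 1}] * 65)
weights = cell_weights(sample, {"white": 0.70, "nonwhite": 0.30})

# White respondents get weight 0.70/0.35 = 2.0; everyone else gets
# 0.30/0.65 ≈ 0.46, pulling the weighted sample back to 70/30.
weighted_dem = (sum(w * r["dem"] for w, r in zip(weights, sample))
                / sum(weights))
```

The weighted estimate recovers the answer a truly 70/30 sample would have given, provided responders within each cell look like non-responders in that cell — the assumption the next slides poke at.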

Pew used 12 weighting variables in 2020!

8 / 46
  • Pew Research Center had 12 weighting variables in 2020!
  • All of this is to try to make the data pollsters actually collected representative of the data they should have collected, in theory

2016: Demographic nonresponse bias

9 / 46
  • The canonical explanation now is that in 2016, the nearly uniform bias in polls came from higher rates of nonresponse among non-college-educated voters
  • And you can see that with this graph.
  • The NYT estimates that if all national polls in 2016 had weighted their data to be representative of race and education, their error would have dropped from 9 points to 2 points. That’s not perfect, but it is a big improvement!

2020: Partisan nonresponse bias

10 / 46
  • That’s because the problem in 2020 was not demographic nonresponse, but political nonresponse. Pollsters got the wrong mix of Trump voters within demographic groups. This time, the white non-college voters who answered their polls were systematically too Democratic.

Partisan nonresponse bias

  • Problem reaching Trump voters overall

  • And within demographic groups

  • Something you cannot fix with weighting

    • Pollsters can adjust for past vote, but the electorate changes, and certain types of voters may not respond to surveys

11 / 46
  • And that’s something that is very, very hard to adjust for after the fact.
  • For one thing, there is no official Census record of how many Democrats there are in the country.
  • And while you could run models on past polling data and try to weight to the right percentage of white 2016 Trump voters, that doesn’t help you among people who changed their minds over time.
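A back-of-the-envelope calculation (numbers invented for illustration) shows why weighting to the correct past-vote share still cannot remove nonresponse bias *within* that group:

```python
# Invented electorate: 46% recalled Trump-2016 voters, 92% of whom back
# Trump again in 2020; 54% everyone else, 10% of whom do.
true_share = 0.46 * 0.92 + 0.54 * 0.10        # 0.4772

# Now suppose the Trump-2016 voters who actually answer polls back him
# at only 85% (his most loyal supporters screen the calls). A sample
# weighted to the CORRECT 46% past-vote share is still biased:
weighted_poll = 0.46 * 0.85 + 0.54 * 0.10     # 0.4450
bias = weighted_poll - true_share              # about -3.2 points
```

The weight fixes the group's *size* but not its *composition* — which is exactly the within-group partisan nonresponse the slide describes.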

12 / 46

13 / 46
  • At the state level, polls in 2020 underestimated Donald Trump by 4 points. That was an even larger bias than in 2016!
  • That was a shock to many in the industry who had changed their methods since the last time around. Even the pollsters who were adjusting their polls to be demographically representative of the population had large errors.

Bias vs variance

14 / 46

On average, error in polls is low

15 / 46
  • Reference earlier slide
  • So while, on average, the absolute error of polls and averages is low by historical comparison
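The bias-versus-variance point is easy to demonstrate with a small simulation (invented numbers): averaging many polls drives the variance down but leaves any shared, industry-wide bias untouched.

```python
import random
import statistics

random.seed(42)

TRUE_MARGIN = 2.0   # candidate's true lead, in points
SHARED_BIAS = 3.0   # industry-wide bias hitting every poll alike
NOISE_SD = 4.0      # each individual poll's own noise

def aggregate(n_polls):
    """Average n_polls simulated polls from one election cycle."""
    return statistics.mean(TRUE_MARGIN + SHARED_BIAS + random.gauss(0, NOISE_SD)
                           for _ in range(n_polls))

errors = [aggregate(50) - TRUE_MARGIN for _ in range(2000)]

# Averaging 50 polls shrinks the noise (sd falls to ~4/sqrt(50) ≈ 0.57),
# but the aggregate's error stays centered on the 3-point shared bias.
mean_error = statistics.mean(errors)
sd_error = statistics.pstdev(errors)
```

The aggregate looks precise (tiny spread) while being consistently wrong by the full shared bias — the "(false) impression of precision" the closing slides return to.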

But a good aggregate needs polls with low bias,

especially at the state level.


16 / 46
  • What election forecasts really need is unbiased surveys
  • Because of the Electoral College, especially at the state level

Problem: bias in polls has been increasing

17 / 46
  • and that's where pollsters run into their big problems

Problem: bias in polls has been increasing...

18 / 46
  • Since 2000 bias in the polls has been growing
  • At least in presidential election years

... and bias is correlated across levels and states

19 / 46
  • and bias correlated across states, too
  • Explain scatter plot

Solution 1: Less biased polls!

20 / 46

Less biased polls

  • Election-year partisan non-response bias is present within both demographic and lagged partisan groups (party ID, past vote, approval)

  • Something you cannot fix with standard weighting.

  • So...

Options:

  1. More weighting variables (NYT)

  2. More offline and off-phone data collection (Pew NPORS, SSRS, NORC)

  3. Mixed-mode samples (promising, but not yet popular among public pollsters)

21 / 46

The polls in 2022

22 / 46
  • Maybe this worked out?
  • Nate Cohn

The polls in 2022

23 / 46
  • In fact all pollsters had a good year

24 / 46
  • From traditional pollsters to some new ones

The problem with solution 1:

No clear evidence that pollsters who changed their methods did better than those who didn't

It is just as likely that pollsters got lucky as it is that any individual pollster genuinely overperformed

25 / 46

What happens when pollsters get unlucky again?

26 / 46
  • But there is no evidence this is due to methods changes
  • Just getting lucky

"Solution" 2: Let the aggregation model debias the polls

27 / 46

Case study: Economist model

28 / 46

  • We are fully Bayesian

29 / 46
  • In fact, here is our code!

30 / 46
  • We are fully Bayesian
  • We start with a fundamentals-based "prior" about the election outcome nationally
  • Mention what goes into that (ML regularization)
  • Decompose it into constituent state forecasts
  • Measure historic error with leave-one-out cross-validation
  • That gets plugged into the model in every state

Case study: Economist model

ii. Polls are weighted by their historical error and bias

  • Based on past relationship between a pollster's lagged historical bias and performance of the aggregate

iii. Polls are observations with constant random effects to "debias" based on:

  • Pollster firm (so-called "house effects")
  • Poll mode
  • Poll population

iv. Polls are also adjusted for potential partisan non-response

  • Each poll has a covariate for whether it weights by party registration or past vote
  • Effect is allowed to change over time
  • Adjusts for biases that remain AFTER removing the other biases
31 / 46
  • Walk through slide
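A stripped-down illustration of house effects (step iii), with invented numbers. The actual Economist model fits these as random effects jointly with everything else in Stan; this two-pass sketch just uses each firm's average deviation from the overall mean.

```python
from collections import defaultdict
from statistics import mean

# Toy polls: (pollster, reported margin for one candidate, in points).
polls = [("A", 5.0), ("A", 6.0), ("B", 2.0), ("B", 3.0), ("C", 4.0), ("C", 4.0)]

overall = mean(m for _, m in polls)
by_firm = defaultdict(list)
for firm, m in polls:
    by_firm[firm].append(m)

# Each firm's "house effect" = its average deviation from the field.
house = {f: mean(ms) - overall for f, ms in by_firm.items()}

# "Debias" each poll by subtracting its firm's house effect. Note this
# only removes firm-to-firm spread: if EVERY firm shares the same bias,
# it cancels out of the house effects and survives — hence step iv.
debiased = [(f, m - house[f]) for f, m in polls]
adjusted_avg = mean(m for _, m in debiased)
```

House effects are identified only relative to the field average (a sum-to-zero constraint), which is exactly why a separate adjustment for residual partisan nonresponse is needed.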

32 / 46
  • Show how the model works graphically

Notable improvements!

33 / 46
  • Point out how model does in past

In 2016...

... But not 2020

34 / 46
  • This works in 2016
  • But not in 2020

The problem with solution 2:

1. Pollsters change their methods, making adjustments outdated

2. Even sophisticated adjustments don't work if there are no unbiased surveys!

35 / 46
  • The problem with solution 2 is twofold
  • Pollsters change their methods over time, messing with the bias adjustment
  • And bias adjustments cannot work if there are no unbiased polls!

Solution 3: Conditional forecasting!

36 / 46

Solution 3: Conditional forecasting!

- Present aggregates assuming some amount of polling bias.

- As a way to explain to readers how bias enters the process of polling

- And what happens to forecasts if bias now does not follow historical distributions

37 / 46

Conditional forecasting:

1. Debias polls

2. Rerun simulations

38 / 46

2. Rerun simulations

39 / 46

2. Rerun simulations

Advantage: leaves readers with a much clearer picture of possibilities for election outcomes if past patterns of bias aren't predictive of bias now (2016, 2020)

40 / 46
  • Explain the advantage
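The two steps can be sketched with a normal approximation and invented numbers: debias the polling average under each assumed industry-wide bias, then rerun the win-probability calculation.

```python
from math import erf, sqrt

def win_prob(margin, sd):
    """P(true margin > 0) if the forecast is Normal(margin, sd)."""
    return 0.5 * (1 + erf(margin / (sd * sqrt(2))))

POLL_AVG = 4.0      # leader's polling margin, in points (invented)
FORECAST_SD = 4.0   # forecast uncertainty (invented)

# Step 1: debias the average under an assumed bias toward the leader;
# step 2: rerun the simulation/probability. A conditional presentation
# shows readers the whole row, not just the bias = 0 column.
conditional = {bias: win_prob(POLL_AVG - bias, FORECAST_SD)
               for bias in (-4, -2, 0, 2, 4)}
```

With these numbers the leader's chances range from near-certain (polls biased against them) down to a coin flip (polls biased toward them by their full margin) — which is the picture a single headline probability hides.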

Conditional forecasting:

41 / 46
  • Here is what this could have looked like in 2020

Conditional forecasting:

Advantage: Leaves readers with a much clearer picture of possibilities for election outcomes if past patterns of bias aren't predictive of bias now (2016, 2020)

Disadvantage: When polls are right, readers feel misled (2018, 2022)

42 / 46
  • But it's really only useful in years with high uniform bias in the polls

43 / 46

The state of the polls

44 / 46
  • So let's take stock

The state of the polls

"The pulse of 'the pulse of democracy'"

1. Individual surveys subject to high degree of error

2. Industry-wide bias is unpredictable and potentially large

3. Not enough information to aggregate away this bias

4. But low variance from aggregation gives the (false) impression of precision

5. The solution probably lies not in statistics, but in visualization and communication

45 / 46
  • Walk through slide

Thank you!

Questions? (Suggestions?)


These slides were made using the xaringan package for R. They are available online at https://www.gelliottmorris.com/slides/

46 / 46
