What is a likely voter?
A likely voter is a registered voter who, according to a poll's methodology, has a high probability of actually casting a ballot in an upcoming election. Pollsters identify likely voters by scoring respondents on questions that correlate with turning out, then keeping the highest scorers. There is no single universal definition. Each polling organization operationalizes the term differently, which is one reason two polls of the same race can report different results.
The distinction matters because a large share of eligible Americans do not vote. As Gallup puts it, reporting voter preferences on the basis of all national adults or all registered voters does not generally provide the most accurate estimate of the vote in a given election. Narrowing the sample to those most likely to turn out is the step that makes a pre-election poll a forecast rather than a snapshot of opinion.
Definition: likely voter
Likely voter. A registered voter whom a poll's turnout model classifies as having a high probability of casting a ballot in a specific upcoming election. Likely voters are identified through questions about voting intention, past voting behavior, knowledge of the voting process, and interest in the campaign, then filtered to a subset sized to match projected turnout. The exact construction varies by polling organization. There is no canonical formula.
AAPOR, the American Association for Public Opinion Research, states there is “a consensus in the polling community that it is better to report ‘likely’ voters than ‘registered’ voters, especially as Election Day approaches.” Pew Research Center defines likely voters as a sub-group of registered voters identified by “their answers to questions about their intention to vote, their past voting behavior, their knowledge about the voting process, and their interest in the campaign.”
How is a likely voter different from a registered voter?
Polls typically report on three nested groups: all adults, registered voters, and likely voters. Registered voters are the adults who are registered to vote, a smaller group than everyone legally eligible. Likely voters are the most restrictive subset, adding a filter for the intent and history that predict actual turnout.
The two groups do not lean the same way. Likely-voter samples tend to show a Republican advantage relative to registered-voter samples, because higher-propensity voters skew older and more reliably Republican. Pew found that in 15 of 19 fall election surveys it conducted from 1996 to 2008, likely voters were at least slightly more supportive of Republican candidates than registered voters were.
The shift can be large enough to change the story. In Pew's final-weekend survey before the 2008 election, Barack Obama led John McCain by 11 points among all registered voters (50 percent to 39 percent). Limited to likely voters, that lead narrowed to seven points (49 percent to 42 percent). The four-point move came entirely from applying the likely-voter screen, not from any change in opinion.
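The arithmetic behind that four-point move can be written out as a short sketch. The percentages are the Pew figures quoted above. The function and variable names are illustrative only.

```python
def margin(leader_pct: float, trailer_pct: float) -> float:
    """Lead of one candidate over another, in percentage points."""
    return leader_pct - trailer_pct


# Pew's final-weekend 2008 figures as quoted in the text above.
registered = {"Obama": 50, "McCain": 39}
likely = {"Obama": 49, "McCain": 42}

rv_margin = margin(registered["Obama"], registered["McCain"])
lv_margin = margin(likely["Obama"], likely["McCain"])

# The difference between the two margins is the effect of the screen alone,
# because both margins come from the same interviews.
screen_effect = rv_margin - lv_margin

print(rv_margin, lv_margin, screen_effect)
```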
How do pollsters identify likely voters?
There is no magic formula. AAPOR is explicit: “Research finds no single magic bullet question or set of questions that can determine likely voters with 100 percent accuracy.” Four broad methods are in use.
The multi-question index. This is the original approach, and still the most common. Pollsters score each respondent on a series of questions shown to correlate with turnout, then set a cutoff based on expected turnout. If a pollster believes turnout will be 60 percent, they take the 60 percent of respondents with the highest scores as likely voters. Because the component questions differ between organizations, the same data can yield different toplines.
Simple self-report. Ask respondents whether they plan to vote and exclude those who say no. The weakness is overstatement. Pew notes that 90 percent or more of registered voters say they plan to vote, so most researchers add a follow-up question gauging certainty.
Voter-file matching. Pollsters match respondents to commercial voter files to obtain a verified record of past voting, supplementing or replacing self-reported history. Self-report is unreliable on its own. Pew's validation work found that of respondents who rated their likelihood of voting a 9 or 10 on a 10-point scale, 75 percent were verified as having actually voted, while only 34 percent of those rating themselves 7 or 8 had voted.
Probabilistic and multivariate scoring. Use logistic regression or machine-learning models to estimate each respondent's probability of voting, then apply a turnout cutoff. The 2024 AAPOR task force found that polls using this family of methods had somewhat lower average errors than those relying on simple self-reported screens.
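As a rough sketch of how probabilistic scoring and a turnout cutoff fit together, the Python below scores each respondent with a logistic function and keeps the top share. Everything specific here is an assumption made for illustration: the field names, the intercept, and the weights are hypothetical and hand-picked, whereas a real model would be fitted to validated turnout records. No polling organization's actual model is reproduced.

```python
import math

# Hypothetical coefficients, chosen by hand for illustration only.
INTERCEPT = -2.0
WEIGHTS = {
    "voted_last_election": 2.0,  # 1 if a past vote is on record, else 0
    "certainty": 1.5,            # self-rated certainty, rescaled to 0..1
    "interest": 1.0,             # campaign interest, rescaled to 0..1
}


def turnout_probability(respondent: dict) -> float:
    """Logistic estimate of the probability that a respondent votes."""
    z = INTERCEPT + sum(w * respondent[k] for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def likely_voters(respondents: list, expected_turnout: float) -> list:
    """Keep the top `expected_turnout` share, ranked by estimated probability."""
    ranked = sorted(respondents, key=turnout_probability, reverse=True)
    keep = round(len(ranked) * expected_turnout)
    return ranked[:keep]


# Invented respondents, not survey data.
sample = [
    {"id": "a", "voted_last_election": 1, "certainty": 1.0, "interest": 0.8},
    {"id": "b", "voted_last_election": 0, "certainty": 1.0, "interest": 0.5},
    {"id": "c", "voted_last_election": 1, "certainty": 0.4, "interest": 0.2},
    {"id": "d", "voted_last_election": 0, "certainty": 0.2, "interest": 0.1},
    {"id": "e", "voted_last_election": 1, "certainty": 0.9, "interest": 0.9},
]

# With an assumed 60 percent turnout, three of the five respondents are kept.
kept_ids = [r["id"] for r in likely_voters(sample, 0.60)]
print(kept_ids)
```

The same ranking-and-cutoff step applies to the multi-question index: only the scoring function changes. Respondent "b" is fully certain of voting but has no recorded past vote, so this toy model ranks them below "c", who is less certain but has a verified history.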
The Gallup seven-question model
The canonical index traces to Gallup. Gallup has used likely-voter models since 1950, scoring each respondent on a 0-to-7 scale built from seven standard questions: thought given to the election, knowing where to vote, having voted in the current precinct before, habitual frequency of voting, plan to vote, certainty of voting, and whether the respondent voted in the last comparable election. Respondents not registered to vote, or who say they will not vote, are scored 0. The top scorers, equal to projected turnout, are classified as likely voters.
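The scoring rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Gallup's implementation: the field names are hypothetical, and awarding exactly one point per item is an assumption, since the text does not say how individual answers map to points.

```python
# The seven items named in the text, under hypothetical field names.
ITEMS = (
    "thought_given",          # has given thought to the election
    "knows_where_to_vote",
    "voted_in_precinct_before",
    "votes_habitually",
    "plans_to_vote",
    "certain_to_vote",
    "voted_last_comparable_election",
)


def index_score(respondent: dict) -> int:
    """Score a respondent from 0 to 7, assuming one point per item."""
    # Unregistered respondents and those who say they will not vote score 0.
    if not respondent.get("registered", False):
        return 0
    if respondent.get("says_will_not_vote", False):
        return 0
    return sum(1 for item in ITEMS if respondent.get(item, False))


# Invented respondents, not survey data.
always_votes = {"registered": True, **{item: True for item in ITEMS}}
lukewarm = {
    "registered": True,
    "thought_given": True,
    "plans_to_vote": True,
    "certain_to_vote": True,
}
unregistered = {"registered": False, **{item: True for item in ITEMS}}
opted_out = {"registered": True, "says_will_not_vote": True, "votes_habitually": True}

scores = [index_score(r) for r in (always_votes, lukewarm, unregistered, opted_out)]
print(scores)
```

Once every respondent has a score, the pollster ranks the sample and keeps the share that matches projected turnout.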
The model was developed by Paul Perry, Gallup's president and research director, in the 1950s, and published in 1960 as “Election Survey Procedures of the Gallup Poll” in Public Opinion Quarterly. During the 1950s, Gallup validated the model by sending interviewers to voter registrars' offices after elections to check whether respondents had actually voted. Across presidential and congressional elections from 1950 to 1958, the model reduced the average deviation of Gallup's polls from the actual result from 2.8 percentage points among registered voters to 1.1 percentage points among likely voters. Pew, Gallup, and other organizations still use variations of the Perry-Gallup scale.
Why is the likely-voter screen contested?
The screen is contested because it requires predicting a population that does not yet exist. As Pew put it, election polls “are asked to produce a model of a population that does not yet exist at the time the poll is conducted, the future electorate.” Two judgment calls drive the disagreement: which questions to use, and where to set the turnout cutoff.
The cutoff alone can move the headline. AAPOR illustrates with a Minnesota example in which changing the assumed turnout from 75 percent to 79 percent of eligible voters flips the estimated margin from a two-point Republican lead to a two-point Democratic lead, a four-point swing produced by a modeling assumption rather than by voters. And as AAPOR notes, “no one knows what actual turnout is going to be. It's like trying to hit a moving target.”
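A toy example shows how a cutoff can flip a margin. The sample below is invented for illustration and is not AAPOR's Minnesota data. It assumes 100 respondents already ranked from most to least likely to vote, each with a stated choice of "R", "D", or "U" (undecided).

```python
def margin_at_turnout(ranked_choices: list, turnout: float) -> float:
    """R-minus-D margin, in points, among the top `turnout` share of the sample."""
    keep = round(len(ranked_choices) * turnout)
    if keep == 0:
        raise ValueError("turnout is too low to keep any respondents")
    kept = ranked_choices[:keep]
    return 100.0 * (kept.count("R") - kept.count("D")) / keep


# Invented sample, grouped by propensity band. The order of choices inside a
# band does not affect the result, because only the counts above the cutoff matter.
ranked = (
    ["R"] * 38 + ["D"] * 36 + ["U"] * 1  # top 75 respondents
    + ["D"] * 4                          # ranks 76 to 79
    + ["D"] * 12 + ["R"] * 9             # remaining 21
)

m75 = margin_at_turnout(ranked, 0.75)
m79 = margin_at_turnout(ranked, 0.79)
print(round(m75, 1), round(m79, 1))
```

In this invented sample the four respondents added by the looser cutoff all favor the Democrat, which is enough to turn a Republican lead of under three points into a Democratic lead of similar size. No respondent changed an answer between the two estimates.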
The recent record shows both the limits and the gains. The 2020 cycle produced polling error of an unusual magnitude, the highest in 40 years for the national popular vote, with state-level presidential polls overstating Biden's margin by 4.3 points on average. Notably, the AAPOR task force did not pin that on likely-voter models. Its chair, Josh Clinton of Vanderbilt University, said the task force “did not find evidence that errors in likely voter models, which are used to predict who in a survey will vote, were responsible for the errors in the poll's results,” pointing instead to issues consistent with nonresponse. The 2024 cycle was more accurate, with average absolute error on the two-party margin falling to 3.3 points from 5.3 in 2020 and 5.2 in 2016. The 2024 task force also found that polls using multivariate turnout models or registration-based screens had somewhat lower average errors than those relying on simple self-reported likelihood-of-voting questions. The reported advantage came from those models better approximating actual turnout in a year with shifting participation.
Does “likely voter” map onto a definable population?
Not as a fixed list, but as a measurable construct. Every method above is an attempt to estimate one underlying quantity: the probability that a given person votes. Voter-file vendors now express this directly, assigning each individual a modeled turnout-propensity score, though the exact algorithms are proprietary. Pew's 2018 work found these scores improved estimates over self-report alone: applied to the 2016 race, modeled turnout scores narrowed Clinton's estimated lead from seven points among all registered voters to a range of three to five points, against her eventual two-point national margin.
So a likely voter is best understood not as a fixed group of people but as the output of a model: a registered voter whose combination of demographics, stated intent, expressed interest, and verified history places them above a turnout threshold the pollster has chosen. Pre-election polls of likely voters are, in AAPOR's words, “best estimates.” Generally good ones, but estimates nonetheless. Reading any likely-voter poll well means asking which method produced the screen, since that choice shapes the result as much as the answers do.
References
Gallup, Inc. (2010). Understanding Gallup's Likely Voter Models. Gallup.
Newport, F. (2000). How Do You Define “Likely Voters”? Gallup.
Blumenthal, M. (2004). Likely Voters IV: The Gallup Model. MysteryPollster.
Pew Research Center (2016). Appendix A: The Perry-Gallup Measures. Pew Research Center.
Pew Research Center (2016). Can Likely U.S. Voter Models Be Improved? Pew Research Center.
AAPOR Task Force (2021). AAPOR Task Force on 2020 Pre-Election Polling. AAPOR.
AAPOR Task Force (2025). AAPOR Task Force on 2024 Pre-Election Polling. AAPOR.
Perry, P. (1960). Election Survey Procedures of the Gallup Poll. Public Opinion Quarterly, vol. 24, pp. 531–542. (Primary source, paywalled; quoted here via Blumenthal 2004 and Pew 2016. No open-access URL.)