What is a synthetic audience?
A synthetic audience is an AI-generated population of respondents that behaves, in aggregate, like a real one. Instead of recruiting hundreds of people for a survey or a focus group, you describe the audience you want by demographics, geography, attitudes, behaviors, or any other variable, and the system produces a population of simulated respondents you can put questions to. The answers come back in minutes instead of weeks, at a fraction of the cost of running the same study with humans.
A synthetic audience is not a chatbot in a demographic costume. The credible ones are built on real survey data, census records, behavioral datasets, and psychometric instruments, then calibrated so the aggregate responses of the synthetic population match the aggregate responses of the real population it stands in for.
Where the idea comes from
Market research has always run on real people. Focus groups, surveys, panels, interviews. Every method of learning what an audience thinks, feels, or will do has historically meant recruiting actual humans, paying them, scheduling them, and waiting for them to answer. It is slow and it is expensive. A typical concept test takes weeks and costs tens of thousands of dollars. Reaching specialized audiences is slower and more expensive still, sometimes prohibitively so: C-suite executives, rare patient populations, voters in a single congressional district.
The recent shift is that large language models, trained on enormous bodies of human-generated text, have absorbed enough about how different kinds of people talk, reason, and decide that they can be prompted to respond as a particular kind of person. Ground that model in real, structured data about a population, meaning a full statistical dossier built from census records, surveys, and behavioral data rather than a bare instruction to “act like a 35-year-old woman in Ohio”, and the responses start to track real human responses with surprising accuracy. Recent research has shown that LLM-generated synthetic consumers can replicate human preference distributions in consumer surveys at roughly 90% of human test-retest reliability.
That accuracy is what makes the category worth taking seriously. It was never news that you can ask an AI a question and get an answer. What is new is that, under the right conditions, you can ask an AI a question as if it were a thousand specific people and get an answer that is statistically representative of what those people would actually say.
What a synthetic audience is
A synthetic audience has three parts.
A target population. The group you want to understand, the cohort you are testing against. US registered voters. Women aged 25 to 40 who buy organic groceries. Small business owners in the Midwest. Gen Z mobile gamers. The population is defined by whatever variables matter to your question: demographics, geography, attitudes, behaviors, life stage, profession, anything.
A set of synthetic personas. Each persona is a structured profile of one simulated person: their demographics, their values, their personality, their relevant behaviors. In a credible system each persona is built from real data records, such as the General Social Survey, census microdata, or proprietary panel data, rather than invented by a model. The persona is not a free-form character sketch. It is a structured dossier that anchors the model to a specific, statistically grounded individual.
An inference engine. When you ask a question, each persona's dossier is fed to a model along with the question, and the model predicts how that person would respond. Aggregated across the full population, the responses give you a distribution: what share would say yes, which segments split on the question, where the disagreement sits.
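The three parts above can be sketched in a few lines. This is a minimal illustration, not any platform's implementation: the `Persona` fields, the `ask_audience` function, and the `toy_predict` rule are all hypothetical names, and `toy_predict` stands in for the model call that a real inference engine would make.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Persona:
    """One simulated respondent. Field names are illustrative assumptions."""
    persona_id: str
    demographics: dict[str, str]
    weight: float = 1.0  # survey weight carried over from the source record


def ask_audience(
    personas: list[Persona],
    question: str,
    predict: Callable[[Persona, str], str],
) -> dict[str, float]:
    """Return the weighted share of the audience giving each answer."""
    if not personas:
        raise ValueError("the audience must contain at least one persona")
    totals: dict[str, float] = {}
    for persona in personas:
        answer = predict(persona, question)
        totals[answer] = totals.get(answer, 0.0) + persona.weight
    total_weight = sum(totals.values())
    return {answer: w / total_weight for answer, w in totals.items()}


def toy_predict(persona: Persona, question: str) -> str:
    # Stand-in for a real model call; the rule below is arbitrary.
    return "yes" if persona.demographics.get("region") == "Midwest" else "no"


# A tiny hypothetical audience; real ones run to hundreds or thousands.
audience = [
    Persona("p1", {"region": "Midwest"}),
    Persona("p2", {"region": "South"}),
    Persona("p3", {"region": "Midwest"}),
    Persona("p4", {"region": "West"}),
]
shares = ask_audience(audience, "Would you try this product?", toy_predict)
```

Keeping `predict` as a plain function argument separates the population definition from the engine, so the same audience can be run through different models and the resulting distributions compared.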
A full audience usually runs from a few hundred to several thousand personas, depending on the question and the statistical resolution you need.
Good audiences versus bad ones
This is the part of the category that matters most, and the part that is easiest to get wrong. Anyone can ask a general-purpose chat model to pretend to be a focus group of fifty American voters. It will produce a plausible-sounding answer. It will also be mostly wrong, in ways that are hard to detect.
Four things separate a credible synthetic audience from a merely convincing one.
Grounded persona data. The personas should be built from real human data records, not generated wholesale by the model. Generated personas inherit the biases of the model that made them, typically a skew toward younger, more educated, more liberal, more online perspectives, because that is what the training data overrepresents. Real data records, properly sampled and weighted, avoid this.
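One common way to weight real records is post-stratification: each record is weighted by the ratio of its group's share in the target population to its share in the record set. The sketch below assumes a single grouping variable and uses made-up shares; the function name and the group labels are hypothetical, and a production system would likely weight on several variables at once.

```python
from collections import Counter


def poststratification_weights(
    record_groups: list[str],
    population_shares: dict[str, float],
) -> list[float]:
    """Weight each record by population share / sample share of its group."""
    counts = Counter(record_groups)
    n = len(record_groups)
    return [population_shares[g] / (counts[g] / n) for g in record_groups]


# Illustrative numbers only: a record set that over-represents graduates.
groups = ["college", "college", "college", "no_college"]
targets = {"college": 0.4, "no_college": 0.6}
weights = poststratification_weights(groups, targets)

# After weighting, the under-represented group carries its population share.
weighted_no_college = weights[3] / sum(weights)
```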
Distributional calibration. A credible system is benchmarked against real survey data. It answers questions whose real-world answers are already known, and its builders measure how close the synthetic distribution lands to the real one. This is the only honest way to know whether the audience is working. A platform that cannot tell you its accuracy on known benchmarks is selling vibes.
Calibration has a ceiling worth understanding. Real people do not reproduce their own answers perfectly on retest, which sets an empirical upper bound on how accurate any audience, synthetic or human, can be. We cover this in the article on the human self-replication ceiling.
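As a rough illustration of what benchmarking against a known answer involves, the sketch below compares a synthetic answer distribution with a real one using total variation distance, then expresses the result relative to a human retest figure. The choice of metric, the function names, and every number here are assumptions made for the example, and the ratio is only meaningful if the synthetic score and the retest figure are measured on the same 0-to-1 agreement scale.

```python
def total_variation_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the sum of absolute differences between two answer distributions."""
    answers = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in answers)


def agreement(p: dict[str, float], q: dict[str, float]) -> float:
    """1.0 means identical distributions, 0.0 means no overlap at all."""
    return 1.0 - total_variation_distance(p, q)


def ceiling_normalized(synthetic_agreement: float, retest_agreement: float) -> float:
    """Score relative to how well humans reproduce their own answers."""
    if not 0.0 < retest_agreement <= 1.0:
        raise ValueError("retest_agreement must be in (0, 1]")
    return synthetic_agreement / retest_agreement


# Hypothetical benchmark question with a known real-world answer.
real = {"yes": 0.60, "no": 0.40}
synthetic = {"yes": 0.52, "no": 0.48}

raw_score = agreement(real, synthetic)
# 0.95 is a placeholder for a measured human test-retest figure.
relative_score = ceiling_normalized(raw_score, retest_agreement=0.95)
```

Reporting both numbers is the more honest habit: the raw score says how far the audience is from the real distribution, and the relative score says how much of the achievable accuracy it has captured.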
Bias correction. Models carry well-documented systematic biases when asked to predict human behavior. They lean toward pro-social, optimistic, socially acceptable answers. People in model simulations report being more civically engaged, more environmentally conscious, and more brand-loyal than they actually are. A serious system anchors against empirical base rates from real survey data to correct for it.
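One simple way to anchor against base rates, shown here as an assumption rather than as the method any particular platform uses, is to measure the average gap between synthetic and real answers on benchmark questions in log-odds space, then subtract that gap from new predictions. The anchor values and function names below are invented for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def estimate_bias(anchors: list[tuple[float, float]]) -> float:
    """Mean log-odds gap (synthetic minus real) over benchmark questions."""
    return sum(logit(s) - logit(r) for s, r in anchors) / len(anchors)


def correct(synthetic_rate: float, bias: float) -> float:
    """Shift a raw synthetic rate by the estimated bias."""
    return inv_logit(logit(synthetic_rate) - bias)


# Illustrative (synthetic_rate, real_rate) pairs for questions where the
# real answer is already known, e.g. self-reported civic engagement.
anchors = [(0.70, 0.55), (0.60, 0.45), (0.80, 0.68)]
bias = estimate_bias(anchors)

# Apply the correction to a new question with no known answer.
corrected = correct(0.65, bias)
```

Working in log-odds keeps corrected rates between 0 and 1, which a plain subtraction of percentage points would not guarantee. A single global offset is the crudest version of this idea; the bias may well differ by topic or segment.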
Scoped honesty about what it predicts well. Synthetic audiences are strong at predicting responses to known categories of question: political opinions, brand preferences, attitudes toward familiar products, demographic-driven behaviors. They are weaker on the genuinely novel, things the model has no precedent for: a truly unprecedented product, a cultural shift that has not happened yet, a creative concept with no analog in the training data. A good platform says so plainly.
What you use it for
The use cases sort naturally by who is asking. Today the clearest demand comes from three kinds of buyer, though the method is not limited to them.
Business and brand
The most common application. Marketing, product, and insights teams use synthetic audiences to pressure-test ideas before committing real money to them.
- Concept testing. Try five product concepts against a target audience in an afternoon, see which ones resonate, refine the survivors, then validate the winners with real research.
- Messaging and creative testing. Run multiple variants of an ad, a headline, a landing page, or a positioning statement against the target audience and see which one lands.
- Pricing. Test willingness to pay across segments without revealing pricing intentions to the market.
- Segmentation. Explore how a market actually splits along the variables you care about, before designing the segmentation study you will run with real people.
- Hard-to-reach audiences. Get directional signal from groups that are slow or expensive to recruit: surgeons, CFOs, ultra-high-net-worth investors, people with specific medical conditions.
- Multicultural and multi-market. Test how a message lands across 15 markets in a day instead of fielding 15 separate studies.
Political and advocacy
Campaigns, advocacy organizations, and political consultancies use synthetic audiences for polling, message testing, and scenario modeling.
- Polling at scale. Forecast outcomes, model district-level dynamics, and track sentiment shifts faster and cheaper than traditional polling.
- Message testing across voter segments. Test how a speech, ad, or talking point lands with different voter blocs before deploying it.
- Voter persuasion modeling. Predict how persuadable segments will react to different appeals, and identify which messages move which voters.
Government and policy
Agencies, policy shops, and public sector organizations use synthetic audiences to model citizen response before rollout.
- Policy impact modeling. Predict how different demographics will react to a proposed regulation, tax change, or public program.
- Public communications testing. Pre-test public health campaigns, emergency communications, and policy announcements.
- Crisis and scenario planning. Model how the public will respond to a hypothetical event, so you can prepare rather than react.
What it can't do
A synthetic audience is a tool, not a replacement for human judgment or for real-world research. Some things it does poorly, and some it should not do at all.
- It is not a substitute for in-depth qualitative research. A synthetic audience can tell you that a concept tests poorly. It cannot replicate the moment in a real focus group when a participant says something nobody on your team would have thought of. Human creativity and human surprise are still human.
- It cannot test the truly unprecedented. If you are launching something genuinely new, such as a product category that does not exist yet or a creative concept with no cultural precedent, the model has nothing to draw on. Use real audiences for first-of-its-kind testing.
- It is not a final answer. Best practice is to use synthetic audiences for the rapid, iterative, exploratory phase of research, where speed and cost matter most, and to validate high-stakes decisions with real research.
- It is not a license to skip human accountability. Decisions that affect real people, such as pricing, hiring, and public policy, should not be made on synthetic data alone. The output is a directional signal, not a verdict.
Compared to traditional research
| | Traditional research | Synthetic audiences |
|---|---|---|
| Time to results | Weeks | Minutes to hours |
| Cost per study | $10K to $100K and up | A fraction of that |
| Sample size | Limited by budget | Easily scalable to thousands |
| Hard-to-reach audiences | Often impractical | Available on demand |
| Novelty handling | Strong. Real people react to anything | Weak on truly unprecedented stimuli |
| Emotional and creative depth | Strong | Limited |
| Iteration speed | Slow | Fast |
| Statistical accuracy on known categories | Ground truth | Good and improving, calibrated against ground truth |
The right framing is not that synthetic replaces traditional. It is that synthetic expands what you can afford to test, and traditional validates what matters most.
The state of the category
Synthetic audiences moved from research curiosity to operational tool over the last two years. There are now multiple credible vendors in market, working with brand teams, political campaigns, and government clients. Peer-reviewed work has demonstrated 80%+ accuracy on social survey replication, and the leading platforms publish benchmarks against known ground truth.
The category is still young. The vendors that will matter long term are the ones that can demonstrate calibration against real data, are honest about their limitations, and treat synthetic audiences as a complement to human research rather than a replacement. The vendors that will not matter are the ones selling generic LLM wrappers and calling them digital twins.
More from the knowledge base.
What is a cohort?
What a cohort is across four research traditions, what AAPOR disclosure elements require, and why cohort precision determines synthetic audience fidelity.
What is a synthetic respondent?
The individual record in a synthetic study: how it differs from a persona, cohort, or audience; grounded vs prompted approaches; where it works and where it fails.
The human self-replication ceiling
The empirical ceiling on survey reliability, derived from test-retest research and Park et al. (2024). A reference for evaluating synthetic audience accuracy claims.