Industry report

The State of Synthetic Audiences 2026

Reading time: 15 min read

Published: June 21, 2026

Last updated: June 21, 2026

Synthetic audiences crossed from research curiosity to funded category in 2026. The academic foundation is real: interview-grounded agents now replicate survey answers about as consistently as people replicate their own. The commercial momentum is louder still, with one vendor raising $100 million and another priced near a billion. But the independent evidence is sobering. The most comprehensive study to date found weak individual-level accuracy and pervasive under-dispersion, and the profession's standards bodies have settled on a single word for where this belongs today: augment, not replace. ¹⁵⁸

This report maps the category as it stands in mid-2026: where it came from, who is building it, what the published evidence actually supports, where it breaks, and what the standards bodies are asking for next. Vendor accuracy claims are presented as vendor claims. Independent benchmarks are presented separately. The gap between them is the most important fact in the category, and this report does not flatten it.

What is a synthetic audience, and what is it not?

A synthetic audience is an AI-generated population of respondents built to behave, in aggregate, like a real one. You define the population you want, put questions to it, and read the distribution of answers back. The credible versions are grounded in real human data and benchmarked against known results. The term has a settled commercial taxonomy: a synthetic audience is the category, a synthetic persona is a single segment you can interrogate one to one, and a synthetic focus group is multiple personas responding at once to the same stimulus. ¹²

It helps to separate the terms that get used interchangeably and should not be. Silicon sampling is the academic method of conditioning a language model on real sociodemographic backstories so it emulates subgroup response distributions. Argyle et al. coined it, alongside the term algorithmic fidelity, in their 2023 paper in Political Analysis. ⁸ Generative agents are the architectural idea: software agents that extend a language model with memory, reflection, and retrieval to produce persistent, believable behavior, introduced by Park et al. at ACM UIST 2023. ² Homo silicus is the economics framing, proposed by Horton, Filippas, and Manning at NBER in 2023: language models as implicit computational models of humans, usable the way economists use Homo economicus. ⁴ A Turing Experiment, from Aher, Arriaga, and Kalai at ICML 2023, tests whether a model can simulate a representative sample of study participants rather than a single individual. ⁵

One distinction matters more than the rest. Synthetic data and a synthetic audience are not the same thing. Synthetic data is artificially generated information designed to mimic a real dataset, often for privacy or model training. A synthetic audience uses real human data as its foundation and makes it queryable. As GWI puts it, the audience is synthetic, but the data is real. ¹² Agent-based modeling, which predates language models by decades, is a further cousin: modern generative-agent work fuses those older simulation concepts with large models.

How did the category get here?

The intellectual groundwork was laid between 2022 and 2024 by a small number of papers that are now cited thousands of times. Argyle et al. showed that GPT-3, properly conditioned on real backstories, could emulate response distributions from specific human subgroups, and named the property algorithmic fidelity. ⁸ The paper did not report a single headline accuracy figure. It demonstrated fidelity through several criteria, including an indistinguishability test in which participants guessed 61.7% of human-generated lists were human and 61.2% of GPT-3 lists were human, a difference that was not statistically significant. ⁸ Writers and vendors who cite this paper for a clean accuracy percentage are reading something into it that is not there.

The next leap was empirical scale. In November 2024, Park et al. built generative agents for 1,052 real Americans, recruited by stratified sampling and each interviewed for two hours by an AI interviewer. ³ The result became the most-quoted number in the field: the agents replicated participants' General Social Survey responses 85% as accurately as the participants replicated their own answers two weeks later. ³ The denominator matters. This is 85% of the human test-retest ceiling, not 85% raw agreement. The same study reported a Big Five personality correlation of r = 0.80 and an economic-game correlation of r = 0.66, and found that interview-grounded agents were less racially and ideologically biased than agents given only demographic labels. ²
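The denominator point is easiest to see as arithmetic. The sketch below is a minimal illustration of a ceiling-normalized score. The function name and the two input values are hypothetical, chosen only so the ratio lands on 85%; they are not figures taken from the paper.

```python
def normalized_accuracy(agent_accuracy: float, human_retest_accuracy: float) -> float:
    """Express agent accuracy as a share of the human test-retest ceiling."""
    if not 0.0 < human_retest_accuracy <= 1.0:
        raise ValueError("human_retest_accuracy must be in (0, 1]")
    if not 0.0 <= agent_accuracy <= 1.0:
        raise ValueError("agent_accuracy must be in [0, 1]")
    return agent_accuracy / human_retest_accuracy


# Hypothetical inputs, for illustration only.
raw_agent = 0.68       # share of answers where the agent matches the person
human_ceiling = 0.80   # share of answers where the person matches themselves on retest

share = normalized_accuracy(raw_agent, human_ceiling)
print(f"raw agreement: {raw_agent:.0%}, share of ceiling: {share:.0%}")
```

Under these assumed inputs the agent agrees with the person on 68% of answers, yet scores 85% once divided by the ceiling. A reader who sees only the normalized figure cannot recover the raw agreement without knowing the denominator.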

Skepticism arrived in the same venue. Bisbee, Clinton, Dorff, Kenkel, and Larson published a direct challenge in Political Analysis in 2024. ⁶ Prompting ChatGPT to adopt personas and rate sociopolitical groups, they found the average scores matched the 2016 to 2020 American National Election Study closely, but that the model was not reliable for statistical inference: responses showed less variation than real surveys, and regression coefficients often diverged from ANES estimates. They also documented that minor changes in prompt wording shifted the distribution, and that the same prompt produced significantly different results three months apart because the underlying model had changed. ⁶ The political scientist G. Elliott Morris cites this paper, attributing it to Clinton and Larson, as a load-bearing reason his aggregator excludes synthetic data. ⁴³

The commercial category formed alongside the research. Fairgen, founded in Tel Aviv in 2021, pivoted to synthetic survey augmentation and launched out of beta in May 2024 with $8 million raised. ¹⁴ Qualtrics had already committed $500 million over four years to generative AI for its platform. ¹⁴ By the May to June 2025 issue of Harvard Business Review, the topic had crossed into mainstream business readership, framed there as the most exciting frontier for generative AI in marketing. ¹³

Who is building synthetic audiences in 2026?

The 2026 landscape is a mix of venture-funded startups, traditional research houses adding AI, and platform incumbents. Funding has been the loudest signal.

Simile

Palo Alto · Series A · $100M

A Palo Alto startup co-founded by Joon Sung Park, Michael Bernstein, Percy Liang, and Lainie Yallen, all with Stanford ties, Simile raised a $100 million Series A led by Index Ventures, announced in February 2026. ¹⁵¹⁶ Backers include Bain Capital Ventures, Fei-Fei Li, and Andrej Karpathy. ¹⁶ The team had spent roughly seven months building its model on interviews with hundreds of people, transaction records, and behavioral-science journals. Named customers include CVS Health, which uses it for simulated focus groups, Gallup, which uses it for digital polling panels, and Telstra. ¹⁷¹⁸ Simile is the commercial vehicle closest to the foundational academic work; its founders led the 1,000-people study.

Vendor claim

85% of participants' GSS accuracy reproduced per independent testing; CEO claims model can forecast 8 of 10 analyst questions on an earnings call. ¹⁸¹⁷

Aaru

New York · Series A · ~$1B headline

Founded in March 2024 in New York, Aaru raised a Series A led by Redpoint Ventures at a reported $1 billion “headline” valuation, with a blended valuation below that figure and a round size above $50 million; annual recurring revenue was below $10 million as of December 2025. ¹⁹ Aaru's named partners include Accenture, EY, and Interpublic Group, and it works with political campaigns. ¹⁹²² Its private-sector product is branded Lumen. ²² Aaru is also the clearest cautionary tale in the category. Semafor reported it predicted the 2024 New York Democratic primary within 371 votes, then reported after the 2024 presidential election that Aaru, like most human pollsters, got its top-line prediction wrong, having forecast a Harris Electoral College win. ²⁰²¹ Its CEO argued the result was “within the margin of error.” ²¹

Vendor claim

Predicted the 2024 New York Democratic primary within 371 votes; top-line 2024 presidential prediction was incorrect. ²⁰²¹

Evidenza

B2B personas · $1M+ first-year revenue

Co-founded by Peter Weinberg and Jon Lombardo, both formerly of LinkedIn's B2B Institute, Evidenza launched out of stealth in June 2024 and reported revenue over $1 million within its first year. ³⁰³² It focuses on synthetic B2B audiences and executive personas. Its homepage advertises “88% accuracy in 100+ validations.” ²⁹ In June 2025 it announced a partnership with Dentsu, which reported early results showing a 0.87 correlation with traditional research. ³³

Vendor claim

“88% accuracy in 100+ validations” defined as average match between synthetic and traditional research; metric and sample size not disclosed. ²⁹

Artificial Societies

London · YC W25 · ~$5.35M

A London startup founded in October 2024 by James He and Patrick Sharpe, and a Y Combinator W25 company, Artificial Societies raised roughly $5.35 million across pre-seed and seed rounds, the seed led by Point72 Ventures. ²⁴²⁵ Its academic grounding is real: He led a study of 33,299 AI chatbots published in the British Journal of Psychology. ²⁶ The platform simulates networks of AI personas for strategic communications, and its named case study is a Teneo project that simulated 189,756 perspectives for a Fortune 100 client. ²⁸ A note on verifiability: the company's direct site was inaccessible during this research, and its headline accuracy figures reach us only through a third-party trade article. They are discussed in the methodology section and treated accordingly. ²⁷

Vendor claim

~86% distribution accuracy vs. 1,000 UC Berkeley surveys; ~95% accuracy against human self-replication; figures reach us only via a third-party trade article. ²⁷

Synthetic Users

Lisbon · LIFT Labs · per-interview

Founded in 2023 by Kwame Ferreira and Hugo Alves and based in Lisbon, Synthetic Users is a Comcast NBCUniversal LIFT Labs portfolio company focused on qualitative and UX research. ³⁴³⁶ It reports a self-defined “Synthetic Organic Parity” of 85 to 92%, uses a per-interview pricing model, and advertises SOC 2 compliance, a claim specific to this vendor. ³⁴³⁶

Vendor claim

85 to 92% “Synthetic Organic Parity” across thematic overlap, depth, comprehensiveness, and qualitative alignment; single published case study of eight interviews. ³⁵

The incumbents are moving too. Ipsos partnered with Stanford's Politics and Social Change Lab in June 2025 to build and validate digital-twin panels grounded in its KnowledgePanel. ⁴¹ Gallup began independently validating Simile's method, with interviews of roughly 1,000 panel members starting in fall 2025. ⁴⁰ Qualtrics launched synthetic panels for US consumers in March 2026, claiming 12 times better accuracy than general-purpose AI and combining synthetic and human panels on one platform. ⁴²

The demand signals are equally clear, if drawn from market participants. A Qualtrics survey of more than 3,000 researchers across 14 countries found 71% agree the majority of market research will be done with synthetic responses within three years, and that 89% already use AI tools regularly or experimentally; Qualtrics is a vendor with a stake in that finding. ³⁷ EY published its own case study reporting it recreated its 2025 Global Wealth Research Report in a single day using Aaru, with a median correlation of 90%, against fieldwork that normally takes six months. ²³ Greenbook's 2026 GRIT report described synthetic data crossing “from niche topic to top-three industry buzz in a single wave.” ³⁸ The analyst houses have put numbers to the trajectory, though their primary reports sit behind paywalls and reach this report only through secondary citation. Gartner's “Predicts 2025: Digital Twins of Customers” is cited as forecasting that by 2027, 40% of large enterprises will have integrated customer digital twins into their insights processes. ²⁶ Forrester is cited as predicting that by 2027, synthetic data will replace at least 20% of the real consumer data used for predictive analysis in market research. ²⁷ Both figures should be read as analyst forecasts relayed secondhand, not as audited measurements.

What does the published evidence actually say?

This is where the category's marketing and its science diverge, and the honest answer requires holding two columns side by side.

The vendor self-reported figures. These are claims made by companies about their own products. Metric definitions and sample sizes are mostly undisclosed, and several could not be independently verified.

Artificial Societies: a set of self-reported figures relayed secondhand. The vendor's own site was inaccessible during this research, so the numbers trace to a third-party trade article rather than to any primary validation document. As reported there, the platform claims roughly 93% response consistency and 86% distribution accuracy against 1,000 real surveys sourced from UC Berkeley research, which the vendor frames as within five points of a stated 91% human-replication ceiling, while standard or baseline large language models in the same comparison reached only 61 to 67% distribution accuracy. The same trade coverage relays an “approximately 95% accuracy against human self-replication” headline. None of these figures were accompanied by a disclosed sample-size breakdown or an independent audit, and the contrast with the independent benchmarks below is the spine of this report. ²⁷
Evidenza: 88% accuracy across 100+ validations, defined on its homepage as “average match between synthetic and traditional research”; the statistical metric, the sample size per validation, and the methodology are not publicly disclosed, and the validations were run “with clients,” not as independent audits. ²⁹
Synthetic Users: 85 to 92% “Synthetic Organic Parity,” a proprietary composite score weighting thematic overlap, depth, comprehensiveness, and qualitative alignment; the single published case study compared eight organic interviews against synthetic ones. ³⁵
Simile: an 85% figure described as reproducing survey responses at 85% of the accuracy of the humans modeled, attributed to “independent testing” by a trade publication and consistent with the Stanford research lineage, but not confirmed as an independent audit. ¹⁸ Separately, its CEO has claimed the model can forecast eight of ten analyst questions on an earnings call. ¹⁷¹⁸

A few claims sit one notch higher because the number comes from a named customer rather than the vendor. EY's CMO of the Americas described one Evidenza brand-survey comparison as “95% correlation,” a quote that appears in trade press and EY-adjacent materials; it is a client characterization, not a published study, and no sample size is reported. ³⁰³¹ EY's own 90% median-correlation figure for the Aaru wealth-research replication is reported on EY's own website, which gives it more weight than a pure vendor claim, though it remains a single internal exercise. ²³ BCG reports that synthetic panels predicted real consumer beverage choices with 92% accuracy in a conjoint study, with fine-tuning over time; this is BCG's internal client work, single-sourced, not independently replicated. ³⁹

The independent benchmarks. These are the figures the category should be held to, and they are more modest.

85% of human test-retest ceiling. Park et al. (2024) ³

r = 0.20 individual-level correlation. Peng et al. ¹¹

93.9% of outcomes under-dispersed. Peng et al., 164 outcomes ¹¹

19 pre-registered studies. Peng et al., 2,058 participants ¹¹

The Park et al. 1,000-people study remains the strongest positive result, and even it is carefully bounded: 85% of the human test-retest ceiling on the General Social Survey, not 85% raw accuracy. ³ That is the high-water mark from interview-grounded agents on a large representative sample.

The most comprehensive critical study is Peng et al., “Digital Twins as Funhouse Mirrors,” a multi-institution effort with 19 pre-registered studies, 2,058 US participants, and 164 outcomes, with code and data publicly released. ¹¹ Its findings are blunt. Digital-twin predictions were “only modestly more accurate than those of a homogeneous base LLM” and showed weak individual-level correlation with human responses, an average r = 0.20. ¹¹ Twin responses were under-dispersed, with lower standard deviation than human responses, in 154 of 164 cases, or 93.9%, and the under-dispersion was statistically significant in 146 of those. ¹¹ Accuracy was systematically higher for more educated, higher-income, and ideologically moderate participants, mirroring the demographics overrepresented in model training data. ¹¹ The authors caution against premature deployment. ¹¹
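Under-dispersion as described above is a comparison of spreads: for each outcome, is the standard deviation of the synthetic responses lower than that of the human responses? The sketch below is one minimal way to run that check, not the authors' released code. The function names and the toy ratings are hypothetical, and it omits the significance testing the study reports.

```python
from statistics import stdev


def dispersion_ratio(synthetic, human):
    """Ratio of synthetic to human sample standard deviation; below 1 means under-dispersed."""
    human_sd = stdev(human)
    if human_sd == 0:
        raise ValueError("human responses have no variance to compare against")
    return stdev(synthetic) / human_sd


def share_under_dispersed(outcomes):
    """Fraction of (synthetic, human) outcome pairs where the synthetic spread is narrower."""
    flags = [dispersion_ratio(s, h) < 1.0 for s, h in outcomes]
    return sum(flags) / len(flags)


# Hypothetical 1-to-5 ratings for a single outcome; not data from any study.
human_ratings = [1, 2, 3, 4, 5, 1, 5, 3]
twin_ratings = [3, 3, 4, 3, 2, 3, 4, 3]

ratio = dispersion_ratio(twin_ratings, human_ratings)
print(f"dispersion ratio: {ratio:.2f}")

# The count reported in the text, 154 of 164 outcomes, as a percentage.
reported_share = 154 / 164
print(f"reported share under-dispersed: {reported_share:.1%}")
```

In the toy data the twin ratings cluster near the middle of the scale while the human ratings use the whole range, so the ratio falls well below 1. A buyer could ask a vendor for this ratio per question alongside any headline accuracy figure.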

Set the Artificial Societies figures against this and the gap is the whole point. The vendor reports 86% distribution accuracy and a 91% human ceiling; the strongest independent result reaches 85% of a test-retest ceiling on aggregate distributions, while the most comprehensive independent study reports r = 0.20 at the individual level and under-dispersion in 93.9% of outcomes. Self-reported aggregate accuracy and independently measured individual-level accuracy are not the same quantity, and a vendor number relayed through a trade article without a disclosed method is not comparable to a pre-registered study with public code and data.
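A small example shows why aggregate and individual-level accuracy are different quantities. The sketch below uses hypothetical ratings, constructed so that the synthetic sample reproduces the human mean exactly while tracking individual people poorly. The data and helper name are illustrative assumptions, not results from any cited study.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 1-to-5 answers: element i of each list is the same person and their twin.
human_answers = [1, 5, 2, 4, 3, 5, 1, 3]
twin_answers = [3, 4, 3, 3, 2, 3, 3, 3]

aggregate_gap = abs(mean(human_answers) - mean(twin_answers))
individual_r = pearson_r(human_answers, twin_answers)

print(f"gap between means: {aggregate_gap:.2f}")
print(f"individual-level r: {individual_r:.2f}")
```

Both lists average to the same value, so an aggregate metric would report a perfect match, while the person-by-person correlation stays weak. This is why a headline accuracy number needs to state which of the two it measures.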

Vendor self-reported. Artificial Societies: ~86% distribution accuracy against 1,000 UC Berkeley surveys, within five points of a stated 91% human-replication ceiling; ~95% accuracy against human self-replication. ²⁷
Independent benchmark. Peng et al.: average r = 0.20 individual-level correlation across 19 pre-registered studies, 2,058 US participants, 164 outcomes. Under-dispersion in 93.9% of outcomes. ¹¹

Vendor self-reported. Park et al. (2024): interview-grounded agents replicated General Social Survey responses at 85% of the human test-retest ceiling. Big Five personality r = 0.80. ³
Independent benchmark. Park et al. is the strongest independent positive result and applies to aggregate distributions on a large representative sample. 85% is 85% of the ceiling, not 85% raw agreement. Individual-level correlation and under-dispersion remain separate concerns. ³

Vendor self-reported. Evidenza: “88% accuracy in 100+ validations” defined as average match between synthetic and traditional research; metric and sample size not disclosed. ²⁹
Independent benchmark. Dentsu partnership reported 0.87 correlation with traditional research in early results; EY CMO of the Americas cited 95% correlation in a single brand-survey comparison. Both are client characterizations, not published studies. ³³³⁰

Vendor self-reported. Synthetic Users: 85 to 92% “Synthetic Organic Parity,” a proprietary composite score; single published case study of eight organic interviews. ³⁵
Independent benchmark. Santurkar et al.: substantial misalignment between LLM opinion distributions and 60 US demographic groups, on par with the Democrat-Republican divide on climate change; RLHF tuning made misalignment worse. ⁹

Vendor self-reported. BCG: synthetic panels predicted real consumer beverage choices with 92% accuracy in a conjoint study. ³⁹
Independent benchmark. BCG's own caution: training data “may be outdated or inaccurate” and synthetic respondents can infer researchers' hypotheses and produce confirming data. Single-sourced internal client work, not independently replicated. ³⁹

The supporting critical literature is consistent. Santurkar et al. found substantial misalignment between language-model opinion distributions and 60 US demographic groups, on par with the Democrat-Republican divide on climate change, and found that this misalignment persisted even when models were explicitly steered toward groups; RLHF tuning made it worse, leaving older, widowed, and Mormon respondents poorly represented. ⁹ Tjuatja et al., across nine models, found that commercial LLMs generally fail to reproduce human response biases from survey design and are instead sensitive to perturbations that humans ignore. ¹⁰ Aher et al. documented a “hyper-accuracy distortion,” in which models give near-perfect answers to obscure factual questions where humans produce a spread, eliminating the variance some experiments depend on. ⁵ A February 2026 preprint, “Do LLMs Track Public Opinion?,” cited by Morris, reported “systematic directional miscalibration,” with every model tested overpredicting Kamala Harris's favorability against high-quality 2024 polls. ⁴⁴

Read together, the picture is specific. Calibrated, interview-grounded systems can approach the human ceiling on aggregate distributions for well-trodden topics. The same systems remain weak at individual-level prediction, compress variance almost everywhere, and degrade for underrepresented groups and novel events. A vendor accuracy headline that does not name its metric, its sample size, and its benchmark is not comparable to any of these results.

Where do synthetic audiences break?

The failure modes are now documented well enough to enumerate, and they recur across independent studies.

Distributional flattening, or under-dispersion. Synthetic respondents cluster around an answer more tightly than humans do. Bisbee et al. found less variation than real surveys; Peng et al. found under-dispersion in 93.9% of outcomes. ⁶¹¹

Generating 50,000 synthetic respondents instead of 500 “does not create new information about what the public thinks. It just produces more draws from the same underlying model.”
G. Elliott Morris, FiftyPlusOne ⁴³
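Morris's point can be illustrated with a small simulation. The sketch below assumes a model whose answers are off from the true population value by a fixed amount, then draws 500 and 50,000 synthetic respondents from it. The two proportions and the function name are hypothetical, chosen only to make the effect visible.

```python
import math
import random


def draw_share(p_model: float, n: int, rng: random.Random):
    """Draw n yes/no answers from the model and return (share saying yes, standard error)."""
    hits = sum(1 for _ in range(n) if rng.random() < p_model)
    share = hits / n
    standard_error = math.sqrt(share * (1 - share) / n)
    return share, standard_error


# Hypothetical values: the public is split evenly, but the model leans toward "yes".
TRUE_SHARE = 0.50
MODEL_SHARE = 0.58

rng = random.Random(2026)  # fixed seed so the run is repeatable
small_share, small_se = draw_share(MODEL_SHARE, 500, rng)
large_share, large_se = draw_share(MODEL_SHARE, 50_000, rng)

for label, share, se in [("500", small_share, small_se), ("50,000", large_share, large_se)]:
    print(f"n={label}: estimate {share:.3f}, standard error {se:.4f}, "
          f"gap to truth {abs(share - TRUE_SHARE):.3f}")
```

The larger sample shrinks the standard error, so the estimate looks more precise, but it converges on the model's value rather than the public's. The gap to the truth stays roughly where it was, which is the sense in which extra draws add no new information.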

Prompt sensitivity and temporal instability. The same prompt, reworded slightly, shifts the distribution; the same prompt, rerun months later, shifts again as the model changes underneath. Bisbee et al. documented both, calling the latter a threat to the reproducibility norms of social science. ⁶

Demographic and ideological bias. Accuracy is uneven across groups. Park et al. reduced this bias with interview grounding but did not eliminate it; Peng et al. and Santurkar et al. both found systematically worse performance for groups underrepresented in training data. ²¹¹⁹

Novel events and knowledge cutoff. Models reason poorly about what postdates their training. Gallup has named “how quickly agent predictions deteriorate over time as the world changes” as a core research question, and BCG warns that training data “may be outdated or inaccurate.” ⁴⁰³⁹

Hypothesis confirmation. BCG cites a study finding that synthetic respondents “can sometimes infer the researchers' hypothesis and produce data that artificially confirms it.” ³⁹ AAPOR's review notes a related risk: a synthetic respondent can be instructed to maliciously alter polling outcomes while passing standard attention checks. ¹

There is also a structural objection that no amount of fine-tuning resolves. As the User Interviews report puts it, language models are “predictors of plausible text, not embodied beings with sensory experience, memory-as-personal-history, or lived constraints.” ⁴⁵ That is an argument about what is being measured, not how accurately.

What comes next for synthetic audiences?

The standards bodies and the most credible institutions have converged on a posture, and it is narrower than the funding headlines suggest: augment, do not replace.

AAPOR's May 2026 task force report, authored by representatives from Microsoft Research, Gallup, Pew, NORC, Meta, Duke, and the Census Bureau, states that AI usage is “more directed at augmentation than full automating” and that synthetic respondents “pose particularly serious validity and disclosure risks if applied beyond clearly labeled pretesting, pilot work, or exploratory diagnostics.” ¹ AAPOR's required AI disclosures have been submitted as proposed revisions to its code, pending a membership vote. ¹

Gallup, with a 90-year history, drew the line explicitly: simulated responses “will not be used to produce Gallup's published population estimates,” the two are “not interchangeable,” and “we will never present simulated results using language that implies direct human measurement.” ⁴⁰ BCG offers a practical tiering: synthetic panels as a primary tool for low-risk, high-iteration decisions, as support for medium-risk decisions, and subordinate to human testing for high-risk decisions like regulated claims or forecasting. ³⁹ Among 150 researchers surveyed by User Interviews, 47% described themselves as skeptical and wanting more evidence, and 62.7% said their organizations have no guidance at all on synthetic-user use. ⁴⁵

The standards picture is filling in. ESOMAR published buyer guidance for augmented synthetic data in September 2025, and the 2025 ICC/ESOMAR Code introduced an official definition of synthetic data with requirements on transparency, disclosure, and validation. ⁴⁶ No ISO-level international standard for synthetic audiences exists yet; the frameworks are industry self-regulatory.

If there is a single thesis emerging about which vendors will matter, it is about defensibility rather than plausibility. As the investor Insight Innovation Ventures put it, “the question ‘are synthetic respondents valid?’ is too crude. The better question is: valid for what decision, grounded in what data, tested against what benchmark, refreshed how often, and governed by what boundary conditions?” ⁴⁷ Their summary line is the one to remember:

“The first pitch for synthetic sample is speed. The durable pitch will be trust.”
Insight Innovation Ventures ⁴⁷

How Replism approaches this

Those five questions are the rubric Replism is built to answer.

Valid for what decision. Decisions that have to survive scrutiny, not just be made quickly: pressure-testing a product, a message, or a strategy before budget is committed or anything goes public.

Grounded in what data. Personas built from real response data, not prompted into existence.

Tested against what benchmark. Human self-replication, with public eval reports. The stated internal results are 94.5% accuracy against human self-replication, with a persona pool above 1 million. These figures are vendor self-reported and not independently audited, and should be read by the same standard this report applies to every competitor.

Refreshed how often. A quarterly evaluation cadence rather than a one-off annual PDF.

Governed by what boundary conditions. Cohorts defined per study rather than drawn from a stock library; methodology shipped with every study, including the cohort definitions, sample sizes, and distributions; and coverage across verticals including business, political, and government.

The posture is simple to state and hard to fake: built to be defended, not believed.

Frequently asked questions

Is a synthetic audience the same as synthetic data?

No. Synthetic data is artificially generated information designed to mimic a real dataset, often for privacy or model training. A synthetic audience uses real human data as its foundation and makes that audience queryable. The audience is synthetic; the underlying data is real. ¹²

How accurate are synthetic audiences in 2026?

It depends on the benchmark. The strongest independent positive result, Park et al., found interview-grounded agents reached 85% of the human test-retest ceiling on the General Social Survey. The most comprehensive critical study, Peng et al., found weak individual-level correlation, an average r = 0.20, and under-dispersion in 93.9% of outcomes. Vendor figures of 85 to 95% are self-reported and rarely disclose their metric or sample size. ³¹¹

Can synthetic audiences replace traditional surveys and polling?

The standards bodies say no, not as a replacement. AAPOR frames AI as augmentation, Gallup will not use simulated responses for published population estimates, and BCG positions synthetic panels as subordinate to human testing for high-risk decisions. The consensus is augment, not replace. ¹⁴⁰³⁹

Why did an AI polling firm get the 2024 election wrong?

Aaru predicted a Harris Electoral College win and, like most human pollsters, got the top-line result wrong, after having predicted the 2024 New York Democratic primary within 371 votes. The miss is consistent with documented limitations: synthetic respondents struggle with novel events and compress the variance that real uncertainty produces. ²⁰²¹⁴³

What should a buyer ask a synthetic audience vendor?

Ask what human data grounds the system, what has been validated against which benchmark, what sample size the accuracy figure rests on, how often the model is refreshed, and what the system should not be used for. A headline accuracy percentage without a named metric and sample size is not comparable across vendors. ⁴⁷⁴⁶

Source notes

[1] AAPOR Task Force (Rothschild, Marlar, et al.) (2026). Responsible AI Integration in Survey Research . American Association for Public Opinion Research, May 2026. Cited for the augment-not-replace framing [8a], the high-risk characterization of synthetic respondents and the malicious-respondent and disclosure points [1f].

[2] Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P., Bernstein, M. S. (2023). Generative Agents: Interactive Simulacra of Human Behavior . ACM UIST ’23. DOI: 10.1145/3586183.3606763. Cited for the definition of generative agents [2a] and, alongside the 2024 study, for bias reduction from interview grounding [2c].

[3] Park, J. S., Zou, C. Q., Shaw, A., Hill, B. M., Cai, C., Morris, M. R., Willer, R., Liang, P., Bernstein, M. S. (2024). Generative Agent Simulations of 1,000 People . arXiv:2411.10109. Cited for the 1,052-person interview-grounded method [3a] and the headline 85% normalized accuracy figure [3b].

[4] Horton, J. J., Filippas, A., Manning, B. S. (2023). Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? . NBER Working Paper No. 31122. Cited for the Homo silicus concept [4a].

[5] Aher, G. V., Arriaga, R. I., Kalai, A. T. (2023). Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies . ICML 2023, PMLR 202. arXiv:2208.10264. Cited for the Turing Experiment framing [5a] and the hyper-accuracy distortion [5a].

[6] Bisbee, J., Clinton, J. D., Dorff, C., Kenkel, B., Larson, J. M. (2024). Synthetic Replacements for Human Survey Data? The Perils of Large Language Models . Political Analysis, 32(4), 401-416. DOI: 10.1017/pan.2024.5. Cited for the means-match-but-inference-fails finding, prompt sensitivity, and temporal instability [6b]. Note: G. Elliott Morris attributes this paper to Clinton and Larson [6f]; the full author list is reconciled here.

[7] Park, J. S., et al., as covered by Stanford HAI. Katharine Miller (2025). AI Agents Simulate 1,052 Individuals’ Personalities with Impressive Accuracy . Stanford Human-Centered AI Institute. Corroborates the 1,000-people study and Joon Sung Park quotes.

[8] Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., Wingate, D. (2023). Out of One, Many: Using Language Models to Simulate Human Samples . Political Analysis, 31(3), 337-351. DOI: 10.1017/pan.2023.2. arXiv:2209.06899. Cited for silicon sampling and algorithmic fidelity [1a] and the indistinguishability test result [1b].

[9] Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., Hashimoto, T. (2023). Whose Opinions Do Language Models Reflect? . ICML 2023 (Oral). arXiv:2303.17548. Cited for demographic-group misalignment, persistence under steering, and RLHF amplification [6s].

[10] Tjuatja, L., Chen, V., Wu, S. T., Talwalkar, A., Neubig, G. (2023/2024). Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design . arXiv:2311.04076; EMNLP 2024. Cited for the finding that LLMs fail to reproduce human survey-design response biases [7s].

[11] Peng, T., et al. (2025/2026). Digital Twins as Funhouse Mirrors: Five Key Distortions. arXiv:2509.19088. Cited for the 19-study, 2,058-participant design [5m], r = 0.20 individual-level correlation [5b], 93.9% under-dispersion [5c], demographic accuracy skew [5d], and the caution against premature deployment [5e].

[12] GWI (2026). Synthetic Personas: The Complete Guide. GWI blog. Cited for the audience/persona/focus-group taxonomy and the synthetic-data-versus-synthetic-audience distinction [10a].

[13] HBR. Korst, J., Puntoni, S., Toubia, O. (2025). How Gen AI Is Transforming Market Research. Harvard Business Review, May-June 2025. Cited for the mainstream-business-press arrival of the topic [8o]. Note: soft-paywalled; only the abstract and opening were available.

[14] Sawers, P. (2024). Fairgen ‘boosts’ survey results using synthetic data and AI-generated responses. TechCrunch. Cited for Fairgen’s founding, pivot, and $8M raised, and for the Qualtrics $500M generative-AI commitment [9o].

[15] Index Ventures. Shah, S. (2026). Life, the Universe, and Simile: Leading Simile’s Series A. Cited for the $100M Series A and Simile’s founding team [12l].

[16] Prabhu, A. (2026). $100M for Stanford spinout Simile: AI that simulates human decisions. Tech Funding News. Cited for the investor list, founders, and seven-month build [14l].

[17] Tremayne-Pengelly, A. (2026). Fei-Fei Li and Andrej Karpathy’s New A.I. Bet: Simulating Society. Observer. Cited for CVS, Gallup, and Telstra customers and the earnings-call prediction claim [13l].

[18] Deutscher, M. (2026). AI digital twin startup Simile raises $100M in funding. SiliconANGLE. Cited for the Bloomberg-attributed 8-of-10 earnings-call claim and the 85% “independent testing” figure relayed via trade press [15l].

[19] Temkin, M. (2025). Sources: AI synthetic research startup Aaru raised a Series A at a $1B ‘headline’ valuation. TechCrunch. Cited for Aaru’s valuation structure, round size, ARR, founders, and named partners [7l].

[20] Albergotti, R. (2024). AI startup Aaru uses chatbots instead of humans for political polls. Semafor. Cited for the 371-vote NY primary prediction and Aaru’s methodology [8l].

[21] Albergotti, R. (2024). AI polling company defends wrong predictions on the US election. Semafor. Cited for Aaru’s incorrect 2024 presidential prediction and the “within the margin of error” quote [9l].

[22] McQuater, K. (2025). Accenture invests in synthetic audience startup Aaru. Research Live. Cited for the Accenture investment and the Lumen product name [10l].

[23] Munshi, S. / EY (2025). How AI simulation accelerates growth in wealth and asset management. EY. Cited for EY’s own 90% median-correlation Aaru replication [11l].

[24] EU-Startups. Cendon Garcia, D. (2025). British AI startup Artificial Societies raises €4.5 million to simulate human behaviour at scale. Cited for Artificial Societies’ founding, founders, and funding [1l].

[25] Chesnokova, S. (2025). Startup in spotlight: Artificial Societies lets anyone run AI-powered simulations of human societies. Tech Funding News. Cited for the seed/pre-seed structure and the 33,299-chatbot study reference [2l].

[26] He, J. K., Wallis, F. P. S., Gvirtz, A., Rathje, S. (2026). Artificial intelligence chatbots mimic human collective behaviour. British Journal of Psychology, 117(2), 761-776. DOI: 10.1111/bjop.12764. Cited for the academic grounding of Artificial Societies [3l]. Separately, Gartner’s “Predicts 2025: Digital Twins of Customers” is cited as forecasting that 40% of large enterprises will have integrated customer digital twins into their insights processes by 2027 [26l]; the Gartner primary report is paywalled and reaches this report only as a secondary citation via jonathanmall.com / Gartner.eu commentary.

[27] Artificial Societies, via secondary trade article (2026). How AI Simulations Predict Stakeholder Responses for Fortune 100 Decisions. High-Tech Mag. Cited as the secondhand source for Artificial Societies’ self-reported figures: ~95% accuracy against human self-replication, 93% response consistency, 86% distribution accuracy against 1,000 UC Berkeley surveys (within five points of a stated 91% human-replication ceiling), and 61 to 67% distribution accuracy for standard or baseline LLMs in the same comparison; the vendor site was inaccessible and no primary validation document was located, and n and methodology are not disclosed [12m]. Separately, Forrester’s 2024 research report is cited as predicting that by 2027 synthetic data will replace at least 20% of the real consumer data used for predictive analysis in market research [27l]; the Forrester primary report is paywalled and reaches this report only as a secondary citation via IDSurvey ESOMAR commentary.

[28] Artificial Societies (undated). Teneo Case Study. Cited for the Teneo project and the 189,756-perspectives figure [5l].

[29] Evidenza (undated). Evidenza homepage. Cited for the “88% accuracy in 100+ validations,” “6 months to 6 hours,” and customer-logo claims [16l], and for the metric-undisclosed caveat and the “average match between synthetic and traditional research” definition [9m].

[30] Adweek staff (2024). Can AI Replace Humans for Market Research? This Firm Is Doing It. Adweek. Cited for Evidenza’s first-year revenue and the EY CMO “95% correlation” quote [17l].

[31] OfficeStrategix (2024). Synthetic data is as good as real. Cited as the secondary relay of the EY CMO “95% correlation” quote; author discloses a minor Evidenza shareholding [10m].

[32] The Drum (2025). Lab-grown marketing? It’s already here and it’s synthetic, scalable and very real. Cited for Evidenza’s founders and stealth launch [19l].

[33] Dentsu (2025). Dentsu Partners with Evidenza to Integrate Synthetic Audiences into Next Gen Media Planning. Cited for the 0.87-correlation early result [20l].

[34] LIFT Labs. Spears, A. (2025). Smart Insights, Less Friction: Synthetic Users Is Simplifying Research with AI Personas. Cited for Synthetic Users’ founders, LIFT Labs backing, and the 85-92% parity claim [21l].

[35] Synthetic Users (2024). How we measure success. Cited for the “Synthetic Organic Parity” definition and the eight-interview case study [11m].

[36] AI CMO (2026). Synthetic Users Review 2026. Cited for Synthetic Users’ founding details, SOC 2 status, and pricing [23l].

[37] Qualtrics (2024). AI to drive massive changes to market research in 2025, Qualtrics report says. Cited for the 71% / 89% adoption figures, n = 3,000+ across 14 countries [24l].

[38] Greenbook (2026). 2026 GRIT Insights Practice Report. Cited for the “niche topic to top-three industry buzz” quote [25l].

[39] BCG. Martinez, J., Kropp, M., Millwater, E., Lee, A. (2026). Want Consumer Insights Faster? AI Can Help. Cited for the 92% conjoint accuracy figure, the decision-tiering framework, and the training-data, minority-view, and hypothesis-confirmation risks [5f].

[40] Gallup. Marlar, J., Ritter, Z. (2026). Gallup Begins Research on Simulated Responses. Cited for the non-substitution and disclosure policy and the temporal-decay research question [2f].

[41] Ipsos (2025). Ipsos Partners with Stanford University to Pioneer the Future of Market Research with Synthetic Data. Cited for the Ipsos-Stanford PASCL KnowledgePanel partnership [3f].

[42] Qualtrics (2026). Qualtrics Announces New Market Research Capabilities at X4 2026. Cited for the March 2026 synthetic-panel launch and the 12x-accuracy claim [4f].

[43] Morris, G. E. (2026). Why 50+1 isn’t collecting ‘synthetic polls’. FiftyPlusOne. Cited for the “more draws from the same model” argument, the Clinton & Larson attribution, and the “Do LLMs Track Public Opinion?” reference [6f].

[44] “Do LLMs Track Public Opinion?” (2026). arXiv:2602.06302. Cited via Morris’s paraphrase for “systematic directional miscalibration” and the Harris-favorability overprediction [14u]. Note: primary text not independently retrieved; quotes are Morris’s paraphrase.

[45] User Interviews (2026). The State of Synthetic Users. Cited for the 47%-skeptical and 62.7%-no-guidance figures (n = 150) and the “predictors of plausible text” framing [12u].

[46] ESOMAR (2025). 5 Topics of Discussion to Help Buyers of Augmented Synthetic Data for Market Research and Insights. Cited for ESOMAR buyer guidance and the 2025 ICC/ESOMAR Code definition of synthetic data [9f].

[47] Insight Innovation Ventures (2026). Synthetic Sample Is Not the Market. Decision-Grade Data Is. Cited for the “valid for what decision” reframing and the “first pitch is speed, durable pitch is trust” line [8f].
