What is an AI focus group?

Filed under: Fundamentals
Reading time: 8 min read
Last updated: May 31, 2026

An AI focus group is a qualitative research format in which an AI system stands in for either the moderator or the participants of a traditional focus group. The term is used commercially for two different setups: AI moderators running parallel one-to-one interviews with real humans, and fully simulated discussions among LLM-generated personas. Both produce transcripts and themes, not statistical crosstabs.

The disambiguation matters. The category has no standards body and no agreed taxonomy. Vendor marketing uses “AI focus group,” “synthetic focus group,” “virtual focus group,” and “LLM focus group” interchangeably, even when the underlying methods are operationally different. Anyone evaluating these tools should be clear on which version is on the table before signing a contract.

What is a traditional focus group, and why does the AI version exist?

The Nielsen Norman Group's working definition is the one most practitioners cite. A focus group is “a qualitative, attitudinal research method in which a facilitator conducts a meeting or workshop (typically about 1 to 2 hours long) with a group of 6 to 9 people to discuss issues and concerns about their experiences with a product or service.” The “focus” refers to the moderator's job of keeping the group on the assigned topics.

The format is widely used, and widely criticized. NN/G itself warns against using focus groups for usability evaluation, design requirements, or quantifying satisfaction. Group dynamics, dominant personalities, groupthink, memory bias, and the peak-end rule all warp the data. As one NN/G author puts it, a poorly run focus group can be a way to pay nine people for the opinions of three.

AI focus groups emerged in response to two specific costs: scheduling overhead and moderator scarcity. The KU Leuven team behind the first peer-reviewed academic system, Focus Agent, framed it cleanly. Organizing a focus group, they write, “presents two primary challenges: first, gathering so many people at the same time is not an easy task. Second, the success of a focus group relies on an experienced moderator with domain-specific expertise.” LLMs offered, in principle, a way to compress both.

What are the two formats sold as “AI focus groups”?

The split runs along one question: are the participants real?

AI-moderated interviews with real humans. The AI is the moderator. The respondents are recruited people. Vendors including Strella, Perspective AI, Outset (in its real-respondent mode), Listen Labs, and Voiceform run conversations either asynchronously or live, with the AI asking follow-up questions in real time based on what the participant says. The format is typically described as “one-to-many qualitative research where an AI agent moderates conversations with many real customers in parallel, asynchronously.” Each interview is a private one-to-one session, so there is no group dynamic at all, despite the “focus group” label.

Synthetic focus groups using LLM-generated personas. No real participants are involved. The system generates a panel of AI personas defined by demographic, behavioral, and attitudinal parameters, then simulates a multi-party discussion among them. SYMAR (formerly OpinioAI) markets a product called “Synthetic Focus Groups” in which personas “listen to what others say, react to opinions, and build upon or challenge ideas.” Synthetic Users positions itself similarly, though it explicitly calls itself a “discovery co-pilot, not a replacement for real research.” Yabble's “Virtual Audiences” product fits the same shape without using the focus-group label.

Some platforms operate in both modes. Outset, founded in 2022, is the most-cited example.

The Perspective AI taxonomy, useful but vendor-originated, sorts the field into three lanes: synthetic-respondent simulation, async AI-moderated with real respondents, and live AI-moderated with real respondents. It is one framing among several, not an industry consensus.

How is an AI focus group structured?

The structure differs by format.

Real-respondent AI moderation runs in three stages: moderation, probing, and synthesis. Participants join on their own schedule. The AI conducts the interview, adapting follow-up questions to each answer. After fielding closes, the system clusters transcripts, extracts themes, and produces a synthesis that approximates what a human moderator would write after a debrief. Typical scale is dozens to hundreds of conversations in a single study, compared to the eight to thirty total participants in one to four traditional sessions.

Synthetic persona discussions require the researcher to define participant parameters: names, ages, occupations, nationalities, and personalities. The LLM is then assigned roles and simulates a multi-party conversation. The KU Leuven Focus Agent paper describes exactly this construction, and notes the practical limits its team observed in longer sessions, including “repetitive opinions and the generation of irrelevant content,” hallucination, limited token memory, and loss of conversational continuity.
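The persona construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the Focus Agent implementation or any vendor's API: the `Persona` fields mirror the parameters listed in the paragraph, while `placeholder_responder`, `simulate_discussion`, and the example panel are hypothetical names and data introduced here. A real system would replace the placeholder with a call to a language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Persona:
    """One simulated participant, defined by researcher-chosen parameters."""
    name: str
    age: int
    occupation: str
    nationality: str
    personality: str

    def system_prompt(self) -> str:
        # Role instruction that would be sent to the model for this persona.
        return (
            f"You are {self.name}, age {self.age}, occupation: {self.occupation}, "
            f"nationality: {self.nationality}, personality: {self.personality}. "
            "Stay in character and react to what the other participants say."
        )


# A responder takes (persona, topic, transcript so far) and returns one turn.
Responder = Callable[[Persona, str, List[str]], str]


def placeholder_responder(persona: Persona, topic: str, transcript: List[str]) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned sentence."""
    return f"({persona.occupation}) view on '{topic}' after {len(transcript)} earlier turns"


def simulate_discussion(
    panel: List[Persona],
    topic: str,
    rounds: int,
    respond: Responder = placeholder_responder,
) -> List[str]:
    """Round-robin discussion: each persona speaks once per round."""
    transcript: List[str] = []
    for _ in range(rounds):
        for persona in panel:
            reply = respond(persona, topic, transcript)
            transcript.append(f"{persona.name}: {reply}")
    return transcript


# Hypothetical two-person panel, for illustration only.
panel = [
    Persona("Ana", 34, "nurse", "Portuguese", "skeptical, direct"),
    Persona("Tom", 52, "teacher", "Belgian", "agreeable, talkative"),
]
transcript = simulate_discussion(panel, "screen-time limits", rounds=2)
for line in transcript:
    print(line)
```

Passing the full transcript into every turn is what lets personas react to one another, and it is also where the limits quoted above tend to appear: the transcript grows with each round, so long sessions run into the model's token memory.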

What can an AI focus group do that a traditional one cannot?

Three things, in both formats.

Speed. A traditional focus group runs six to eight weeks from brief to debrief, with two to four weeks of recruitment alone. AI-moderated and synthetic studies typically return results in hours.

Cost. A traditional focus group runs $8,000 to $15,000 per session covering recruitment, facility hire, moderator fees, incentives, and analysis. Synthetic studies have been benchmarked as low as $50 to $500 per study, though scope varies widely.

Parallelism. The AI does not have to be in one room with one group at one time. Studies that would require dozens of sessions with a human moderator can run in a single fielding window.
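To put the cost figures side by side, the arithmetic can be worked through as follows. This is a rough sketch under stated assumptions: it uses the per-session and per-study ranges quoted above, takes the one-to-four-session span mentioned earlier as the size of a traditional study, and assumes cost scales linearly with session count. The function names are illustrative, and since scope varies widely the resulting ratios are an order-of-magnitude guide at best.

```python
# Figures quoted in the article (USD).
TRADITIONAL_PER_SESSION_USD = (8_000, 15_000)
SYNTHETIC_PER_STUDY_USD = (50, 500)

# Assumption: a traditional study spans one to four sessions,
# and cost scales linearly with the number of sessions.
TRADITIONAL_SESSIONS = (1, 4)


def traditional_study_range(per_session, sessions):
    """Lowest and highest total cost of a traditional study."""
    return per_session[0] * sessions[0], per_session[1] * sessions[1]


def cost_ratio_range(traditional, synthetic):
    """Smallest and largest traditional-to-synthetic cost ratio."""
    smallest = traditional[0] / synthetic[1]  # cheapest traditional vs priciest synthetic
    largest = traditional[1] / synthetic[0]   # priciest traditional vs cheapest synthetic
    return smallest, largest


traditional = traditional_study_range(TRADITIONAL_PER_SESSION_USD, TRADITIONAL_SESSIONS)
ratios = cost_ratio_range(traditional, SYNTHETIC_PER_STUDY_USD)

print(f"Traditional study: ${traditional[0]:,} to ${traditional[1]:,}")
print(f"Traditional / synthetic cost ratio: {ratios[0]:.0f}x to {ratios[1]:.0f}x")
```

The spread between the two ratios is wide because both inputs are ranges, which is one reason to compare studies of matched scope before treating the gap as a saving.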

The real-respondent format also removes one specific weakness of in-person groups: the dominant-voice problem. When every participant is interviewed independently by an AI, there is no groupthink, because there is no group. That is also why the “focus group” label is partly a metaphor in this case.

Where does the AI focus group format break down?

The synthetic variant has the harder set of limitations, and the published critique is increasingly direct.

The most rigorous study to date is Kapania et al., presented at CHI 2025. The Carnegie Mellon team interviewed 19 experienced qualitative researchers after exposing them to LLM-generated interview data. The researchers identified six fundamental limitations: responses lack palpability; the model's epistemic position is ambiguous; the practice heightens researcher positionality; it forecloses participants' consent and agency; it facilitates erasure of communities' perspectives; and it risks delegitimizing qualitative ways of knowing. Most of the 19 advised against using LLMs as the primary source of qualitative data.

The KU Leuven Focus Agent study, the first peer-reviewed academic build of a virtual focus group system, reported a mixed result. With 23 human participants across five discussion groups on digital well-being, the team found the Focus Agent “can generate opinions similar to those of human participants.” But the agent's performance as a moderator with real humans was constrained. The authors note it “has not demonstrated sufficient understanding of human conversation.”

Two structural problems sit underneath the empirical findings. The first is sycophancy. A 2026 paper in AI and Ethics argues that LLMs learn to give agreeable answers through RLHF because “belief-affirming responses consistently receive the highest human preference scores,” and that this is “fundamentally a human bias problem, not just a technical one.” For a research method whose value comes from surfacing disconfirming evidence, a respondent biased toward agreement is exactly the wrong instrument. The same paper notes that prompting “may only mask AIs' sycophantic tendencies, rather than truly overcome them.”

The second is novelty. LLMs are trained on historical data. They can interpolate within the space of what already exists, but they cannot react to something genuinely new. Faced with a concept that has no precedent in training data, a synthetic persona will extrapolate from the nearest adjacent category. The output will be plausible. It will not be authentic in the way a confused, excited, or revolted human participant would be.

Where does an AI focus group sit next to a quantitative synthetic audience?

This is the distinction most often missed. An AI focus group, in either flavor, is a qualitative instrument. The output is transcripts, themes, and quotes. The unit of analysis is the conversation, not the cell of a crosstab.

A quantitative synthetic audience is a different instrument entirely. Its output is response distributions across cohort segments at population scale, with sample sizes, confidence intervals, and crosstabs. It answers questions like “what share of likely voters in three swing states prefer message A over message B, and by how much.” An AI focus group, real-respondent or synthetic, cannot answer that question, because it was never built to. The two formats are complements, not substitutes.
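The difference in output can be made concrete with the kind of calculation a quantitative instrument reports and a transcript cannot: a share with a confidence interval. The sketch below uses the standard normal-approximation interval for a proportion. The counts are hypothetical, the function name is illustrative, and this is not a description of how any particular platform computes its estimates.

```python
import math


def share_with_ci(successes: int, n: int, z: float = 1.96):
    """Observed share plus a normal-approximation confidence interval.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because the approximation can overshoot near the edges.
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical result: 540 of 1,000 simulated respondents prefer message A.
share, low, high = share_with_ci(540, 1000)
print(f"Message A preferred by {share:.1%} (95% CI {low:.1%} to {high:.1%})")
```

A focus group of nine people, real or simulated, gives no meaningful interval of this kind, which is the practical reason the two instruments answer different questions.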

For a deeper look at how Replism approaches quantitative synthetic research, the platform page covers the methodology in full.

When should you use which?

Four honest defaults.

A real human focus group. When the value is in the room: co-creation with sponsored customers, early discovery of mental models, surfacing the unexpected comment that nobody on the team would have thought of. The classic NN/G advice still applies.

An AI-moderated focus group with real participants. When the research question is qualitative and the depth of an interview matters, but the scheduling and recruitment cost of traditional fielding is prohibitive. Best when scaled to dozens or hundreds of interviews you could not otherwise afford to run.

A synthetic focus group with LLM personas. Best treated as a discovery co-pilot, in the language Synthetic Users uses for its own product. Useful for stress-testing study design before fielding with real respondents, sanity-checking a hypothesis, or accessing audiences that are otherwise hard to reach. Not appropriate as the primary evidence in a regulatory submission, a legal proceeding, or a launch decision in a genuinely novel category.

A quantitative synthetic audience. When the question is statistical: how does message A test against message B across cohorts, what is the likely-voter distribution by geography and party lean, how does a price point land in a given segment. Synthetic audiences produce the crosstabs and confidence intervals that survive a board meeting or a war room. They are not a replacement for qualitative depth, and they are not the same instrument as an AI focus group.
