What is a cohort?

Last updated: May 23, 2026

A cohort is a defined group of people who share a common, time-anchored characteristic that the researcher specifies before the study begins. In survey research and the social sciences, that characteristic is usually a demographic attribute, a behavior, or an attitude. The cohort is the answer to the question “who, exactly, are we measuring?” Everything that follows in a study, from the sample frame to the margin of error, depends on getting that answer in writing.

The word travels across four fields that use it to mean related but not identical things. Naming the four uses up front is the most useful thing this page can do. [1][2][3][4]

Definition. A cohort is a group of individuals who share a defining time-anchored characteristic, identified before data collection, and analyzed as a unit. In survey research, the cohort is the population the study is designed to speak about, and its definition is the first artifact a methodologist should produce. [3][7]

Where the word comes from

“Cohort” comes from the Latin cohors, meaning an enclosure, company, or crowd, from co- plus hortus, garden. In the Roman army a cohort was one tenth of a legion, made up of six of the units called centuries: an organizational unit defined by its boundaries and its shared assignment. [2]

The modern social-science usage is conventionally traced to Norman B. Ryder's 1965 article “The Cohort as a Concept in the Study of Social Change,” published in the American Sociological Review. Ryder argued that “society persists despite the mortality of its individual members, through processes of demographic metabolism and particularly the annual infusion of birth cohorts.” [1] He proposed that researchers exploit the alignment between social change and cohort identification, studying both intracohort trends (life-cycle change within a group) and intercohort trends (change across groups). [1] Cohort analysis in the social sciences inherits its analytic frame from that paper.

The four uses of “cohort”

There is no single canonical definition of “cohort” across research disciplines. The word is used in at least four ways, with overlapping mechanics and distinct purposes.

Demographic and generational cohort. Pew Research Center calls “age cohort” a “fancy way of referring to a group of people who were born around the same time.” [8] A typical generation spans 15 to 18 years. [8] Pew's current working definitions are Millennials (first described as born after 1980, later fixed at 1981 to 1996), Gen X (1965 to 1980), Baby Boomers (1946 to 1964), and Silents (1928 to 1945). [9] Pew notes that generational labels are one way to define an age cohort, not the only way. Alternative framings group people by birth decade, by age during a key historical event (the Great Recession, COVID-19), or by exposure to a technological milestone (the iPhone launch). [8]
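As a minimal sketch, the birth-year ranges quoted above can be written as an explicit lookup. The names PEW_GENERATIONS and generation_for are illustrative and are not Pew's; birth years outside the four listed ranges return None rather than a guessed label.

```python
# Birth-year ranges as quoted in the text (inclusive on both ends).
PEW_GENERATIONS = [
    ("Silent", 1928, 1945),
    ("Baby Boomer", 1946, 1964),
    ("Gen X", 1965, 1980),
    ("Millennial", 1981, 1996),
]


def generation_for(birth_year):
    """Return the generation label for a birth year, or None if unlisted."""
    for label, first_year, last_year in PEW_GENERATIONS:
        if first_year <= birth_year <= last_year:
            return label
    return None  # outside the ranges given in the text


print(generation_for(1964))  # last Baby Boomer birth year in the table
print(generation_for(1965))  # first Gen X birth year in the table
```

Writing the boundaries as data rather than prose makes the edge cases (1964 versus 1965, for instance) unambiguous, which is the point of a cohort definition.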

Epidemiological cohort. The NIH National Library of Medicine MeSH definition, introduced in 1989: “Studies in which subsets of a defined population are identified. These groups may or may not be exposed to factors hypothesized to influence the probability of the occurrence of a particular disease or other outcome. Cohorts are defined populations which, as a whole, are followed in an attempt to determine distinguishing subgroup characteristics.” [4] The cohort is identified before the outcome occurs and is defined by exposure status, not by outcome. Epidemiological cohort studies are observational rather than randomized, so their results establish association rather than causation, and they are reported under the STROBE guidelines. [5] The Framingham Heart Study, the Nurses' Health Study, and the Nun Study are well-known examples. [5]

Social-science and survey-research cohort. A cohort is “a set of individuals entering a system at the same time,” presumed to share similarities from that shared entry. [3] Cohort analysis seeks to explain outcomes by separating three temporal dimensions: the cohort itself, age (time since entry), and period (when the outcome was measured). [3] The central analytic challenge is disentangling age effects, cohort effects, and period effects, the so-called APC identification problem, for which there is no universally agreed solution. [2][3]
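The identification problem follows from an exact identity: cohort (year of entry) equals period minus age. The sketch below assumes a purely linear model with made-up coefficients, and shows that two different sets of age, period, and cohort effects yield identical predictions for every observable cell, so the data alone cannot choose between them.

```python
from itertools import product


def predict(age, period, age_effect, period_effect, cohort_effect):
    """Linear predictor where cohort is fully determined by period and age."""
    cohort = period - age  # the identity behind the APC problem
    return age_effect * age + period_effect * period + cohort_effect * cohort


# Hypothetical coefficients (age, period, cohort), chosen only for illustration.
BASE = (0.5, 0.2, -0.1)

# Shift the three effects by +c, -c, +c. Because age - period + cohort = 0,
# the shift cancels out of every prediction.
SHIFT = 3.0
ALTERNATIVE = (BASE[0] + SHIFT, BASE[1] - SHIFT, BASE[2] + SHIFT)

AGES = range(20, 70, 10)
PERIODS = range(1990, 2030, 10)

MAX_GAP = max(
    abs(predict(a, p, *BASE) - predict(a, p, *ALTERNATIVE))
    for a, p in product(AGES, PERIODS)
)
print(MAX_GAP)  # zero up to floating-point rounding
```

Any APC analysis therefore has to add an outside constraint (for example, fixing one effect or imposing a functional form), and the choice of constraint drives the answer.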

Marketing and product-analytics cohort. In product analytics, a cohort is “a group that shares a common characteristic over time,” a “type of segment.” [10] In practice the defining characteristic is almost always an acquisition event and its date: users who first signed up in January 2026, customers who placed a first order during a campaign window. Retention or conversion is then tracked across subsequent periods. The mechanics are similar to a research cohort (defined group, followed over time), but the purpose differs. A product-analytics cohort is formed post-hoc from event logs, is not a probability sample, carries no margin of error, and measures behavior rather than attitude. A survey-research cohort is defined a priori through screener criteria, drawn from a sample frame, and supports inferential claims about a defined population.
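A minimal sketch of the product-analytics mechanics described above: users are grouped by the month of their first recorded event, then the share still active in each later month is computed. The event log, the user ids, and the function names are hypothetical, and treating a user's first event as the acquisition event is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, activity_date).
EVENTS = [
    ("u1", date(2026, 1, 5)),
    ("u1", date(2026, 2, 2)),
    ("u2", date(2026, 1, 20)),
    ("u3", date(2026, 2, 1)),
    ("u3", date(2026, 3, 3)),
]


def month_number(d):
    """Map a date to a running month count so offsets are simple differences."""
    return d.year * 12 + d.month


def retention_table(events):
    """Return {((year, month), months_since_acquisition): share_active}."""
    # Earliest event per user defines the acquisition date.
    first_seen = {}
    for user, day in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(user, day)

    # Distinct active users per (cohort month, month offset).
    active = defaultdict(set)
    for user, day in events:
        start = first_seen[user]
        cohort = (start.year, start.month)
        offset = month_number(day) - month_number(start)
        active[(cohort, offset)].add(user)

    # Cohort size is the number of users active at offset 0.
    sizes = {
        cohort: len(users)
        for (cohort, offset), users in active.items()
        if offset == 0
    }
    return {key: len(users) / sizes[key[0]] for key, users in active.items()}


RETENTION = retention_table(EVENTS)
for key in sorted(RETENTION):
    print(key, RETENTION[key])
```

Nothing in this table carries a margin of error: it describes everyone in the log, formed after the fact, which is exactly the contrast with a survey-research cohort drawn in the paragraph above.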

What makes a research cohort well defined

A research cohort is not a label slapped onto a sample after the fact. It is a written specification with three components.

Explicit decision rules. AAPOR's Transparency Initiative, in disclosure element 4, requires researchers to be “specific about the decision rules used to define the population when describing the study population, including location, age, other social or demographic characteristics... time.” [7] If the rule cannot be written down such that two analysts would produce the same membership list, the cohort is not yet defined.
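One way to test whether a rule is "written down" in this sense is to express it as code that returns a membership list. The sketch below is illustrative only: the CohortRule class, its fields, and the sample records are hypothetical, and a real screener would carry more criteria than location and age.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortRule:
    """A written inclusion rule: location, age band, and reference date."""
    state: str
    min_age: int
    max_age: int
    as_of: date

    def age_on_reference_date(self, birth_date):
        # Completed years of age on the reference date.
        had_birthday = (self.as_of.month, self.as_of.day) >= (
            birth_date.month,
            birth_date.day,
        )
        return self.as_of.year - birth_date.year - (0 if had_birthday else 1)

    def includes(self, person):
        age = self.age_on_reference_date(person["birth_date"])
        return person["state"] == self.state and self.min_age <= age <= self.max_age


def membership(rule, people):
    """Sorted ids of everyone the rule admits, independent of input order."""
    return sorted(p["id"] for p in people if rule.includes(p))


# Hypothetical rule and records, for illustration only.
RULE = CohortRule(state="TX", min_age=18, max_age=29, as_of=date(2026, 1, 1))
PEOPLE = [
    {"id": "a", "state": "TX", "birth_date": date(2000, 6, 15)},
    {"id": "b", "state": "TX", "birth_date": date(1990, 1, 1)},
    {"id": "c", "state": "CA", "birth_date": date(2001, 3, 9)},
    {"id": "d", "state": "TX", "birth_date": date(2008, 1, 2)},
    {"id": "e", "state": "TX", "birth_date": date(2008, 1, 1)},
]
print(membership(RULE, PEOPLE))
```

The reference date matters: records "d" and "e" differ by one day of birth and land on opposite sides of the age-18 boundary, which is the kind of case two analysts disagree on when the rule lives only in prose.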

A sample frame. Disclosure element 5 requires explicit statement of how the sample was drawn, “whether the sample comes from a frame selected using a probability-based methodology... or if the sample was selected using non-probability methods,” and, where a frame or panel is used, “the name of the supplier of the sample or list and nature of the list (e.g., registered voters in the state of Texas in 2018, pre-recruited panel or pool).” [7] The cohort definition specifies who; the sample frame specifies where they came from.

A sample size and a measure of precision. Disclosure element 8 requires sample sizes by frame, margin of error for probability samples, and, for non-probability samples, “a detailed description of how the underlying model was specified, its assumptions validated, and the measure(s) calculated.” [7] Without this, the cohort exists on paper but produces no defensible numbers.
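For a probability sample, the familiar margin of error for a proportion can be sketched as below. This assumes simple random sampling, a 95 percent confidence level (z = 1.96), and no design effect from weighting or clustering; real studies usually need an adjustment for the last of these. The function name and defaults are illustrative.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    n: number of completed interviews
    p: assumed proportion (0.5 gives the widest, most conservative interval)
    z: critical value (1.96 corresponds to 95 percent confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)


for size in (400, 1000):
    print(size, round(100 * margin_of_error(size), 1), "percentage points")
```

Under these assumptions a sample of 1,000 gives roughly plus or minus 3.1 points and a sample of 400 roughly plus or minus 4.9 points, so any subgroup carved out of a cohort carries a wider margin than the full sample.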

Together, elements 4, 5, and 8 are what a serious research organization means by “shipping a cohort definition.” AAPOR does not use the word “cohort” as a formal disclosure artifact; the obligation is embedded in the population, frame, and precision requirements.

A worked example: the likely-voter cohort

Gallup has used likely-voter models since 1950 to identify Americans most likely to vote in a coming election. [6] The model is the clearest publicly documented example of how a research cohort is constructed.

Gallup asks respondents seven questions: how much thought they have given to the election, whether they know the location of their polling place, whether they voted there before, how often they usually vote, whether they plan to vote in this election, how certain they are that they will vote, and whether they voted in the last comparable election. Responses are scored on a 0 to 7 scale. Respondents who are not registered, or who say they do not plan to vote, are assigned a score of 0. [6]

The top scorers, in number equal to the projected turnout, are designated likely voters. Gallup runs the model under multiple turnout scenarios, for example 35 percent versus 40 percent of adults in a midterm year, each producing a different cohort. [6] In a national sample of 3,000 adults, the resulting likely-voter cohort has a weighted sample size of about 1,200 to 1,800 voters and an unweighted size of about 1,800. [6] “Likely voter models are necessary in pre-election polling because a substantial proportion of eligible voters do not end up voting in U.S. elections.” [6]
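The cutoff logic described above can be sketched in a few lines. This is a simplified illustration, not Gallup's implementation: it assumes each of the seven questions contributes one point, ignores weighting, and breaks ties at the cutoff score by input order. All names and the sample scores are hypothetical.

```python
def likely_voter_score(point_answers, registered, plans_to_vote):
    """Score a respondent from 0 to 7.

    point_answers: seven booleans, True where the answer earns a point
    (an assumed one-point-per-question scheme).
    """
    if not registered or not plans_to_vote:
        return 0  # per the text, these respondents are assigned a score of 0
    return sum(1 for earned in point_answers if earned)


def select_likely_voters(scored, turnout_share):
    """Keep the top scorers, in number equal to the projected turnout share.

    scored: list of (respondent_id, score) pairs.
    """
    target = round(turnout_share * len(scored))
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [respondent for respondent, _ in ranked[:target]]


# Hypothetical scores for a 20-person sample.
SCORES = [7, 7, 6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0]
SCORED = [(f"r{i}", score) for i, score in enumerate(SCORES)]

COHORT_35 = select_likely_voters(SCORED, 0.35)  # lower-turnout scenario
COHORT_40 = select_likely_voters(SCORED, 0.40)  # higher-turnout scenario
print(len(COHORT_35), len(COHORT_40))
```

The two scenarios admit different respondents from the same interviews, which is the sense in which each turnout assumption produces a different cohort. Note that the 35 percent cutoff here splits two respondents with the same score; any production model needs a stated rule for that case.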

The example shows every component of a defensible cohort in one place: explicit inclusion criteria (the seven questions), a stated sample frame (national adults), a chosen subset size (top scorers equal to projected turnout), and a precision implication (the smaller the subset, the wider the margin of error).

Cohorts in synthetic audience research

The synthetic audience literature inherits the survey-research definition. Argyle et al. (2023), in Political Analysis, demonstrated that GPT-3 can be conditioned on “sociodemographic backstories from real human participants in multiple large surveys conducted in the United States” to produce what they called “silicon samples.” [11] When the conditioning is precise, the resulting synthetic populations “accurately emulate response distributions from a wide variety of human subgroups,” a property the authors named algorithmic fidelity. [11]

The conditioning works across single-dimension cohorts (women, men, Millennials, Baby Boomers) and across intersections (Black immigrants, female Republicans, White males). [11] The mechanical lesson for synthetic research is direct: the precision of the cohort definition determines the fidelity of the synthetic population that stands in for it. An imprecise cohort produces an imprecise synthetic audience, the same way an imprecise screener produces an unreliable human sample.

That is why AAPOR elements 4, 5, and 8 still apply when the respondents are synthetic. The population must be defined. The frame must be stated. The accuracy of the synthetic responses against real-world data must be evaluated. The medium changes; the discipline does not.
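The text does not prescribe a metric for that evaluation. One simple option, assumed here purely for illustration, is the total variation distance between the synthetic and human response distributions for a single question: 0 means identical distributions and 1 means no overlap. The counts below are invented, and this is not the measure used by Argyle et al.

```python
def to_shares(counts):
    """Convert raw answer counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def total_variation_distance(counts_a, counts_b):
    """Half the summed absolute difference between two answer distributions."""
    shares_a = to_shares(counts_a)
    shares_b = to_shares(counts_b)
    answers = set(shares_a) | set(shares_b)
    return 0.5 * sum(
        abs(shares_a.get(answer, 0.0) - shares_b.get(answer, 0.0))
        for answer in answers
    )


# Hypothetical answer counts for one question within one cohort.
HUMAN = {"yes": 52, "no": 40, "unsure": 8}
SYNTHETIC = {"yes": 58, "no": 36, "unsure": 6}

DISTANCE = total_variation_distance(HUMAN, SYNTHETIC)
print(round(DISTANCE, 3))
```

Whatever metric is chosen, it should be computed per cohort rather than only on the pooled sample, since a synthetic population can match the overall distribution while missing individual subgroups.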

Source notes

[1] Ryder, N. B. (1965). “The Cohort as a Concept in the Study of Social Change.” American Sociological Review, 30, 843–861. DOI: 10.2307/2090964. Reprinted in Mason & Fienberg eds., Cohort Analysis in Social Research, Springer 1985. Cited for the canonical social-science origin of the cohort concept and Ryder's framing of intracohort and intercohort analysis.

[2] Encyclopedia.com consolidation of Oxford Dictionary of Sociology (Gordon Marshall ed.), Oxford Pocket Dictionary of Current English, and Concise Oxford Dictionary of English Etymology. Entries updated 2018. Cited for etymology (Latin cohors, co- + hortus), Roman military usage (tenth of a legion, six centuries), and the standard sociology definition.

[3] UCLA California Center for Population Research, “Cohort Analysis” methods paper. Uploaded 2024. Cited for the working definition of cohort as a set entering a system at the same time, and the APC (age-period-cohort) analytic frame.

[4] NIH National Library of Medicine MeSH term “Cohort Studies” (D015331), introduced 1989. Cited for the epidemiological definition adopted across PubMed-indexed literature.

[5] Andrade, C. (2022). “Research Design: Cohort Studies.” Indian Journal of Psychological Medicine, 44(2), 189–191. PMC9120971. DOI: 10.1177/02537176211073764. Cited for the peer-reviewed plain-language description of epidemiological cohort studies, the STROBE reporting standard, and named examples (Framingham Heart Study, Nurses' Health Study, Nun Study).

[6] Gallup, “Understanding Gallup's Likely Voter Models,” October 4, 2010. Cited for the seven-question screener, the 0 to 7 scoring scheme, the turnout-based cohort construction, the weighted likely-voter sample size of approximately 1,200 to 1,800 from a 3,000-adult sample, and the rationale for using a likely-voter cohort.

[7] AAPOR Transparency Initiative, disclosure elements (revised April 2021). Cited for elements 4 (population decision rules), 5 (sample frame and method), and 8 (sample sizes and measures of precision).

[8] Parker, K., Pew Research Center, “How we plan to report on generations moving forward,” May 22, 2023. Cited for Pew's plain-English age-cohort definition, the 15 to 18 year span convention, and alternative cohort framings (birth decade, historical event, technological milestone).

[9] Pew Research Center, “Demographic Definitions,” last updated May 26, 2021. Cited for Pew's working generation definitions (Millennials, Gen X, Baby Boomers, Silents).

[10] Reforge, “Conduct cohort analysis,” March 18, 2024. Cited for the product-analytics definition of cohort as “a group that shares a common characteristic over time,” used as a type of segment.

[11] Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). “Out of One, Many: Using Language Models to Simulate Human Samples.” Political Analysis, 31(3). DOI: 10.1017/pan.2023.2. Cited for the “silicon samples” concept, the conditioning approach on sociodemographic backstories, and algorithmic fidelity across single-dimension and intersected cohorts.
