What is a crosstab?
A crosstab, short for cross-tabulation, is a table that shows how the answers to one survey question break down across the categories of another. It displays the frequency distribution of two variables at once, so you can see how different segments of respondents answered. In market research it is the most common way to analyze survey data, and it is the artifact most buyers actually look at when results come back.
A crosstab is also known as a contingency table. The two terms describe the same construct: a matrix of frequencies crossing two categorical variables. The difference is register, not structure. “Contingency table” predominates in academic statistics, while “crosstab” or simply “table” predominates in market-research practice.
Definition. A crosstab (cross-tabulation, or contingency table) is a table in matrix format that displays the multivariate frequency distribution of two or more variables. Each row represents a category of one variable, each column a category of another, and each cell shows the count of respondents who meet both conditions at once. The term “contingency table” was first used by Karl Pearson in 1904.
What does a crosstab look like?
A crosstab analyzes one survey question by another. The rows hold the categories of one variable, the columns hold the categories of a second variable, and wherever a row and a column cross there is a cell. Each cell contains the frequency of respondents who satisfy both the row condition and the column condition simultaneously. A simple example crosses gender (rows: male, female) by age (columns: three age bands). The cell at the intersection of “female” and “18 to 34” holds the number of respondents who are both.
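As a minimal sketch of the gender-by-age example, the cells can be built by counting respondent records. The records below and the second and third age bands are invented for illustration; only the "18 to 34" band comes from the text.

```python
from collections import Counter

# Hypothetical respondent records: (gender, age band). Illustrative data only.
respondents = [
    ("female", "18 to 34"),
    ("female", "18 to 34"),
    ("female", "35 to 54"),
    ("female", "55+"),
    ("male", "18 to 34"),
    ("male", "35 to 54"),
    ("male", "35 to 54"),
    ("male", "55+"),
]

row_labels = ["male", "female"]               # rows: gender
col_labels = ["18 to 34", "35 to 54", "55+"]  # columns: assumed age bands

# Count how many respondents satisfy each (row, column) pair at once.
cell_counts = Counter(respondents)

# Arrange the counts as a matrix: one list per row, one entry per column.
crosstab = [[cell_counts[(r, c)] for c in col_labels] for r in row_labels]

for label, row in zip(row_labels, crosstab):
    print(f"{label:<8}", row)
```

The cell for "female" and "18 to 34" is simply the number of records matching both labels.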
The vocabulary is older than the spreadsheet, and market researchers use a specific set of terms for the parts.
Banners. The column groupings are called banners, also referred to as breaks, breakdowns, headings, or table banners. Banners are the breakdown variables, typically demographics such as gender, age, and region. Most studies apply a standard banner to every question so that each result can be read the same way across segments.
Stubs. The rows are called stubs. Rows and columns together are sometimes called axes.
Base, or n. A base row shows the number of respondents behind each column. It is also written as n. The base is what tells you whether a percentage is built on solid ground or on a handful of people.
Counts. The whole numbers in the body of the table, as opposed to the percentages, are called counts, figures, or absolutes. They are the raw number of respondents in each cell.
Marginals. The totals for a single category of one variable, summed across the levels of the other variable, are the marginal values, or marginals. The grand total of the whole table sits in the bottom-right corner.
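The parts named above can be computed from a grid of counts. This is a rough sketch with made-up figures, assuming each respondent gives exactly one answer, so that the column totals double as the column bases.

```python
# Hypothetical counts: rows are answer categories (stubs),
# columns are segments from the banner. Illustrative figures only.
counts = [
    [30, 45],  # answered "Yes"
    [50, 25],  # answered "No"
]

# Row marginals: each row summed across the columns.
row_totals = [sum(row) for row in counts]

# Column marginals: each column summed down the rows.
# With one answer per respondent these are also the column bases (n).
col_totals = [sum(col) for col in zip(*counts)]

# Grand total: every respondent in the table.
grand_total = sum(row_totals)

print("row totals:", row_totals)
print("column totals / bases:", col_totals)
print("grand total:", grand_total)
```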
Row, column, and total percentages
Counts alone are hard to compare across segments of different sizes, so crosstabs are usually read in percentages. There are three kinds, and confusing them is the most common reading error.
Column percentages. Each cell is expressed as a percent of its column total. Column percentages, also called vertical percentages, are the most common display mode in market research. They answer the question “of the people in this segment, what share gave this answer?”
Row percentages. Each cell is expressed as a percent of its row total. Row percentages, also called horizontal percentages, answer the reverse question: “of the people who gave this answer, what share fall into each segment?”
Total percentages. Each cell is expressed as a percent of all respondents. Total percentages, also called percentages based on total or sample profiling, sum to 100% across the whole body of the table.
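The three kinds of percentage differ only in the denominator. A small sketch on hypothetical counts makes the difference concrete; the figures are invented for illustration.

```python
# Hypothetical counts: two answer rows by two segment columns.
counts = [
    [30, 45],
    [50, 25],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Column percentages: each cell divided by its column total.
column_pct = [
    [100 * cell / col_totals[j] for j, cell in enumerate(row)]
    for row in counts
]

# Row percentages: each cell divided by its row total.
row_pct = [
    [100 * cell / row_totals[i] for cell in row]
    for i, row in enumerate(counts)
]

# Total percentages: each cell divided by the grand total.
total_pct = [[100 * cell / grand_total for cell in row] for row in counts]

print("column %:", column_pct)
print("row %:   ", row_pct)
print("total %: ", total_pct)
```

In this example the same cell of 30 respondents reads as 37.5% of its column, 40% of its row, and 20% of the whole sample.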
When a table shows both column and row percentages together, the row percentages are the line of figures with no entry in the total column. Some syndicated research tools add a fourth figure, an index, where 100 represents the population average and a value above 100 means the segment is more likely than average to meet the criterion.
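The text does not give a formula for the index. One common construction, assumed here, divides the segment's column percentage by the same percentage among all respondents and multiplies by 100. The counts are hypothetical.

```python
# Hypothetical counts for one answer ("Yes") in one segment and in the total sample.
segment_yes, segment_base = 45, 70
total_yes, total_base = 75, 150

segment_pct = 100 * segment_yes / segment_base  # column percentage in the segment
overall_pct = 100 * total_yes / total_base      # same percentage among all respondents

# Assumed definition: segment share relative to the overall share,
# scaled so that 100 means "same as average".
index = 100 * segment_pct / overall_pct

print(round(index))
```

Under that assumption the segment indexes at about 129, meaning it is roughly 29% more likely than average to give this answer.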
How is a crosstab different from a contingency table?
It is not. The two are synonymous. A contingency table is named for the idea of contingency: it shows the frequency of each category in one variable contingent upon a specific level of the other variable. Statistics also uses the label “r x c table,” for rows by columns. Microsoft Excel calls the same construct a pivot table, a term not used in market-research circles.
The name carries some history. Karl Pearson first used “contingency table” in his 1904 paper “On the Theory of Contingency and Its Relation to Association and Normal Correlation.” His original two-by-two example crossed the presence or absence of a smallpox vaccination mark against disease outcomes in the 1890 smallpox epidemic. The construct has been a fixture of survey research ever since.
How do you test whether a crosstab result is significant?
Seeing that two segments answered differently is not the same as knowing the difference is real. The standard test on a contingency table is the chi-square test of independence. The null hypothesis is that the two variables are independent, meaning unrelated. The alternative hypothesis is that they are related. If the test rejects the null, the difference between segments is unlikely to be an artifact of sampling. The degrees of freedom are computed as df = (R - 1)(C - 1), where R is the number of rows and C the number of columns.
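A minimal sketch of the calculation, using only the standard library and hypothetical counts. The expected count in each cell is the row total times the column total divided by the grand total. The p-value shortcut shown applies only when df = 1, and no continuity correction is applied; a statistics package would normally handle the general case.

```python
import math

def chi_square_independence(observed):
    """Return (statistic, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical 2 x 2 table of counts.
observed = [[30, 45], [50, 25]]
statistic, df, expected = chi_square_independence(observed)

# For df = 1 only, the upper-tail probability has a closed form via erfc.
p_value = math.erfc(math.sqrt(statistic / 2)) if df == 1 else None

print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value}")
```

For these counts the statistic is about 10.71 on 1 degree of freedom, which rejects independence at the usual 5% level.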
In market-research crosstabs, significance is usually surfaced on the table itself. Column-comparison tests display letters, where a letter on a cell marks a significant difference from the column that letter refers to. Cell comparisons instead use color or arrows to flag a cell that stands out.
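The text does not say which test sits behind the letters. As an illustration only, the sketch below assumes a pooled two-proportion z-test at the 95% level and the convention that a letter is placed on the higher column. The function name `column_letters` and the counts are hypothetical.

```python
import math
from string import ascii_uppercase

def column_letters(hits, bases, z_crit=1.96):
    """For one answer row, list the columns each column is significantly above."""
    letters = ascii_uppercase[: len(bases)]
    flags = ["" for _ in bases]
    for i in range(len(bases)):
        for j in range(len(bases)):
            if i == j:
                continue
            p_i = hits[i] / bases[i]
            p_j = hits[j] / bases[j]
            # Pooled proportion and standard error for the two columns.
            pooled = (hits[i] + hits[j]) / (bases[i] + bases[j])
            se = math.sqrt(pooled * (1 - pooled) * (1 / bases[i] + 1 / bases[j]))
            # Flag column i with column j's letter when i is significantly higher.
            if se > 0 and (p_i - p_j) / se > z_crit:
                flags[i] += letters[j]
    return flags

# Hypothetical row: respondents giving the answer, and the base of each column.
hits = [30, 45, 38]   # columns A, B, C
bases = [80, 70, 75]

flags = column_letters(hits, bases)
print(flags)
```

Here only column B is flagged, with the letter A, because it is the only pair whose difference clears the assumed threshold.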
The chi-square test has an assumption worth respecting: it depends on the expected cell counts being large enough. When expected counts are very small, the result may not be valid. One conventional guideline, from Minitab, is that for variables with two or three levels you can trust the test if all cells have expected counts of at least 2 and no more than half the cells have expected counts below 5, or if all cells have expected counts of at least 3. For small two-by-two tables, Fisher's exact test is the recommended alternative, since it is accurate at any sample size.
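A sketch of both steps, under stated assumptions: the check encodes the guideline exactly as quoted above for variables with two or three levels, and the Fisher calculation uses one common two-sided definition (summing all tables no more probable than the observed one). The function names and the table are hypothetical.

```python
from math import comb

def expected_counts(observed):
    """Expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def expected_counts_ok(observed):
    """Guideline as quoted in the text, for variables with two or three levels."""
    flat = [e for row in expected_counts(observed) for e in row]
    below_5 = sum(1 for e in flat if e < 5)
    all_at_least_3 = min(flat) >= 3
    all_at_least_2 = min(flat) >= 2 and below_5 <= len(flat) / 2
    return all_at_least_3 or all_at_least_2

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small table: every expected count is 2.
small_table = [[3, 1], [1, 3]]
rule_ok = expected_counts_ok(small_table)
p_value = fisher_exact_two_sided(small_table)

print("expected counts acceptable:", rule_ok)
print("Fisher exact p-value:", round(p_value, 4))
```

The table fails the expected-count check, so the exact test is the safer choice here.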
How do you read a crosstab without being misled?
A crosstab rewards a careful eye and punishes a careless one. A few habits separate a defensible read from a wrong one.
Check the base before you trust a percentage. Small bases make percentages volatile. If the base for a column is ten respondents, each respondent accounts for 10% of that column. As a practitioner rule of thumb, bases below 30 deserve caution: at a base of 25, a single respondent counts as 4%. This is a separate concern from the chi-square expected-count thresholds above. The under-30 guideline is about the stability of a reported percentage; the expected-count rule is about the validity of a specific significance test. Both matter, but they are not the same rule.
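The arithmetic behind the rule of thumb is simple enough to sketch. The threshold of 30 is the practitioner guideline quoted above, not a statistical constant, and the function names are illustrative.

```python
CAUTION_BASE = 30  # rule-of-thumb threshold quoted in the text

def points_per_respondent(base):
    """Percentage points that a single respondent contributes at a given base."""
    return 100 / base

def needs_caution(base, threshold=CAUTION_BASE):
    """True when the base falls below the rule-of-thumb threshold."""
    return base < threshold

for base in (10, 25, 30, 200):
    print(base, points_per_respondent(base), needs_caution(base))
```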
Always show the base row. Good practice is to display the base so anyone inspecting the table can see where the data may be unreliable. Many practitioners suppress columns, or entire tables, when bases fall too low.
Know which percentage you are reading. Column percentages show what share of each segment gave a given answer, so they are the figures to compare across segments. Row percentages show how the respondents who gave an answer distribute across segments. Because a larger segment naturally contributes more respondents, row percentages can mislead when the column bases are unequal unless you check the base row.
Mind the rounding. Research software typically rounds to the nearest whole number, so a sub-total built by summing rounded figures can differ from the computed sub-total by a percentage point. A row that appears not to add up is usually a rounding artifact, not an error.
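A small worked example shows how the one-point gap arises. The counts and base are hypothetical.

```python
# Hypothetical column: base of 250, two answer codes combined into a sub-total.
base = 250
counts = [31, 33]

# Percentages as displayed: each rounded to a whole number.
displayed = [round(100 * c / base) for c in counts]

# Sub-total obtained by adding the displayed (rounded) figures.
sum_of_rounded = sum(displayed)

# Sub-total computed from the underlying counts, then rounded once.
computed_subtotal = round(100 * sum(counts) / base)

print(displayed, sum_of_rounded, computed_subtotal)
```

The two codes display as 12% and 13%, which add to 25%, while the sub-total computed from the counts is 25.6% and displays as 26%.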
References
[1] Wikipedia contributors. Contingency table. Wikipedia. Continuously updated.
[2] MRDCL. Guide to Market Research Tables (Crosstabs). Market Research Data Collection Ltd, 2021.
[3] Hilley, C. Chi-Square Test of Independence. Statistics LibreTexts, Kennesaw State University.
[4] Flatley, D. (2018). Contingency Tables. Statistics.com, Institute for Statistics Education.
[5] Minitab. Are the results of my chi-square test invalid? Minitab LLC support documentation.
[6] Olson, J. (2023). How to Interpret Simmons Crosstab Data. Penn State World Campus.