Two Way Table Statistics: Complete Guide
What Is a Two Way Table?
A two way table is a grid that shows how two categorical variables relate to each other. You count how many observations fall into each combination of categories, then arrange those counts in rows and columns.
That's it. No magic involved.
Statisticians also call these contingency tables or crosstabs. The names are interchangeable. Use whichever your audience prefers.
Why Bother With Two Way Tables?
Because they answer a simple question: "Does variable A relate to variable B?"
Example: You survey 500 people about smoking and lung disease. A two way table shows you exactly how many smokers have lung disease, how many smokers don't, how many non-smokers have lung disease, and so on. You can see patterns instantly.
Raw data dumps don't show you these relationships. Two way tables do.
The Basic Structure
Every two way table has the same anatomy:
- Rows represent one variable's categories
- Columns represent the other variable's categories
- Cell counts show how many observations hit each combination
- Row totals show the total for each row
- Column totals show the total for each column
- Grand total is the total number of observations
Marginal Distributions
The row and column totals sit in the margins of the table. They show you the distribution of each variable by itself, ignoring the other variable. These are called marginal distributions.
Conditional Distributions
Conditional distributions show you one variable's pattern, given a specific value of the other variable. You calculate these by dividing cell counts by row or column totals.
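If you want to know "what percentage of smokers have lung disease," you divide the smoker-lung disease count by total smokers. That percentage, together with the percentage of smokers without lung disease, is the conditional distribution of lung disease among smokers.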
Reading a Two Way Table: A Real Example
Here's a table showing 1,000 people's preference for coffee or tea, broken down by age group:
| Age Group | Coffee | Tea | Row Total |
|---|---|---|---|
| Under 30 | 180 | 70 | 250 |
| 30-50 | 150 | 150 | 300 |
| Over 50 | 100 | 350 | 450 |
| Column Total | 430 | 570 | 1000 |
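From this table, you can immediately see that younger people prefer coffee while older people prefer tea: 72% of the under-30 group chose coffee (180 of 250), the 30-50 group split evenly, and about 78% of the over-50 group chose tea (350 of 450). The relationship is visible without running any test.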
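As a rough sketch, the marginal and conditional distributions described earlier can be computed from these counts with pandas. The variable names (`counts`, `age_marginal`, `coffee_share_by_age`, and so on) are illustrative choices, not part of any fixed recipe.

```python
import pandas as pd

# Cell counts copied from the coffee/tea table (totals left out on purpose).
counts = pd.DataFrame(
    {"Coffee": [180, 150, 100], "Tea": [70, 150, 350]},
    index=["Under 30", "30-50", "Over 50"],
)

grand_total = counts.to_numpy().sum()

# Marginal distributions: each variable on its own, as a share of all respondents.
age_marginal = counts.sum(axis=1) / grand_total
drink_marginal = counts.sum(axis=0) / grand_total

# One conditional comparison: the share choosing coffee within each age group.
coffee_share_by_age = counts["Coffee"] / counts.sum(axis=1)

print(age_marginal)
print(drink_marginal)
print(coffee_share_by_age.round(3))
```

Comparing `coffee_share_by_age` with the overall coffee share in `drink_marginal` (43%) is what makes the age pattern stand out: the share is well above 43% for the youngest group and well below it for the oldest.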
Types of Two Way Tables
Frequency Tables
These show raw counts. The table above is a frequency table.
Relative Frequency Tables
These show proportions or percentages. You divide each cell by the grand total.
| Age Group | Coffee | Tea | Row Total |
|---|---|---|---|
| Under 30 | 18.0% | 7.0% | 25.0% |
| 30-50 | 15.0% | 15.0% | 30.0% |
| Over 50 | 10.0% | 35.0% | 45.0% |
| Column Total | 43.0% | 57.0% | 100% |
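A minimal sketch of that division, assuming the raw counts are held in a pandas DataFrame. The names `counts` and `relative` are illustrative.

```python
import pandas as pd

# Raw counts from the frequency table (no totals yet).
counts = pd.DataFrame(
    {"Coffee": [180, 150, 100], "Tea": [70, 150, 350]},
    index=["Under 30", "30-50", "Over 50"],
)

# Relative frequency: every cell divided by the grand total.
relative = counts / counts.to_numpy().sum()

# Append the margins so the layout matches the table above.
relative["Row Total"] = relative.sum(axis=1)
relative.loc["Column Total"] = relative.sum(axis=0)

# Show as percentages with one decimal place.
print((relative * 100).round(1))
```

Because every cell is divided by the same number, the bottom-right cell of a relative frequency table should always come out to 100%. That makes it a quick sanity check.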
Conditional Probability Tables
These show percentages within rows or columns. Row percentages answer "within each age group, what drink do they prefer?" Column percentages answer "within each drink preference, what age groups are represented?"
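The two kinds of percentages can be sketched side by side, again assuming the counts sit in a pandas DataFrame. `row_pct` and `col_pct` are illustrative names.

```python
import pandas as pd

counts = pd.DataFrame(
    {"Coffee": [180, 150, 100], "Tea": [70, 150, 350]},
    index=["Under 30", "30-50", "Over 50"],
)

# Row percentages: each row sums to 100 (drink preference within an age group).
row_pct = counts.div(counts.sum(axis=1), axis=0) * 100

# Column percentages: each column sums to 100 (age mix within a drink preference).
col_pct = counts.div(counts.sum(axis=0), axis=1) * 100

print(row_pct.round(1))
print(col_pct.round(1))
```

The same cell gives two different answers depending on the direction: 72% of the under-30 group prefer coffee, but only about 42% of coffee drinkers are under 30.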
How to Create a Two Way Table
Here's the practical process:
- Identify your two variables. Both must be categorical.
- List the categories for each variable. Keep them mutually exclusive.
- Collect or organize your data. Each observation goes into exactly one cell.
- Count observations per cell. Tally manually or use software.
- Calculate totals. Row totals, column totals, and grand total.
- Add marginal distributions if needed. Converting the totals to percentages gives context for the cell counts.
Doing It in Excel
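Excel's PivotTable feature handles this well. Select your data, insert a PivotTable, drag one variable to Rows, the other to Columns, and drag a field to Values. Summarize that field by Count if each row of your data is one observation, or by Sum if your data already contains counts. Done.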
Doing It in Python
```python
import pandas as pd

# Sample dataset: one row per (age group, drink) pair with its pre-aggregated count
data = {
    'Age': ['Under 30', 'Under 30', '30-50', '30-50', 'Over 50', 'Over 50'],
    'Drink': ['Coffee', 'Tea', 'Coffee', 'Tea', 'Coffee', 'Tea'],
    'Count': [180, 70, 150, 150, 100, 350],
}
df = pd.DataFrame(data)

# Build the two way table by summing the counts for each Age/Drink combination
table = pd.pivot_table(df, values='Count', index='Age', columns='Drink', aggfunc='sum', fill_value=0)
print(table)
```
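The example above starts from counts that are already aggregated. When the data instead has one row per observation, `pd.crosstab` does the tallying. The sketch below uses a small made-up set of responses; the data and the `responses` name are hypothetical.

```python
import pandas as pd

# Hypothetical raw survey data: one row per respondent (made-up values).
responses = pd.DataFrame({
    "Age": ["Under 30", "Under 30", "30-50", "Over 50",
            "Over 50", "30-50", "Under 30", "Over 50"],
    "Drink": ["Coffee", "Coffee", "Tea", "Tea",
              "Tea", "Coffee", "Tea", "Coffee"],
})

# crosstab counts the observations in each cell; margins=True appends
# the row totals, column totals, and grand total.
table = pd.crosstab(
    responses["Age"], responses["Drink"],
    margins=True, margins_name="Total",
)
print(table)
```

Note that pandas sorts string categories alphabetically, so the age groups may not appear in their natural order unless you reorder them.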
Statistical Tests for Two Way Tables
Tables show patterns. Tests tell you whether those patterns are bigger than random sampling noise would plausibly produce.
Chi-Square Test
The most common test for two way tables. It compares observed counts to counts you'd expect if the variables had no relationship.
You calculate expected counts using this formula:
Expected = (Row Total × Column Total) / Grand Total
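Then you calculate the chi-square statistic: for every cell, take (Observed - Expected)² / Expected, and add the results up. Compare that sum to a critical value for (rows - 1) × (columns - 1) degrees of freedom. If your chi-square is large enough, you reject the null hypothesis of independence.
Requirements: as a common rule of thumb, expected counts should be 5 or more in each cell. If not, consider Fisher's exact test instead.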
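Here is a minimal sketch of the whole calculation in plain Python, using the coffee/tea counts from earlier. The statistic is the standard Pearson sum of (observed - expected)² / expected over all cells. The critical value 5.991 is the tabulated chi-square cutoff for 2 degrees of freedom at the 0.05 level, and the 0.05 level itself is an assumption made for this example.

```python
# Observed counts from the coffee/tea table (columns: Coffee, Tea).
observed = [
    [180, 70],   # Under 30
    [150, 150],  # 30-50
    [100, 350],  # Over 50
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected = (Row Total x Column Total) / Grand Total, for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabulated critical value for dof = 2 at alpha = 0.05 (assumed significance level).
CRITICAL_VALUE = 5.991

print(f"chi-square = {chi_square:.2f}, dof = {dof}")
print("reject independence:", chi_square > CRITICAL_VALUE)
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same computation and also returns a p-value, so you don't have to look up critical values by hand.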
Cramér's V
Chi-square tells you whether there is evidence of a relationship. Cramér's V tells you how strong it is. It ranges from 0 (no relationship) to 1 (perfect relationship).
Interpretation: as a rough rule of thumb, V around 0.1 is weak, 0.3 is moderate, 0.5 or higher is strong. These cutoffs are conventions, not hard limits.
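For reference, the usual definition is V = sqrt(chi-square / (n × min(rows - 1, columns - 1))), where n is the grand total. The sketch below applies it to the coffee/tea counts. The function name `cramers_v` is an illustrative choice, and the version shown applies no bias correction.

```python
import math

def cramers_v(observed):
    """Cramér's V for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Pearson chi-square against the counts expected under independence.
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi_square += (o - e) ** 2 / e

    # Scale by sample size and by the smaller table dimension minus one.
    smaller_dim = min(len(observed) - 1, len(observed[0]) - 1)
    return math.sqrt(chi_square / (n * smaller_dim))

v = cramers_v([[180, 70], [150, 150], [100, 350]])
print(f"Cramér's V = {v:.2f}")
```

For the coffee/tea table this comes out at about 0.41, which falls between the moderate and strong benchmarks.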
What Two Way Tables Cannot Tell You
These tables show association, not causation. Just because two variables are related doesn't mean one causes the other.
Example: A table might show that people who eat cereal score higher on tests. That doesn't mean cereal makes you smarter. A third variable (maybe income, or family habits) could explain both.
Two way tables are descriptive tools. For causation, you need experimental design, not just cross-tabulation.
Common Mistakes to Avoid
- Ignoring sample size. Small samples produce unstable percentages. A 60-40 split in a group of 10 people tells you almost nothing: move one person and it becomes 50-50 or 70-30.
- Mixing up row vs. column percentages. Make sure you know which one answers your question.
- Forgetting to check expected counts. Chi-square breaks down with sparse data.
- Reading down columns when you should read across rows. Know what comparison you're actually making.
When to Use Two Way Tables
Two way tables work best when:
- Both variables are categorical (or discrete with few values)
- You want to show the actual distribution of responses
- You need to explain findings to non-statisticians
- You're preparing data for a chi-square test
They don't work well when variables are continuous. For age as a continuous variable, you'd bin it into groups first. For income, same thing.
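One way to do that binning is `pd.cut`, sketched here. The respondent data is made up, and the bin edges are an assumption that only works cleanly for ages recorded in whole years.

```python
import pandas as pd

# Hypothetical respondents with exact ages (made-up data).
people = pd.DataFrame({
    "age": [22, 29, 34, 47, 50, 51, 68],
    "drink": ["Coffee", "Coffee", "Tea", "Coffee", "Tea", "Tea", "Tea"],
})

# pd.cut uses right-closed bins by default: (0, 29], (29, 50], (50, 120].
# These edges assume whole-year ages; adjust them for fractional ages.
people["age_group"] = pd.cut(
    people["age"],
    bins=[0, 29, 50, 120],
    labels=["Under 30", "30-50", "Over 50"],
)

# With age now categorical, the two way table can be built as usual.
table = pd.crosstab(people["age_group"], people["drink"])
print(table)
```

Where you place the bin edges is a judgment call, and different cutoffs can change the pattern the table shows, so it is worth stating them alongside the table.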
The Bottom Line
Two way tables are a foundational tool. They organize categorical data, reveal patterns, and set you up for formal statistical testing. Learn to read them, build them, and know their limits.
That's all you need.