ANOVA Explained: One-Way Analysis of Variance Guide
Analysis of Variance, or ANOVA, is a statistical method used to compare the means of three or more groups to determine whether at least one group mean is significantly different from the others. It is one of the most widely used techniques in experimental research, from clinical trials to agricultural studies to A/B/C testing in marketing.
This guide explains how ANOVA works, covers both one-way and two-way variants, walks through a complete worked example with three groups, and discusses assumptions, post-hoc tests, and common pitfalls.
Why Not Just Use Multiple t-Tests?
If you want to compare three groups (A, B, and C), you might consider running separate t-tests: A vs B, A vs C, and B vs C. The problem is that each test carries a risk of a false positive (typically 5%). With three comparisons, the overall risk of at least one false positive rises to approximately:

$$1 - (1 - 0.05)^3 \approx 0.143 \text{, i.e. about } 14\%$$
With 10 groups, you would need 45 pairwise comparisons, and the false positive rate would balloon to over 90%. ANOVA solves this by testing all groups simultaneously in a single test, keeping the Type I error rate at the chosen significance level (usually 5%).
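This inflation of the familywise error rate is easy to check yourself. The sketch below is a few lines of plain Python (the function name is my own, not from any library):

```python
def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across m independent
    comparisons, each run at significance level alpha."""
    return 1 - (1 - alpha) ** m

# 3 pairwise tests among 3 groups
print(familywise_error(3))   # ~0.143, about 14%

# 45 pairwise tests among 10 groups
print(familywise_error(45))  # ~0.90, over 90%
```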
The Core Idea: Between vs Within Group Variance
ANOVA works by decomposing the total variability in the data into two components:
- Between-group variance (SSB). This measures how much the group means differ from the overall (grand) mean. If the groups are truly different, this will be large.
- Within-group variance (SSW). This measures the natural variability of individual observations within each group. This represents noise or random variation.
ANOVA compares these two sources of variation. If the between-group variance is substantially larger than the within-group variance, we have evidence that the group means are not all equal.
Key Terms and Formulas
Sum of Squares Total (SST)
The total variation of every data point from the grand mean $\bar{x}$:

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$$
Sum of Squares Between (SSB)
The variation due to differences between group means:

$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$
Sum of Squares Within (SSW)
The variation within each group:

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

Note that $SST = SSB + SSW$.
Mean Squares
Mean squares are the sums of squares divided by their degrees of freedom:

$$MSB = \frac{SSB}{k - 1}, \qquad MSW = \frac{SSW}{N - k}$$

where $k$ is the number of groups and $N$ is the total number of observations across all groups.
The F-Statistic
The F-statistic is the ratio of between-group variance to within-group variance:

$$F = \frac{MSB}{MSW}$$

A large F value means the group means differ more than would be expected from random variation alone. The F-statistic follows an F-distribution with $(k - 1, N - k)$ degrees of freedom.
Worked Example: Comparing Three Teaching Methods
A school wants to test whether three teaching methods produce different exam scores. They randomly assign 15 students to three groups of 5:
| Method A | Method B | Method C |
|---|---|---|
| 72 | 85 | 90 |
| 68 | 82 | 88 |
| 75 | 79 | 95 |
| 70 | 88 | 92 |
| 65 | 81 | 85 |
Step 1: Calculate the Group Means and Grand Mean

$$\bar{x}_A = 70, \qquad \bar{x}_B = 83, \qquad \bar{x}_C = 90$$

$$\bar{x} = \frac{70 + 83 + 90}{3} = 81$$
Step 2: Calculate SSB (Between-Group Sum of Squares)

$$SSB = 5\left[(70 - 81)^2 + (83 - 81)^2 + (90 - 81)^2\right] = 5(121 + 4 + 81) = 1030$$
Step 3: Calculate SSW (Within-Group Sum of Squares)

For Method A (mean = 70): $(72-70)^2 + (68-70)^2 + (75-70)^2 + (70-70)^2 + (65-70)^2 = 4 + 4 + 25 + 0 + 25 = 58$

For Method B (mean = 83): $(85-83)^2 + (82-83)^2 + (79-83)^2 + (88-83)^2 + (81-83)^2 = 4 + 1 + 16 + 25 + 4 = 50$

For Method C (mean = 90): $(90-90)^2 + (88-90)^2 + (95-90)^2 + (92-90)^2 + (85-90)^2 = 0 + 4 + 25 + 4 + 25 = 58$

$$SSW = 58 + 50 + 58 = 166$$
Step 4: Calculate Mean Squares

Degrees of freedom: $df_1 = k - 1 = 2$ and $df_2 = N - k = 12$.

$$MSB = \frac{1030}{2} = 515, \qquad MSW = \frac{166}{12} \approx 13.83$$
Step 5: Calculate the F-Statistic

$$F = \frac{MSB}{MSW} = \frac{515}{13.83} \approx 37.24$$
Step 6: Determine the Result
The critical value of F for $df_1 = 2$ and $df_2 = 12$ at the 5% significance level is approximately 3.89. Our F-statistic of 37.24 far exceeds this critical value.
Conclusion: We reject the null hypothesis. There is statistically significant evidence that at least one teaching method produces different exam scores. The p-value for $F(2, 12) = 37.24$ is less than 0.0001.
The ANOVA Table
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 1030 | 2 | 515 | 37.24 |
| Within | 166 | 12 | 13.83 | |
| Total | 1196 | 14 | | |
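The whole decomposition above can be reproduced in plain Python with no statistics library (the function name is my own). The tiny difference from the table's 37.24 comes from the table rounding MSW to 13.83 before dividing:

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group SS: weighted squared distance of each group mean
    # from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))

    # Within-group SS: squared distance of each observation from its
    # own group mean.
    ssw = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)

    k, n = len(groups), len(all_obs)
    f_stat = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, f_stat

scores = [
    [72, 68, 75, 70, 65],  # Method A
    [85, 82, 79, 88, 81],  # Method B
    [90, 88, 95, 92, 85],  # Method C
]
ssb, ssw, f = one_way_anova(scores)
print(ssb, ssw, round(f, 2))  # 1030.0 166.0 37.23
```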
Try it yourself
Check the significance of your F-statistic with our P-Value Calculator and verify your group standard deviations with the Standard Deviation Calculator.
One-Way vs Two-Way ANOVA
One-Way ANOVA
One-way ANOVA tests the effect of a single factor (independent variable) on a continuous outcome. The example above is a one-way ANOVA: one factor (teaching method) with three levels (A, B, C). The null hypothesis is that all group means are equal: $H_0: \mu_A = \mu_B = \mu_C$.
Two-Way ANOVA
Two-way ANOVA tests the effects of two factors simultaneously. For example, you might study the effect of both teaching method and class size on exam scores. Two-way ANOVA can also detect interaction effects, where the combination of two factors produces an effect that is different from the sum of their individual effects.
For instance, Method C might outperform Method A only in small classes but show no advantage in large classes. A one-way ANOVA would miss this nuance entirely.
Assumptions of ANOVA
ANOVA results are reliable only when certain assumptions are met:
- Independence. Observations must be independent of each other. This is typically ensured by random sampling or random assignment.
- Normality. The data within each group should be approximately normally distributed. ANOVA is fairly robust to mild violations of normality, especially with larger sample sizes (due to the Central Limit Theorem). You can check normality with a Shapiro-Wilk test or by examining Q-Q plots.
- Homogeneity of variances (homoscedasticity). The variance within each group should be roughly equal. This can be tested with Levene's test. If variances are unequal, use Welch's ANOVA instead, which does not assume equal variances.
If normality is severely violated and sample sizes are small, consider the non-parametric alternative: the Kruskal-Wallis test, which compares medians rather than means.
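The homogeneity-of-variances check can be screened without any external library: Levene's W is simply the one-way ANOVA F-statistic computed on absolute deviations from the group means. The sketch below shows that core idea only (it omits the p-value lookup; for a full test use `scipy.stats.levene`):

```python
def levene_w(groups):
    """Levene's W statistic: an ANOVA F-statistic on the absolute
    deviations of each observation from its own group mean."""
    # Transform observations into absolute deviations from group means.
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    zs = [v for g in z for v in g]
    grand = sum(zs) / len(zs)
    means = [sum(g) / len(g) for g in z]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    ssw = sum((v - m) ** 2 for g, m in zip(z, means) for v in g)
    k, n = len(z), len(zs)
    return (ssb / (k - 1)) / (ssw / (n - k))

scores = [
    [72, 68, 75, 70, 65],  # Method A
    [85, 82, 79, 88, 81],  # Method B
    [90, 88, 95, 92, 85],  # Method C
]
# Compare W against an F(k-1, N-k) critical value (3.89 here);
# a small W means the group spreads are similar.
print(levene_w(scores))  # 0.0 — the spreads are essentially identical
```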
Post-Hoc Tests: Which Groups Differ?
A significant ANOVA result tells you that at least one group differs, but it does not tell you which groups differ. Post-hoc tests make pairwise comparisons while controlling the overall Type I error rate.
Common Post-Hoc Tests
- Tukey's HSD (Honestly Significant Difference). The most popular choice. It compares all possible pairs of means and controls the family-wise error rate. Best when group sizes are equal.
- Bonferroni correction. Divides the significance level by the number of comparisons. More conservative than Tukey. Works with unequal group sizes.
- Scheffé's method. The most conservative post-hoc test. It allows testing any linear combination of means, not just pairwise comparisons. Use when exploring complex contrasts.
- Games-Howell. Used when the assumption of equal variances is violated. Does not assume homoscedasticity.
For our teaching methods example, Tukey's HSD would compare A vs B, A vs C, and B vs C. Given the large F-statistic, we would likely find that all three pairs differ significantly, with Method C producing the highest scores and Method A the lowest.
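For equal group sizes, Tukey's HSD reduces to comparing each pairwise mean difference against a single threshold, $HSD = q \sqrt{MSW / n}$. The sketch below hardcodes the studentized range value $q \approx 3.77$ for $k = 3$, $df = 12$, $\alpha = 0.05$ (taken from a standard table; for real analyses prefer `scipy.stats.tukey_hsd` or statsmodels):

```python
from itertools import combinations
from math import sqrt

def tukey_hsd_equal_n(means, msw, n, q_crit):
    """Flag pairs whose mean difference exceeds the HSD threshold.
    means: group means; msw: within-group mean square from the ANOVA;
    n: observations per group; q_crit: studentized range critical value."""
    hsd = q_crit * sqrt(msw / n)  # minimum difference declared significant
    results = []
    for i, j in combinations(range(len(means)), 2):
        diff = abs(means[i] - means[j])
        results.append((i, j, diff, diff > hsd))
    return hsd, results

# Teaching methods example: means 70, 83, 90; MSW = 166/12; n = 5 per group.
hsd, pairs = tukey_hsd_equal_n([70, 83, 90], 166 / 12, 5, q_crit=3.77)
print(round(hsd, 2))  # 6.27
for i, j, diff, significant in pairs:
    print(i, j, diff, significant)  # every pair differs by more than the HSD
```

The pairwise differences (13, 20, and 7) all exceed the HSD of about 6.27, consistent with all three methods differing significantly.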
Worked Example 2: Plant Growth Under Three Fertilisers
A botanist tests three fertilisers on plant height (cm) after 4 weeks:
| Fertiliser X | Fertiliser Y | Fertiliser Z |
|---|---|---|
| 14 | 18 | 22 |
| 16 | 20 | 19 |
| 13 | 17 | 24 |
| 15 | 21 | 21 |
Group means: $\bar{x}_X = 14.5$, $\bar{x}_Y = 19$, $\bar{x}_Z = 21.5$.
Grand mean: $\bar{x} = 220 / 12 \approx 18.33$.
Within-group SS for each group:

- Fertiliser X (mean = 14.5): $0.25 + 2.25 + 2.25 + 0.25 = 5$
- Fertiliser Y (mean = 19): $1 + 1 + 4 + 4 = 10$
- Fertiliser Z (mean = 21.5): $0.25 + 6.25 + 6.25 + 0.25 = 13$

So $SSW = 5 + 10 + 13 = 28$, and $SSB = 4\left[(14.5 - 18.33)^2 + (19 - 18.33)^2 + (21.5 - 18.33)^2\right] \approx 100.67$. With $MSB = 100.67 / 2 \approx 50.33$ and $MSW = 28 / 9 \approx 3.11$, the F-statistic is $F \approx 16.18$.
The critical value for at the 5% level is approximately 4.26. Since 16.20 > 4.26, we reject the null hypothesis. There is a significant difference in plant height among the three fertilisers.
Worked Example 3: Response Times Across Three Interfaces
A UX researcher measures user response time (seconds) on three different interface designs, with 6 users per group:
- Interface 1: 2.1, 2.4, 2.3, 2.5, 2.2, 2.0
- Interface 2: 2.8, 3.1, 2.9, 3.0, 2.7, 3.2
- Interface 3: 2.3, 2.5, 2.4, 2.6, 2.2, 2.1
Group means: $\bar{x}_1 = 2.25$, $\bar{x}_2 = 2.95$, $\bar{x}_3 = 2.35$. Grand mean: $\bar{x} \approx 2.52$.
Computing $\sum_j (x_{ij} - \bar{x}_i)^2$ for each group and summing gives $SSW = 0.525$; the between-group sum of squares is $SSB = 1.72$. With $MSB = 1.72 / 2 = 0.86$ and $MSW = 0.525 / 15 = 0.035$, we get $F \approx 24.57$.
With $df_1 = 2$, $df_2 = 15$ and a critical value of about 3.68 at the 5% level, the result is highly significant. Interface 2 is notably slower. A post-hoc test would confirm that Interface 2 differs significantly from both Interfaces 1 and 3, while 1 and 3 may not differ significantly from each other.
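In practice you would hand this to a library rather than summing squares by hand. With SciPy (assuming it is installed), the whole one-way ANOVA is a single call:

```python
from scipy import stats

interfaces = [
    [2.1, 2.4, 2.3, 2.5, 2.2, 2.0],  # Interface 1
    [2.8, 3.1, 2.9, 3.0, 2.7, 3.2],  # Interface 2
    [2.3, 2.5, 2.4, 2.6, 2.2, 2.1],  # Interface 3
]
# f_oneway performs the same SSB/SSW decomposition described above.
f_stat, p_value = stats.f_oneway(*interfaces)
print(round(f_stat, 2))  # ~24.57
print(p_value < 0.001)   # True: far below the 5% threshold
```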
Try it yourself
Use our Chi-Square Calculator for categorical comparisons or our P-Value Calculator to look up exact p-values for your F-statistic.
Frequently Asked Questions
What does a significant ANOVA result actually tell me?
It tells you that at least one group mean is statistically different from at least one other group mean. It does not tell you which specific groups differ. For that, you need post-hoc tests such as Tukey's HSD or Bonferroni correction.
Can I use ANOVA with only two groups?
Technically yes, and the result will be identical to an independent samples t-test (in fact, $F = t^2$ with two groups). However, a t-test is simpler and more commonly used when comparing exactly two groups.
What if my data is not normally distributed?
ANOVA is reasonably robust to non-normality, especially with larger sample sizes. If the violation is severe and sample sizes are small, use the Kruskal-Wallis test, which is the non-parametric equivalent of one-way ANOVA. It compares medians (technically, mean ranks) rather than means and does not assume normality.
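The Kruskal-Wallis statistic is essentially an ANOVA on ranks. The sketch below computes H for the teaching-methods data in plain Python, averaging ranks for ties but omitting the usual tie correction (for the full test, including the correction and p-value, use `scipy.stats.kruskal`):

```python
def kruskal_h(groups):
    """Kruskal-Wallis H (no tie correction): ranks all observations
    jointly (ties get the average rank), then measures how far each
    group's mean rank sits from the overall mean rank (N+1)/2."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Average rank for each distinct value (handles ties).
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    h = 0.0
    for g in groups:
        mean_rank = sum(rank[x] for x in g) / len(g)
        h += len(g) * (mean_rank - (n + 1) / 2) ** 2
    return 12 / (n * (n + 1)) * h

scores = [
    [72, 68, 75, 70, 65],  # Method A
    [85, 82, 79, 88, 81],  # Method B
    [90, 88, 95, 92, 85],  # Method C
]
# Compare H against a chi-square critical value with k-1 = 2 df (5.99 at 5%).
print(round(kruskal_h(scores), 2))  # 11.58
```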
What is the difference between ANOVA and ANCOVA?
ANCOVA (Analysis of Covariance) extends ANOVA by including one or more continuous covariates. For example, if you are comparing teaching methods but want to control for students' prior test scores, ANCOVA adjusts the group means for the covariate, giving a more precise comparison.
How do I report ANOVA results?
The standard reporting format in APA style is: $F(df_1, df_2) = \text{F-value}, p = \text{p-value}$. For our teaching methods example: $F(2, 12) = 37.24$, $p < .001$. Also report the effect size, commonly $\eta^2$ (eta squared), which is $SSB / SST$. In our case, $\eta^2 = 1030 / 1196 \approx .861$, indicating that teaching method accounts for 86.1% of the variance in exam scores, which is a very large effect.
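The effect size is a one-liner once the ANOVA table is in hand (values here are from the teaching methods table):

```python
ssb, sst = 1030, 1196     # Between SS and Total SS from the ANOVA table
eta_squared = ssb / sst   # proportion of total variance explained by the factor
print(round(eta_squared, 3))  # 0.861
```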