Understanding Percentiles, Quartiles, and Box Plots
Percentiles and quartiles are essential tools in descriptive statistics. They help you understand where a particular value sits relative to the rest of a dataset. Doctors use percentiles to track children's growth, universities use them to interpret standardised test scores, and data analysts use quartiles to detect outliers and summarise distributions.
This guide covers what percentiles and quartiles are, how to calculate them step by step, the interquartile range (IQR), box plots, and practical applications with worked examples.
What Is a Percentile?
A percentile indicates the value below which a given percentage of observations in a dataset falls. If your exam score is at the 85th percentile, that means you scored higher than 85% of all test takers.
The $p$th percentile ($P_p$) is the value such that $p\%$ of the data falls at or below it.
Percentile Rank Formula
The percentile rank of a score $x$ in a dataset tells you what percentage of values fall below it:
$$\text{PR} = \frac{L}{n} \times 100$$
Some formulas also include values equal to $x$ by using:
$$\text{PR} = \frac{L + 0.5E}{n} \times 100$$
Where $L$ is the number of values less than $x$, $E$ is the number of values equal to $x$, and $n$ is the total count.
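The two percentile-rank variants described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the `count_ties` flag are hypothetical, the sample scores are made up, and the half weighting for tied values is an assumption chosen because it agrees with Worked Example 2 (12 below, 2 equal, 20 in total gives 65).

```python
def percentile_rank(values, x, count_ties=False):
    """Percentile rank of x within values.

    count_ties=False: percentage of values strictly below x.
    count_ties=True:  also counts half of the values equal to x
                      (assumed form of the adjusted formula).
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    numerator = below + 0.5 * equal if count_ties else below
    return 100.0 * numerator / n


# Hypothetical scores, made up for illustration.
scores = [50, 60, 60, 70, 80]
basic = percentile_rank(scores, 60)                      # 1 of 5 values is below 60
adjusted = percentile_rank(scores, 60, count_ties=True)  # adds half of the two ties
```

With ties present, the adjusted variant always gives a higher rank than the basic one, which is why it matters to state which formula was used.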
Finding the Value at a Given Percentile
To find the value at the $p$th percentile in an ordered dataset of $n$ values:
- Sort the data in ascending order.
- Calculate the index: $i = \frac{p}{100} \times (n + 1)$.
- If $i$ is a whole number, the percentile is the value at position $i$.
- If $i$ is not a whole number, interpolate between the two adjacent values.
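The steps above can be sketched as a small Python function. It assumes the index is computed as (p / 100) × (n + 1) with 1-based positions, which matches the positions used in Worked Example 1 (3.25 for the 25th percentile of 12 values). The function name is hypothetical, the sample data is made up, and clamping out-of-range indices to the smallest or largest value is an extra assumption that the steps do not address.

```python
def percentile_value(data, p):
    """Value at the p-th percentile using index i = (p / 100) * (n + 1).

    Positions are 1-based. Indices outside 1..n are clamped to the
    smallest or largest value (an assumption, not part of the steps).
    """
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)
    n = len(ordered)
    i = (p / 100) * (n + 1)
    if i <= 1:
        return ordered[0]
    if i >= n:
        return ordered[-1]
    lower = int(i)           # 1-based position of the lower neighbour
    fraction = i - lower     # how far to move towards the next value
    low_val = ordered[lower - 1]
    high_val = ordered[lower]
    return low_val + fraction * (high_val - low_val)


# Hypothetical data, made up for illustration.
data = [10, 20, 30, 40, 50, 60, 70]
q1 = percentile_value(data, 25)   # index 2 is whole: the 2nd value
p40 = percentile_value(data, 40)  # index 3.2: between the 3rd and 4th values
```

When the index is a whole number the fraction is zero, so the interpolation simply returns the value at that position and both cases share one code path.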
What Are Quartiles?
Quartiles are specific percentiles that divide a dataset into four equal parts:
- Q1 (First Quartile, 25th percentile): 25% of the data falls below this value.
- Q2 (Second Quartile, 50th percentile): This is the median. 50% of the data falls below this value.
- Q3 (Third Quartile, 75th percentile): 75% of the data falls below this value.
Together, Q1, Q2, and Q3 split the sorted dataset into four groups, each containing approximately 25% of the observations.
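As a quick illustration, Python's standard library can return all three quartiles at once through `statistics.quantiles`. The data below is made up, and `method="exclusive"` is chosen on the assumption that this guide places cut points at (n + 1)-based positions; other methods can give slightly different values.

```python
import statistics

# Hypothetical data, made up for illustration.
data = [10, 20, 30, 40, 50, 60, 70]

# n=4 requests the three cut points that split the data into four groups.
# method="exclusive" places them at positions k * (len(data) + 1) / 4.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

# Q2 should agree with the median.
median = statistics.median(data)
```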
Worked Example 1: Finding Quartiles from Test Scores
Problem: A class of 12 students received the following scores on a maths test:
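The data is already sorted. With $n = 12$ values:
Finding Q1 (25th percentile):
$$i = \frac{25}{100} \times (12 + 1) = 3.25$$
Position 3.25 means we take the value at position 3 (which is 55) and interpolate 0.25 of the way to position 4 (which is 60):
$$Q_1 = 55 + 0.25 \times (60 - 55) = 56.25$$
Finding Q2 (median):
$$i = \frac{50}{100} \times (12 + 1) = 6.5$$
Interpolating between position 6 (67) and position 7 (70):
$$Q_2 = 67 + 0.5 \times (70 - 67) = 68.5$$
Finding Q3 (75th percentile):
$$i = \frac{75}{100} \times (12 + 1) = 9.75$$
Interpolating between position 9 (78) and position 10 (82):
$$Q_3 = 78 + 0.75 \times (82 - 78) = 81$$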
Summary: Q1 = 56.25, Q2 = 68.5, Q3 = 81. This tells us that the middle 50% of scores falls between 56.25 and 81.
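The arithmetic in this example can be checked with a short Python sketch. It uses only the neighbouring values quoted in the text (55 and 60, 67 and 70, 78 and 82) and assumes the positions come from (p / 100) × (n + 1); the helper name `interpolate` is hypothetical.

```python
def interpolate(lower_value, upper_value, fraction):
    """Move `fraction` of the way from lower_value to upper_value."""
    return lower_value + fraction * (upper_value - lower_value)


n = 12  # number of test scores in the example

# Positions of the three quartiles under the assumed (n + 1) indexing.
positions = {name: (p / 100) * (n + 1)
             for name, p in (("Q1", 25), ("Q2", 50), ("Q3", 75))}

# Neighbouring values and fractional parts as quoted in the example.
q1 = interpolate(55, 60, 0.25)
q2 = interpolate(67, 70, 0.5)
q3 = interpolate(78, 82, 0.75)
```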
The Interquartile Range (IQR)
The interquartile range measures the spread of the middle 50% of the data. It is calculated as:
$$\text{IQR} = Q_3 - Q_1$$
Using our test scores example:
$$\text{IQR} = 81 - 56.25 = 24.75$$
The IQR is a robust measure of spread because it is largely unaffected by extreme values (outliers). Unlike the range, which uses the minimum and maximum, the IQR focuses on the central portion of the data.
Using IQR to Detect Outliers
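A common rule for identifying outliers is the 1.5 × IQR rule. A data point is considered a potential outlier if it falls below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$:
$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}$$
$$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$
For our test scores:
$$\text{Lower fence} = 56.25 - 1.5 \times 24.75 = 19.125$$
$$\text{Upper fence} = 81 + 1.5 \times 24.75 = 118.125$$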
Since all scores fall between 19.125 and 118.125, there are no outliers in this dataset.
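The fence calculation can be sketched in Python as follows. The function names are hypothetical, the quartiles are the ones from the test scores example (Q1 = 56.25, Q3 = 81), and the three extra scores passed to `find_outliers` are made up to show what would be flagged.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (iqr, lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Values that fall strictly outside the fences."""
    _, lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles from the test scores example.
iqr, lower, upper = iqr_fences(56.25, 81)

# Hypothetical scores, not part of the original dataset.
flagged = find_outliers([15, 60, 120], 56.25, 81)
```

The multiplier `k` is left as a parameter because 1.5 is a convention rather than a fixed law; a larger value flags fewer points.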
Box Plots Explained
A box plot (also called a box-and-whisker plot) is a visual representation of a dataset based on the five-number summary:
- Minimum (excluding outliers)
- Q1 (first quartile)
- Q2 (median)
- Q3 (third quartile)
- Maximum (excluding outliers)
The "box" extends from Q1 to Q3, with a line at the median (Q2). The "whiskers" extend from the box to the minimum and maximum values within the fences. Any data points beyond the whiskers are plotted as individual dots, indicating potential outliers.
Box plots are particularly useful for comparing distributions across groups. For instance, you could place box plots for different exam classes side by side to compare their score distributions at a glance.
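The numbers a box plot draws can be computed without any plotting library. The sketch below is one possible approach, assuming quartiles from `statistics.quantiles` with `method="exclusive"` and whiskers that stop at the most extreme data points inside the 1.5 × IQR fences. The function name and the dataset are hypothetical.

```python
import statistics


def box_plot_stats(data):
    """Box, whisker ends, and outliers for a box plot (1.5 * IQR fences)."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical data with one extreme value, made up for illustration.
data = [10, 20, 30, 40, 50, 60, 200]
stats = box_plot_stats(data)
```

Here the value 200 lies above the upper fence, so it would be drawn as an individual dot and the upper whisker would stop at 60.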
Worked Example 2: Percentile Rank
Problem: In a dataset of 20 student heights (in cm), a student with height 172 cm wants to know their percentile rank. The sorted dataset is:
There are 12 values below 172, and 2 values equal to 172. Using the adjusted formula:
$$\text{PR} = \frac{12 + 0.5 \times 2}{20} \times 100 = 65$$
Result: The student is at the 65th percentile, meaning they are taller than approximately 65% of the group.
Worked Example 3: Comparing Two Datasets Using Quartiles
Problem: Two classes took the same test. Compare their performance using the five-number summary.
Class A scores (sorted):
Class B scores (sorted):
For each class, the five-number summary and IQR are:
| Measure | Class A | Class B |
|---|---|---|
| Minimum | 40 | 55 |
| Q1 | 55 | 62 |
| Q2 (Median) | 66.5 | 68.5 |
| Q3 | 75 | 75 |
| Maximum | 90 | 80 |
| IQR | 20 | 13 |
Analysis: Class B has a higher median (68.5 vs 66.5) and a smaller IQR (13 vs 20), meaning their scores are more consistent. Class A has a wider spread, with the lowest score at 40 and the highest at 90. In a box plot, Class A's box would be wider and its whiskers longer.
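The comparison can also be expressed in code. This sketch takes the five-number summaries straight from the table above (the raw scores are not used) and derives the IQR and range for each class; the dictionary layout and variable names are illustrative choices.

```python
# Five-number summaries copied from the table above.
summaries = {
    "Class A": {"min": 40, "q1": 55, "median": 66.5, "q3": 75, "max": 90},
    "Class B": {"min": 55, "q1": 62, "median": 68.5, "q3": 75, "max": 80},
}

# Two measures of spread per class.
spread = {
    name: {"iqr": s["q3"] - s["q1"], "range": s["max"] - s["min"]}
    for name, s in summaries.items()
}

# Smaller IQR means more consistent scores in the middle 50%.
more_consistent = min(spread, key=lambda name: spread[name]["iqr"])
higher_median = max(summaries, key=lambda name: summaries[name]["median"])
```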
Percentile vs Percentage
These terms are often confused but they mean different things:
- Percentage refers to a score out of a total. Scoring 80% on a test means you got 80 out of 100 marks correct.
- Percentile refers to your rank relative to others. Being at the 80th percentile means you scored higher than 80% of all test takers, regardless of what your actual score was.
It is entirely possible to score 60% on a test yet be at the 90th percentile if the test was very difficult and most students scored below 60%.
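A small made-up example shows the two quantities side by side. The marks below are hypothetical, and the percentile rank here counts only the marks strictly below the student's own.

```python
# Hypothetical marks out of 100 on a difficult test, made up for illustration.
marks = [30, 35, 40, 42, 45, 48, 50, 52, 55, 60]
total_marks = 100
my_mark = 60

# Percentage: share of the available marks that were earned.
percentage = 100 * my_mark / total_marks

# Percentile rank: share of test takers who scored below this mark.
below = sum(1 for m in marks if m < my_mark)
percentile_rank = 100 * below / len(marks)
```

In this made-up class the same student has a percentage of 60 but a percentile rank of 90, because nine of the ten marks are lower.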
Applications of Percentiles and Quartiles
Growth Charts
Paediatricians use percentile charts to track children's height and weight. A child at the 75th percentile for height is taller than 75% of children of the same age and sex. Consistent tracking over time is more important than any single reading.
Standardised Tests
Tests such as the SAT, GRE, and GMAT report scores as percentiles so that students can understand how they performed relative to all other test takers. A score at the 95th percentile is in the top 5%.
Income and Wealth Distribution
Economists use percentiles to describe income inequality. The "top 1%" refers to households at or above the 99th percentile of income. The IQR of household incomes shows the spread of the middle class.
Quality Control
In manufacturing, percentiles help set tolerance limits. If the 5th percentile of a bolt's tensile strength exceeds the required minimum, then at least 95% of bolts meet the specification.
Frequently Asked Questions
What is the difference between percentile and quartile?
Quartiles are specific percentiles. Q1 is the 25th percentile, Q2 is the 50th percentile (the median), and Q3 is the 75th percentile. Percentiles can be any value from 1 to 99, while quartiles divide the data into exactly four equal groups.
Can a percentile be 0 or 100?
In most definitions, percentile ranks range from 1 to 99. The minimum value in a dataset is sometimes said to be at the 0th percentile and the maximum at the 100th, but strictly speaking, the 100th percentile would mean you scored higher than 100% of test takers, which is logically impossible if you are one of them.
Why are there different methods for calculating percentiles?
There are at least nine recognised methods for computing percentiles and quartiles. They differ in how they handle interpolation and edge cases. For large datasets, the differences are negligible. For small datasets, the choice of method can produce slightly different results. Two of the most common are the inclusive method (Excel's PERCENTILE.INC) and the exclusive method (Excel's PERCENTILE.EXC).
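To see how much the method can matter on a small dataset, the sketch below uses Python's `statistics.quantiles`, which offers `"exclusive"` and `"inclusive"` methods. The four-value dataset is made up, and treating these two methods as counterparts of the Excel functions named above is an assumption made for illustration.

```python
import statistics

# Hypothetical tiny dataset, made up for illustration.
data = [1, 2, 3, 4]

exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

# The medians agree, but Q1 and Q3 differ between the two methods.
q1_gap = inclusive[0] - exclusive[0]
```

On this dataset the exclusive method gives quartiles of 1.25, 2.5, and 3.75, while the inclusive method gives 1.75, 2.5, and 3.25.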
How is the IQR different from the range?
The range is the difference between the maximum and minimum values and is sensitive to extreme values. The IQR is the difference between Q3 and Q1 and only looks at the middle 50% of the data. This makes the IQR much more robust against outliers. For example, if the highest score in a class changes from 95 to 150 (an error), the range changes dramatically but the IQR remains the same.
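The effect can be demonstrated with a short sketch. The nine scores are made up, the quartiles come from `statistics.quantiles` with `method="exclusive"` (an assumed choice of method), and the function names are hypothetical.

```python
import statistics


def value_range(data):
    """Difference between the largest and smallest value."""
    return max(data) - min(data)


def iqr(data):
    """Interquartile range using the exclusive quantile method."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1


# Hypothetical class scores, made up for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]
# Same scores, but the top score was mistakenly recorded as 150.
with_error = scores[:-1] + [150]

range_before, range_after = value_range(scores), value_range(with_error)
iqr_before, iqr_after = iqr(scores), iqr(with_error)
```

The range jumps from 40 to 95, while the IQR stays at 25 because the values around Q1 and Q3 did not change.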
What does it mean if Q1 and Q3 are close together?
A small IQR means the middle 50% of values are clustered tightly together, indicating low variability. If Q1 and Q3 are far apart, the data is more spread out. In a box plot, a narrow box indicates consistency while a wide box indicates greater dispersion.