How to Find Outliers Using IQR and Z-Score Methods
Outliers are data points that fall significantly outside the general pattern of a dataset. They can distort statistical analyses, skew averages, and lead to misleading conclusions if left unaddressed. Equally, some outliers represent genuine phenomena that deserve investigation rather than removal.
This guide covers the most widely used methods for detecting outliers, including the IQR method, Z-score method, Modified Z-score, and the Grubbs test. We will work through real examples for each technique so you can apply them confidently to your own data.
What Is an Outlier?
An outlier is an observation that lies an abnormal distance from other values in a dataset. There is no single universal definition of "abnormal distance", which is why multiple detection methods exist. The right method depends on your data distribution, sample size, and the context of your analysis.
Consider a dataset of monthly salaries at a small company: . The value 12,000 stands out immediately. But with messier real-world data, visual inspection is not always reliable. That is where formal methods come in.
Method 1: The IQR Method (Interquartile Range)
The IQR method is the most common approach for identifying outliers. It is robust, easy to calculate, and does not assume a normal distribution. The method uses the spread of the middle 50% of the data to set boundaries (called fences) beyond which values are considered outliers.
The Outlier Formula (IQR Method)
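First, calculate the interquartile range, where $Q_1$ is the first quartile (25th percentile) and $Q_3$ is the third quartile (75th percentile):
$$\text{IQR} = Q_3 - Q_1$$
Then define the lower and upper fences:
$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}$$
$$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$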
Any data point below the lower fence or above the upper fence is classified as an outlier. The multiplier 1.5 is the standard choice. A multiplier of 3.0 identifies only extreme outliers.
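The fence calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `quartiles` and `iqr_outliers` and the sample data are invented here, and the quartiles use the median-of-halves convention from the worked examples in this guide. Statistical libraries often interpolate quartiles differently, so their fences may not match exactly.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.

    The data is sorted and split into a lower and an upper half. For an
    odd number of points the middle value is left out of both halves.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to compute quartiles")
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for fence multiplier k."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical data, invented for this sketch.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]
print(iqr_outliers(sample))          # standard fences, k = 1.5
print(iqr_outliers(sample, k=3.0))   # wider fences, extreme outliers only
```

Passing `k=3.0` widens the fences so that only extreme outliers are reported.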
Worked Example 1: Exam Scores
A class of 12 students scored the following marks on a maths exam:
Step 1: Find the quartiles. With 12 data points sorted in ascending order, $Q_1$ is the median of the lower half (positions 1 to 6) and $Q_3$ is the median of the upper half (positions 7 to 12).
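Step 2: Calculate the IQR as $\text{IQR} = Q_3 - Q_1$.
Step 3: Calculate the fences as $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$. For this dataset the lower fence is 45 and the upper fence is 101.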
Step 4: Identify outliers. All scores fall between 45 and 101, so there are no outliers in this dataset. The score of 98 is high but not extreme enough to breach the upper fence.
Worked Example 2: Daily Website Traffic
A small business tracks daily website visits over two weeks:
First, sort the data in ascending order:
With 14 values, $Q_1$ is the median of positions 1 to 7 and $Q_3$ is the median of positions 8 to 14.
The value 410 exceeds the upper fence of 156.5, so it is an outlier. This could represent a viral social media post or a bot attack. The other 13 values all fall within the fences.
Method 2: The Z-Score Method
The Z-score method works best when your data follows a roughly normal (bell-shaped) distribution. A Z-score tells you how many standard deviations a data point is from the mean.
$$Z = \frac{x - \bar{x}}{s}$$
Where $x$ is the data point, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation. A data point is typically flagged as an outlier if $|Z| > 3$, meaning it is more than 3 standard deviations from the mean.
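As a rough sketch, the Z-score rule can be written with Python's standard `statistics` module. The function names `z_scores` and `z_score_outliers` and the sample values are assumptions made for this illustration. The sample standard deviation (dividing by n - 1) is used, as in the definition above.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the Z-score of every value, using the sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # divides by n - 1
    if spread == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(v - centre) / spread for v in values]


def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    scores = z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical data, invented for this sketch: 20 typical values and one extreme.
sample = [48, 49, 50, 51, 52] * 4 + [120]
print(z_score_outliers(sample))
```

The threshold is a parameter, so a looser cutoff such as 2.5 can be tried without changing the code.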
Worked Example 3: Employee Commute Times
Ten employees report their daily commute times (in minutes):
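Step 1: Calculate the mean, $\bar{x} = \frac{1}{n}\sum x_i$, with $n = 10$.
Step 2: Calculate the sample standard deviation.
First find the sum of squared deviations from the mean, $\sum (x_i - \bar{x})^2$, then divide by $n - 1 = 9$ and take the square root to get $s$.
Step 3: Calculate the Z-score for the suspicious value (95): $Z = (95 - \bar{x})/s \approx 2.82$.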
The Z-score of 2.82 is close to, but does not exceed, the threshold of 3, so under the strict rule 95 would not be classified as an outlier. Some analysts use a threshold of 2.5, under which it would qualify. This highlights an important limitation of the Z-score method: the mean and standard deviation are themselves affected by the outlier, which can mask its extremity. In a sample of $n$ values no Z-score can exceed $(n-1)/\sqrt{n}$, which is about 2.85 for $n = 10$, so the $|Z| > 3$ rule could never flag a point in a dataset this small.
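The masking effect is easy to check numerically. The sketch below uses only Python's standard library to evaluate the ceiling $(n-1)/\sqrt{n}$ on the largest sample Z-score for a few sample sizes. It then confirms the ceiling on a hypothetical 10-point sample. The function name `max_possible_z` and the data are illustrative, not part of the worked example.

```python
from math import sqrt
from statistics import mean, stdev


def max_possible_z(n):
    """Largest absolute Z-score any single point can reach in a sample of size n."""
    return (n - 1) / sqrt(n)


for n in (5, 10, 11, 30):
    print(n, round(max_possible_z(n), 3))

# Hypothetical sample of size 10, invented for this sketch. Nine identical
# values plus one enormous value reach the ceiling but cannot pass it.
sample = [30] * 9 + [1_000_000]
z_extreme = (sample[-1] - mean(sample)) / stdev(sample)
print(round(z_extreme, 3))
```

The ceiling first rises above 3 at n = 11, so the usual threshold cannot flag anything in a smaller sample.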
Method 3: Modified Z-Score (Iglewicz and Hoaglin)
The Modified Z-score addresses the key weakness of the standard Z-score. Instead of using the mean and standard deviation (both of which are sensitive to outliers), it uses the median and the Median Absolute Deviation (MAD).
$$M_i = \frac{0.6745\,(x_i - \tilde{x})}{\text{MAD}}$$
Where:
- $\tilde{x}$ is the median of the dataset
- $\text{MAD} = \text{median}(|x_i - \tilde{x}|)$ is the Median Absolute Deviation
- The constant 0.6745 is the 75th percentile of the standard normal distribution, which makes the Modified Z-score comparable to standard Z-scores for normal data
A data point is considered an outlier if $|M_i| > 3.5$.
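A minimal Python sketch of the Modified Z-score follows, assuming the definition above with the median as the centre and the MAD as the spread. The function names and the sample data are invented for illustration. When more than half of the values are identical the MAD is zero and the score is undefined, so the sketch raises an error in that case.

```python
from statistics import median


def modified_z_scores(values):
    """Return M_i = 0.6745 * (x_i - median) / MAD for every value."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the Modified Z-score is undefined")
    return [0.6745 * (v - centre) / mad for v in values]


def modified_z_outliers(values, threshold=3.5):
    """Return the values whose absolute Modified Z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, m in zip(values, scores) if abs(m) > threshold]


# Hypothetical data, invented for this sketch.
sample = [10, 11, 12, 12, 13, 14, 40]
print(modified_z_outliers(sample))
```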
Applying the Modified Z-Score to the Commute Data
We reuse the commute time data from Worked Example 3.
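Step 1: The median of the sorted data is the average of positions 5 and 6: $\tilde{x} = (29 + 30)/2 = 29.5$.
Step 2: Calculate the absolute deviation from the median, $|x_i - 29.5|$, for each value. For the suspicious value this is $|95 - 29.5| = 65.5$.
Step 3: The MAD is the median of these absolute deviations. Sorted: 0.5, 0.5, 1.5, 1.5, 2.5, 2.5, 3.5, 3.5, 4.5, 65.5. The median is the average of the 5th and 6th entries: $\text{MAD} = (2.5 + 2.5)/2 = 2.5$.
Step 4: Calculate the Modified Z-score for 95: $M = 0.6745 \times (95 - 29.5)/2.5 \approx 17.67$.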
A Modified Z-score of 17.67 far exceeds the threshold of 3.5, clearly flagging 95 as an outlier. Notice how much more decisive this result is compared to the standard Z-score of 2.82 we calculated earlier. The Modified Z-score is not pulled towards the outlier the way the mean and standard deviation are.
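The comparison can be reproduced with a short Python sketch. The commute times below are an assumption: they are chosen to be consistent with the sorted absolute deviations listed in Step 3 and with both quoted scores, and their order is arbitrary.

```python
from statistics import mean, median, stdev

# Assumed commute times in minutes, chosen to match the deviations in Step 3.
commute = [25, 26, 27, 28, 29, 30, 31, 32, 33, 95]
suspect = 95

# Standard Z-score: centre and spread are both pulled towards the outlier.
z = (suspect - mean(commute)) / stdev(commute)

# Modified Z-score: median and MAD barely react to the outlier.
centre = median(commute)
mad = median(abs(t - centre) for t in commute)
modified_z = 0.6745 * (suspect - centre) / mad

print(f"standard Z-score: {z:.2f}")           # 2.82
print(f"modified Z-score: {modified_z:.2f}")  # 17.67
```

The same value scores roughly six times higher on the robust scale, which is the masking effect in numbers.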
Method 4: The Grubbs Test
The Grubbs test is a formal statistical hypothesis test for detecting a single outlier in a univariate dataset that is approximately normally distributed. It tests the null hypothesis that there are no outliers against the alternative that there is exactly one.
The test statistic is:
$$G = \frac{\max_i |x_i - \bar{x}|}{s}$$
Here $\bar{x}$ is the sample mean and $s$ is the sample standard deviation. $G$ is compared to a critical value based on the sample size and chosen significance level (typically 0.05). If $G$ exceeds the critical value, the most extreme point is declared an outlier. The Grubbs test is best suited for small to moderate sample sizes and can only detect one outlier at a time. For multiple outliers, you would run the test iteratively, removing one outlier at a time.
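The statistic itself is simple to compute. The sketch below is an illustration under stated assumptions: the function names and data are invented, and the critical value is supplied by the caller rather than derived in code. The value 2.290 used in the example is the commonly tabulated two-sided critical value for n = 10 at a significance level of 0.05. For other sample sizes it should be looked up in a Grubbs table.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, suspect): the Grubbs statistic and the most extreme value."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation
    if spread == 0:
        raise ValueError("all values are identical; G is undefined")
    suspect = max(values, key=lambda v: abs(v - centre))
    return abs(suspect - centre) / spread, suspect


def grubbs_test(values, critical_value):
    """Return (is_outlier, G, suspect) for a caller-supplied critical value."""
    g, suspect = grubbs_statistic(values)
    return g > critical_value, g, suspect


# Hypothetical measurements, invented for this sketch (n = 10).
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.0]
is_outlier, g, suspect = grubbs_test(sample, critical_value=2.290)
print(is_outlier, round(g, 2), suspect)
```

To test for further outliers, remove the flagged value and repeat with the critical value for the reduced sample size.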
Comparing the Methods
| Method | Assumes Normality? | Robust to Outliers? | Best For |
|---|---|---|---|
| IQR Method | No | Yes | General-purpose, any distribution |
| Z-Score | Yes | No | Large, normally distributed datasets |
| Modified Z-Score | No | Yes | Small or skewed datasets |
| Grubbs Test | Yes | No | Formal hypothesis testing |
When to Remove vs Keep Outliers
Detecting an outlier does not automatically mean you should remove it. The decision depends on the context:
Remove the Outlier When:
- It is a data entry error. A temperature of 999 degrees in a weather dataset is clearly a mistake.
- It is a measurement error. A faulty sensor producing impossible readings should be excluded.
- It belongs to a different population. If you are studying household incomes in a neighbourhood and one record belongs to a billionaire who does not actually live there, it does not belong in the analysis.
Keep the Outlier When:
- It represents genuine variation. In a study of marathon finish times, a world-class athlete is not an error.
- The outlier is what you are studying. Fraud detection, rare disease research, and quality control specifically look for unusual values.
- Your sample size is very small. Removing data points from small datasets can dramatically change results and reduce statistical power.
A good practice is to report your results both with and without outliers, so readers can judge the impact themselves. Always document which values were removed and why.
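One way to produce such a side-by-side report is sketched below in Python. The helper names `summarise` and `compare_with_and_without` and the salary figures are hypothetical. The values to remove are passed in explicitly, so the decision and its justification stay with the analyst.

```python
from statistics import mean, median, stdev


def summarise(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def compare_with_and_without(values, flagged):
    """Summarise the data as collected and again with flagged values removed."""
    kept = [v for v in values if v not in flagged]
    removed = [v for v in values if v in flagged]
    return {
        "with_outliers": summarise(values),
        "without_outliers": summarise(kept),
        "removed": removed,  # keep a record of what was dropped
    }


# Hypothetical salaries, invented for this sketch.
salaries = [48_000, 50_000, 52_000, 49_000, 51_000, 1_000_000]
report = compare_with_and_without(salaries, flagged={1_000_000})
for label, stats in report.items():
    print(label, stats)
```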
Frequently Asked Questions
What is the most reliable method for finding outliers?
There is no single best method. The IQR method is the safest default because it does not assume normality and is resistant to the influence of outliers themselves. For normally distributed data with a decent sample size, the Z-score method works well. The Modified Z-score is the best choice when you suspect your data is skewed or has multiple outliers, because the median and MAD are more resistant to extreme values than the mean and standard deviation.
Can a dataset have no outliers?
Yes. If all values fall within the fences defined by whatever method you use, the dataset has no outliers. This simply means the data is relatively consistent, which is common in well-controlled experiments or standardised measurements.
Is a Z-score of 2 an outlier?
Under the standard convention of $|Z| > 3$, a Z-score of 2 is not considered an outlier. It means the value is 2 standard deviations from the mean, and values at least that far from the mean make up about 5% of normally distributed data. Some stricter analyses use a threshold of 2.5 or even 2, but 3 is the most widely accepted cutoff.
How do outliers affect the mean and standard deviation?
Outliers pull the mean towards them and inflate the standard deviation. For example, adding a single income of 1,000,000 to a dataset of salaries around 50,000 would dramatically increase both the mean and the standard deviation, making them poor summaries of the typical values. The median and IQR are far more resistant to this effect, which is why the IQR method is preferred for outlier detection.
Should I always remove outliers before running a regression?
Not necessarily. Outliers in regression can be influential points that disproportionately affect the slope and intercept, but they can also reveal important relationships. Check whether the outlier has high leverage (an extreme predictor value) and high influence (it substantially changes the regression when removed). Tools like Cook's distance can help quantify this. If the outlier is a valid data point and has moderate influence, consider using robust regression methods rather than deleting it.