
In statistics, the 68–95–99.7 rule, also known as the empirical rule or 68–95–99.7 rule for a normal distribution^[1] and sometimes abbreviated 3sr or $3 σ$ , is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

In mathematical notation, these facts can be expressed as follows, where $\Pr()$ is the probability function,^[2] $X$ is an observation from a normally distributed random variable, $μ$ (mu) is the mean of the distribution, and $σ$ (sigma) is its standard deviation: ${\begin{aligned}\Pr(\mu -1\sigma \leq X\leq \mu +1\sigma )&\approx 68.27\%\\\Pr(\mu -2\sigma \leq X\leq \mu +2\sigma )&\approx 95.45\%\\\Pr(\mu -3\sigma \leq X\leq \mu +3\sigma )&\approx 99.73\%\end{aligned}}$
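As a quick check of the three percentages, the following minimal sketch evaluates the probability of falling within $n$ standard deviations using the error-function identity $\operatorname{erf}(n/\sqrt{2})$ that appears in the last row of the table of numerical values. The function name `prob_within` is a hypothetical helper chosen for illustration.

```python
import math


def prob_within(n_sigma: float) -> float:
    """Probability that a normal variable lies within n_sigma
    standard deviations of its mean, via erf(n / sqrt(2))."""
    return math.erf(n_sigma / math.sqrt(2.0))


# Print the three probabilities behind the 68-95-99.7 rule.
for n in (1, 2, 3):
    print(f"within {n} sigma: {prob_within(n):.2%}")
```

The result depends only on the number of standard deviations, not on the particular values of $μ$ and $σ$.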

The usefulness of this heuristic depends especially on the question under consideration and the manner in which the data have been collected; most particularly the heuristic depends on the data genuinely being normally distributed: Among the many bell-shaped distributions often seen in real-life data, the normal distribution has notoriously “thin tails” – an unusual concentration of probability near its center. If the datum $X$ is instead governed by one of the many similar-appearing and commonly encountered distributions that have “fatter tails” – with probability more spread-out – the significance would be lower for all three deviations from the mean.

In the empirical sciences, the so-called three-sigma rule of thumb (or 3 $σ$ rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty.^[3]

In the social sciences, a result may be considered statistically significant (clear enough to warrant closer examination) if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of requiring statistical significance of a five-sigma effect (99.99994% confidence) to qualify as a discovery.^[4]

A weaker three-sigma rule can be derived from Chebyshev’s inequality, stating that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within three-sigma is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.^[5]
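The two distribution-free bounds can be compared numerically. The sketch below assumes the usual statements of the inequalities: Chebyshev gives at least $1 - 1/k^{2}$ within $k$ standard deviations for any distribution with finite variance, and Vysochanskij–Petunin gives at least $1 - 4/(9k^{2})$ for unimodal distributions when $k > \sqrt{8/3}$. The function names are hypothetical.

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Lower bound on P(|X - mu| < k*sigma) for any finite-variance distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - 1.0 / k**2


def vysochanskij_petunin_lower_bound(k: float) -> float:
    """Lower bound on P(|X - mu| < k*sigma) for unimodal distributions.

    The inequality is only stated here for k > sqrt(8/3).
    """
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("bound requires k > sqrt(8/3)")
    return 1.0 - 4.0 / (9.0 * k**2)


k = 3.0
print(f"Chebyshev, k=3:            {chebyshev_lower_bound(k):.4f}")
print(f"Vysochanskij-Petunin, k=3: {vysochanskij_petunin_lower_bound(k):.4f}")
print(f"Normal distribution, k=3:  {math.erf(k / math.sqrt(2.0)):.4f}")
```

Both bounds are weaker than the normal-distribution value because they must hold for far broader families of distributions.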

Proof

We have ${\begin{aligned}\Pr(\mu -n\sigma \leq X\leq \mu +n\sigma )=\int _{\mu -n\sigma }^{\mu +n\sigma }{\frac {1}{{\sqrt {2\pi }}\sigma }}e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}}dx.\end{aligned}}$ Changing variables to the standard score $z={\frac {x-\mu }{\sigma }}$ , so that $dx=\sigma \,dz$ , the integral becomes ${\begin{aligned}{\frac {1}{\sqrt {2\pi }}}\int _{-n}^{n}e^{-{\frac {z^{2}}{2}}}dz\end{aligned}},$ which is independent of $\mu$ and $\sigma$ . It therefore suffices to evaluate the integral for the cases $n=1,2,3$ : ${\begin{aligned}\Pr(\mu -1\sigma \leq X\leq \mu +1\sigma )&={\frac {1}{\sqrt {2\pi }}}\int _{-1}^{1}e^{-{\frac {z^{2}}{2}}}dz\approx 0.6826894921\\\Pr(\mu -2\sigma \leq X\leq \mu +2\sigma )&={\frac {1}{\sqrt {2\pi }}}\int _{-2}^{2}e^{-{\frac {z^{2}}{2}}}dz\approx 0.9544997361\\\Pr(\mu -3\sigma \leq X\leq \mu +3\sigma )&={\frac {1}{\sqrt {2\pi }}}\int _{-3}^{3}e^{-{\frac {z^{2}}{2}}}dz\approx 0.9973002039.\end{aligned}}$
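The independence from $\mu$ and $\sigma$ can also be checked numerically. The sketch below integrates the normal density directly with composite Simpson's rule, which is one possible quadrature choice rather than anything prescribed by the proof. The helper names and the particular values of $\mu$ and $\sigma$ are arbitrary illustrations.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def prob_within_numeric(n: float, mu: float, sigma: float, steps: int = 2000) -> float:
    """Integrate the density over [mu - n*sigma, mu + n*sigma]
    with composite Simpson's rule (steps must be even)."""
    if steps % 2 != 0:
        raise ValueError("steps must be even")
    a = mu - n * sigma
    b = mu + n * sigma
    h = (b - a) / steps
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3.0


# Arbitrary parameter choices: the result should not depend on them.
for mu, sigma in ((0.0, 1.0), (100.0, 15.0), (-7.5, 0.2)):
    values = [prob_within_numeric(n, mu, sigma) for n in (1, 2, 3)]
    print(mu, sigma, [f"{v:.10f}" for v in values])
```

Each parameter pair should give the same three probabilities up to floating-point and quadrature error.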

Cumulative distribution function

These numerical values “68%, 95%, 99.7%” come from the cumulative distribution function of the normal distribution.

The prediction interval for any standard score $z$ corresponds numerically to $1 - 2\left(1 - \Phi_{\mu,\sigma^{2}}(z)\right)$ .

For example, $\Phi(2) \approx 0.97725$ , that is, $\Pr(X \leq \mu + 2\sigma) \approx 0.97725$ . This is not a symmetrical interval; it is merely the probability that an observation is less than $\mu + 2\sigma$ . The corresponding prediction interval is $1 - 2(1 - 0.97725) = 0.9545 = 95.45\%$ . Equivalently, the probability that an observation is within two standard deviations of the mean is (small differences due to rounding): $\Pr(\mu -2\sigma \leq X\leq \mu +2\sigma )=\Phi (2)-\Phi (-2)\approx 0.9772-(1-0.9772)\approx 0.9545$
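The distinction between the one-sided value $\Phi(2)$ and the symmetric interval can be illustrated with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; `central_probability` is a hypothetical helper name.

```python
from statistics import NormalDist


def central_probability(z: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(mu - z*sigma <= X <= mu + z*sigma), computed as a difference of two CDF values."""
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(mu + z * sigma)  # one-sided: P(X <= mu + z*sigma)
    lower = dist.cdf(mu - z * sigma)  # one-sided: P(X <= mu - z*sigma)
    return upper - lower


phi_2 = NormalDist().cdf(2.0)
print("one-sided Phi(2):       ", round(phi_2, 5))
print("symmetric, CDF diff:    ", round(central_probability(2.0), 4))
print("symmetric, 1 - 2(1-Phi):", round(1.0 - 2.0 * (1.0 - phi_2), 4))
```

The last two lines compute the same quantity in two ways, which is the identity $\Phi(z)-\Phi(-z) = 1 - 2(1-\Phi(z))$.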

This is related to the confidence interval as used in statistics: ${\bar {X}}\pm 2{\frac {\sigma }{\sqrt {n}}}$ is approximately a 95% confidence interval when ${\bar {X}}$ is the average of a sample of size $n$ .
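A minimal sketch of the interval ${\bar {X}}\pm 2\sigma /{\sqrt {n}}$ follows. It assumes the population standard deviation $\sigma$ is known; the sample values and the function name `two_sigma_interval` are made up for illustration only.

```python
import math
from statistics import fmean


def two_sigma_interval(sample, sigma):
    """Approximate 95% confidence interval for the mean,
    assuming the population standard deviation sigma is known."""
    n = len(sample)
    mean = fmean(sample)
    half_width = 2.0 * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical measurements (illustrative data, not from any real study).
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 9.9, 10.2, 10.0, 9.7]
low, high = two_sigma_interval(sample, sigma=0.3)
print(f"approx. 95% interval for the mean: [{low:.2f}, {high:.2f}]")
```

The half-width shrinks with $\sqrt{n}$, so quadrupling the sample size halves the interval.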

Normality tests

The “68–95–99.7 rule” is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.

To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or residual depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or studentizing (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.

To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the sample size is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.
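The outlier check described above can be sketched as follows. This is a simplified version that divides each deviation from the sample mean by the sample standard deviation; it is not a full regression-style studentized residual. The data and the name `flag_outliers` are hypothetical.

```python
from statistics import fmean, stdev


def flag_outliers(sample, threshold=3.0):
    """Return the points lying more than `threshold` sample standard
    deviations from the sample mean (a simplified studentization)."""
    mean = fmean(sample)
    spread = stdev(sample)  # estimate of the standard deviation
    return [x for x in sample if abs(x - mean) / spread > threshold]


# Hypothetical data: twenty values near 50 plus one extreme value.
sample = [48, 49, 50, 51, 52] * 4 + [80]
print(flag_outliers(sample))
```

Because the extreme point inflates both the estimated mean and the estimated standard deviation, this kind of check becomes less sensitive in small samples.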

One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a Poisson distribution, but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution.
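The Poisson approximation mentioned above can be sketched as follows. The expected number of deviations of at least $k$ standard deviations in a sample of size $n$ is taken as $\lambda = n\,(1-\operatorname{erf}(k/\sqrt{2}))$, and the count is treated as Poisson with that mean. The function names are hypothetical.

```python
import math


def tail_probability(k: float) -> float:
    """Two-sided probability of a deviation of at least k standard deviations."""
    return math.erfc(k / math.sqrt(2.0))


def prob_at_least(m: int, n: int, k: float) -> float:
    """Poisson approximation to the probability of seeing at least m
    deviations of k sigma or more in a normal sample of size n."""
    lam = n * tail_probability(k)
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(m))
    return 1.0 - below


# Probability of two or more 4-sigma deviations in a sample of 1,000.
print(f"{prob_at_least(2, 1000, 4.0):.5f}")
```

Under normality the probability is on the order of a fifth of one percent, which is why multiple 4σ moves in such a sample are strong grounds for doubt.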

For example, a 6σ event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6σ event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.

Black Monday (October 19, 1987) was an extreme tail event in global financial markets, marked by the Dow Jones Industrial Average falling 22.6% in a single day, the largest one-day percentage drop in its history. This dramatic decline reflected a rare confluence of factors, including overvaluation concerns, negative macroeconomic news, and the amplifying effects of computerized portfolio-insurance trading strategies. The event was severe, sudden, and globally contagious, but it does not fit within the assumptions of a normal (Gaussian) distribution, which is why traditional statistical models fail to describe it meaningfully.

Some commentators have described Black Monday as a “36-standard-deviation event,” but this characterization is mathematically and conceptually flawed:

1) A 36σ event under a normal distribution is effectively impossible: its probability is so small that it would not be expected to occur even once in the lifetime of the universe.
2) The claim arises from misapplying the normal distribution to financial returns, which are well known to exhibit fat tails, volatility clustering, and structural breaks, features incompatible with Gaussian assumptions.
3) Because markets do not follow a normal distribution, calculating sigma-equivalents for extreme events produces nonsensical results that exaggerate the improbability rather than explain the phenomenon.

More appropriate models are those that incorporate fat tails, volatility clustering, and discontinuous jumps, such as the Student-t distribution, Lévy-stable distributions, GARCH-type volatility models, and Extreme Value Theory (EVT) for tail behavior. These frameworks acknowledge that large price movements occur far more frequently than Gaussian models predict and that market volatility can shift abruptly during stress periods. As a result, they provide a more realistic foundation for understanding extreme events like Black Monday, without resorting to misleading notions such as “36-sigma” outcomes.

Table of numerical values

Because of the exponentially decreasing tails of the normal distribution, odds of higher deviations decrease very quickly. From the rules for normally distributed data for a daily event:

Range	Expected fraction of population inside range	Expected fraction of population outside range	Approx. expected frequency outside range	Approx. frequency outside range for daily event
$μ \pm 0.5 σ$	0.382924922548026	0.6171 = 61.71 %	3 in 5	Four or five times a week
$μ \pm σ$	0.682689492137086^[6]	0.3173 = 31.73 %	1 in 3	Twice or thrice a week
$μ \pm 1.5 σ$	0.866385597462284	0.1336 = 13.36 %	2 in 15	Weekly
$μ \pm 2 σ$	0.954499736103642^[7]	0.04550 = 4.550 %	1 in 22	Every three weeks
$μ \pm 2.5 σ$	0.987580669348448	0.01242 = 1.242 %	1 in 81	Quarterly
$μ \pm 3 σ$	0.997300203936740^[8]	0.002700 = 0.270 % = 2.700 ‰	1 in 370	Yearly
$μ \pm 3.5 σ$	0.999534741841929	0.0004653 = 0.04653 % = 465.3 ppm	1 in 2149	Every 6 years
$μ \pm 4 σ$	0.999936657516334	6.334×10⁻⁵ = 63.34 ppm	1 in 15787	Every 43 years (twice in a lifetime)
$μ \pm 4.5 σ$	0.999993204653751	6.795×10⁻⁶ = 6.795 ppm	1 in 147160	Every 403 years (once in the modern era)
$μ \pm 5 σ$	0.999999426696856	5.733×10⁻⁷ = 0.5733 ppm = 573.3 ppb	1 in 1744278	Every 4776 years (once in recorded history)
$μ \pm 5.5 σ$	0.999999962020875	3.798×10⁻⁸ = 37.98 ppb	1 in 26330254	Every 72090 years (thrice in history of modern humankind)
$μ \pm 6 σ$	0.999999998026825	1.973×10⁻⁹ = 1.973 ppb	1 in 506797346	Every 1.38 million years (twice in history of humankind)
$μ \pm 6.5 σ$	0.999999999919680	8.032×10⁻¹¹ = 0.08032 ppb = 80.32 ppt	1 in 12450197393	Every 34 million years (twice since the extinction of dinosaurs)
$μ \pm 7 σ$	0.999999999997440	2.560×10⁻¹² = 2.560 ppt	1 in 390682215445	Every 1.07 billion years (four occurrences in history of Earth)
$μ \pm 7.5 σ$	0.999999999999936	6.382×10⁻¹⁴ = 63.82 ppq	1 in 15669601204101	Once every 43 billion years (never in the history of the Universe, twice in the future of the Local Group before its merger)
$μ \pm 8 σ$	0.999999999999999	1.244×10⁻¹⁵ = 1.244 ppq	1 in 803734397655348	Once every 2.2 trillion years (never in the history of the Universe, once during the life of a red dwarf)
$μ \pm xσ$	$\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)$	$1-\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)$	1 in ${\frac {1}{1-\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)}}$	Every ${\frac {1}{1-\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)}}$ days
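The general formulas in the last row of the table can be evaluated directly. The sketch below is a minimal illustration; the function name `table_row` is hypothetical, and the conversion from days to years assumes a year of 365.25 days, which the table does not state.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption for converting daily events to years


def table_row(x: float):
    """Return (fraction inside, fraction outside, '1 in N', years between
    daily events) for the range mu +/- x*sigma."""
    inside = math.erf(x / math.sqrt(2.0))
    # erfc computes 1 - erf without the loss of precision that the
    # subtraction would cause for large x.
    outside = math.erfc(x / math.sqrt(2.0))
    one_in = 1.0 / outside
    years = one_in / DAYS_PER_YEAR
    return inside, outside, one_in, years


for x in (1, 2, 3, 4, 5, 6):
    inside, outside, one_in, years = table_row(x)
    print(f"{x} sigma: inside={inside:.12f} outside={outside:.4g} "
          f"1 in {one_in:,.0f} every {years:,.2f} years")
```

Using the complementary error function matters at the bottom of the table, where the outside fraction is far smaller than the rounding error of `1 - erf(...)` would allow.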

References

[1] Peter Westfall; Kevin S. S. Henning (9 April 2013). “Chapter 9 Functions of Random Variables: Their Distributions and Expected Values”. Understanding Advanced Statistical Methods. Texas Tech University: CRC Press. p. 243. ISBN 9781466512115. Some statistical sources call the 68–95–99.7 rule the empirical rule since empirical (or observed) data sets often follow these percentages. However, only data sets that look as if produced by a normal distribution will obey these percentages, so it’s safer to call this rule the 68–95–99.7 rule for a normal distribution.
[2] Huber, Franz (2018). “Chapter 5 Probability 5.1 THE PROBABILITY CALCULUS”. A Logical Introduction to Probability and Induction. New York, NY: Oxford University Press. p. 80. ISBN 9780190845414 – via Google.
[3] This use of the phrase “three-sigma rule” became common in the 2000s, e.g. cited in
- Schaum’s Outline of Business Statistics. McGraw Hill Professional. 2003. p. 359. ISBN 9780071398763.
- Grafarend, Erik W. (2006). Linear and Nonlinear Models: Fixed effects, random effects, and mixed models. Walter de Gruyter. p. 553. ISBN 9783110162165.
[4] Lyons, Louis (7 October 2013). “Discovering the significance of $5 σ$”. arXiv:1310.1284 [physics.data-an].
[5] See:
- Wheeler, D.J.; Chambers, D.S. (1992). Understanding Statistical Process Control. SPC Press. ISBN 9780945320135 – via Google.
- Czitrom, Veronica; Spagon, Patrick D. (1997). Statistical Case Studies for Industrial Process Improvement. SIAM. p. 342. ISBN 9780898713947 – via Google.
- Pukelsheim, F. (1994). “The Three Sigma rule”. American Statistician. 48 (2): 88–91. doi:10.2307/2684253. JSTOR 2684253.
[6] Sloane, N. J. A. (ed.). “Sequence A178647”. The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
[7] Sloane, N. J. A. (ed.). “Sequence A110894”. The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
[8] Sloane, N. J. A. (ed.). “Sequence A270712”. The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.

External links

“Calculate percentage proportion within $x$ sigmas”. WolframAlpha.
