The Curse of the Bell Curve – Part 1

9 Jan 2013

This article was originally published in Data Science Central

The ‘bell curve’, or ‘Gaussian bell curve’, is one of the fundamental concepts on which much of statistical analysis is based. From social sciences to astronomy to financial services, most real-world applications of statistics rely on the assumption that the data being analysed follow a bell-shaped distribution.

What does the bell curve mean?

The bell curve, named after Herr Professor Doktor Gauss, is a beautiful visual that depicts how data from any normal distribution would behave.

In simplest terms, the Gaussian bell curve reveals that in a normal distribution most observations hover around the middle or the mediocre, i.e. the average. And the odds of deviation from this average (the chances of a value being far from the middle) decline at an increasingly fast rate, faster even than exponentially, as we move away from the average.

Let us take a simple example* to understand this feature of a normal distribution.

(* – This example has been taken from the book ‘The Black Swan’ by Nassim Nicholas Taleb. As a matter of fact, this whole article is inspired by Dr. Taleb’s brilliant writings on this subject.)

Assume that the average height of humans is 167 cm, or roughly 5 feet 6 inches. Also assume that the unit of deviation (generally taken as the standard deviation of the population) is 10 cm.

Now, as per the rules of the bell curve (that is, the properties of the normal distribution), if one were to look at a large enough randomly chosen sub-population of humans, one would find most people to be of a height close to the average, i.e. 167 cm.

Put another way, more people are likely to be of height 168 cm (1 cm away from the average) than, say, 178 cm (11 cm away from the average).
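
The comparison between 168 cm and 178 cm can be checked with a minimal sketch. It assumes the example population is exactly normal with mean 167 cm and standard deviation 10 cm, and it uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Assumed population from the example: mean 167 cm, standard deviation 10 cm.
heights = NormalDist(mu=167, sigma=10)

# The density is a relative likelihood, not a probability, but the ratio of
# two densities tells us how much more common one height is than another.
density_168 = heights.pdf(168)  # 1 cm above the average
density_178 = heights.pdf(178)  # 11 cm above the average
ratio = density_168 / density_178

print(f"Heights near 168 cm are about {ratio:.2f} times as common as heights near 178 cm")
```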

And the odds of finding someone much taller (or shorter) than the average decrease at a faster and faster rate.

The odds of finding someone more than 10 cm taller than the average, i.e. taller than 177 cm, are 1 in 6.3.

The odds of finding someone more than 20 cm taller than the average, i.e. taller than 187 cm or 6 ft 2 inches, are 1 in 44.

The odds of finding someone more than 60 cm taller than the average, i.e. taller than 227 cm or 7 ft 5 inches, are 1 in a billion.

The odds of finding someone more than 70 cm taller than the average, i.e. taller than 237 cm, are 1 in 780 billion.
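
These odds can be reproduced from the upper tail of the normal distribution. The sketch below is one way to do it, assuming the same mean (167 cm) and standard deviation (10 cm). The helper name `one_in` is hypothetical. It uses `math.erfc` rather than `1 - cdf`, because subtracting from 1 loses precision when the tail is this small.

```python
import math

MEAN_CM = 167.0  # assumed average height from the example
SD_CM = 10.0     # assumed standard deviation from the example

def one_in(cm_above_mean):
    """Return N such that about 1 person in N exceeds the mean by more than cm_above_mean."""
    z = cm_above_mean / SD_CM                  # distance from the mean in standard deviations
    tail = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z) for a standard normal variable
    return 1.0 / tail

for cm in (10, 20, 60, 70):
    print(f"taller than {MEAN_CM + cm:.0f} cm: about 1 in {one_in(cm):,.1f}")
```

The printed values should land close to the rounded figures quoted above.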

The main point to understand here is the pace at which the odds decline as we look for more and more unusual observations. For the 10 cm increase in height from 177 cm to 187 cm, the odds change from 1 in 6.3 to 1 in 44, making the taller person roughly 7 times rarer. But for the same 10 cm increase from 227 cm to 237 cm, the odds change from 1 in a billion to 1 in 780 billion, making the taller person roughly 780 times rarer.
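
To see this acceleration directly, the sketch below computes how much rarer each additional standard deviation (10 cm in the example) makes an observation. The function names `upper_tail` and `step_factor` are illustrative, and the calculation assumes an exactly normal population.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def step_factor(z):
    """How many times rarer it is to exceed z + 1 standard deviations than to exceed z."""
    return upper_tail(z) / upper_tail(z + 1)

# Each step is one standard deviation, i.e. 10 cm in the height example.
for z in range(1, 7):
    print(f"from {z} to {z + 1} standard deviations: about {step_factor(z):,.0f} times rarer")
```

The factor grows with every step, which is what distinguishes the bell curve from a distribution whose odds fall by a constant multiple.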

This is the essential property of a bell curve. The odds of finding larger and larger observations become so small that the outliers or unusual occurrences become a very, very remote possibility and hence can be ignored for all practical purposes.

This is the boon of the bell curve. It allows us to focus on the mediocre or the ordinary, and ignore the rare or the barely possible.

This is why statisticians, academicians, analysts and all sorts of people love the bell curve. It allows them to focus on the usual, the frequent or the norm.

Statistical models, from simple regression models to complex ones like the Black-Scholes model in finance, are based on this property of the bell curve.

It is this property that allows us to say that it is highly improbable to see someone who is over 8 feet tall, or to make even more precise predictions, such as: 68% of a large randomly selected population is going to be within 157 to 177 cm in height. The same property underlies many declarations that you regularly encounter in daily life, from medical test results to exit polls.
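
Both statements can be checked with a short sketch, again assuming a normal population with mean 167 cm and standard deviation 10 cm. The conversion of 8 feet to centimetres and the variable names are additions for illustration.

```python
import math
from statistics import NormalDist

# Assumed population from the example: mean 167 cm, standard deviation 10 cm.
heights = NormalDist(mu=167, sigma=10)

# Share of the population within one standard deviation of the mean (157 to 177 cm).
share_within_one_sd = heights.cdf(177) - heights.cdf(157)
print(f"between 157 cm and 177 cm: {share_within_one_sd:.1%}")

# Probability of exceeding 8 feet (1 foot = 30.48 cm), computed with erfc
# so that the tiny tail probability is not lost to rounding.
EIGHT_FEET_CM = 8 * 30.48
z = (EIGHT_FEET_CM - heights.mean) / heights.stdev
tail_over_8ft = 0.5 * math.erfc(z / math.sqrt(2))
print(f"taller than 8 feet: probability about {tail_over_8ft:.1e}")
```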

So far we have talked about how the bell curve is a boon for statistical analysis – it helps us simplify things and use rules to understand distributions. The curve’s symmetry and consistency make it ideal for making predictions. This is why it is such an important concept in business analytics.

In the next article we will address the main topic: how the same qualities that make the bell curve so tempting and useful are also a curse. We will see how uninformed application of the bell curve can lead to serious errors and cause more harm than good, and how misuse of the bell curve is far more widespread than we think.
