This article was originally published in Data Science Central.
The "bell curve", or "Gaussian bell curve", is one of the fundamental concepts on which most statistical analysis is based. From social sciences to astronomy to financial services, most real-world applications of statistics rely on the assumption that the data being analysed follow a bell-shaped distribution.
In the last article we discussed the usefulness of the bell curve. It helps us simplify things and use rules to understand distributions. The curve's symmetry and consistency make it ideal for making predictions.
In this article, we will discuss how these same qualities of the bell curve that make it so tempting and useful can also be a curse.
Does all information follow the Bell Curve?
There are many examples of normal (or approximately normal) distributions around us, and the underlying statistical concepts have been empirically tested and verified countless times.
Certain quantities in physics are distributed normally, such as the velocities of the molecules in an ideal gas. In biology, the logarithm of various variables, such as the thickness of tree bark or the length of a mammal's claws, tends to have a normal distribution. In finance, changes in the logarithm of certain quantities, such as exchange rates and price indices, are assumed to be normal, though this assumption is hotly contested by some. Bell curve grading assigns relative grades based on a normal distribution of scores.
As Dr. Taleb says in his book, The Black Swan, we can make good use of the Gaussian approach (i.e. the bell curve) for variables for which there is a rational reason for the largest not to be too far from the average. If there is gravity pulling down numbers, or if there are physical limitations preventing very large observations (say, the length of the tail of a cat), we end up in Mediocristan.
Mediocristan is a term coined by Dr. Taleb to denote situations where the Gaussian approach (normal, binomial, Poisson, etc.) will work.
The Curse of the Bell Curve
The curse of the bell curve, however, comes from the fact that we often use it in situations that bear no resemblance to a normal distribution. Many real-life phenomena do not follow the bell curve, and yet we assume a normal distribution simply because its simplicity is so tempting. Let us examine some glaring examples here.
Most real-life data do not exhibit a normal distribution. A normal distribution is more of an exception than a rule. Real-world data show extreme values (both high and low) far more frequently than the bell curve predicts. Even data that seem to be normally distributed may seem so only because our observation period is not long enough.
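To get a feel for how rare the bell curve says large deviations should be, here is a small Python sketch. It assumes a standard normal variable and uses only the standard library; the function name `two_sided_tail` is ours, chosen purely for illustration.

```python
import math


def two_sided_tail(k):
    """Probability that a standard normal value lies more than k
    standard deviations from the mean, in either direction."""
    # For Z ~ N(0, 1): P(|Z| > k) = erfc(k / sqrt(2))
    return math.erfc(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        p = two_sided_tail(k)
        # 1 / p is the average number of observations per such event
        print(f"{k:>2} sigma: p = {p:.3e}, about one observation in {1 / p:,.0f}")
```

Under the normal assumption, a 3-sigma move should show up about 0.27% of the time, and a 5-sigma move roughly once in 1.7 million observations. If your data produce such moves noticeably more often, the bell curve is probably the wrong model for them.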
This is an important lesson for any analyst dealing with real-world data. Always check the data for normality, and always look for a rational explanation of why the data should be normal. Only if you are satisfied on both counts should you assume a normal distribution. Even then, proceed with caution.
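One simple way to check data for normality is the Jarque-Bera test, which looks at skewness and excess kurtosis. The sketch below is a minimal, standard-library-only version. The helper name `jarque_bera` and the simulated samples are illustrative assumptions, not data from this article. It relies on the large-sample result that the statistic is approximately chi-square with 2 degrees of freedom under normality, so treat the p-value as approximate.

```python
import math
import random


def jarque_bera(data):
    """Return (skewness, excess kurtosis, JB statistic, approximate p-value)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of the sample
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom
    p_value = math.exp(-jb / 2.0)
    return skew, excess_kurt, jb, p_value


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = {
        "simulated normal": [rng.gauss(0.0, 1.0) for _ in range(5000)],
        "simulated lognormal": [rng.lognormvariate(0.0, 1.0) for _ in range(5000)],
    }
    for name, sample in samples.items():
        skew, kurt, jb, p = jarque_bera(sample)
        print(f"{name}: skew={skew:.2f}, excess kurtosis={kurt:.2f}, "
              f"JB={jb:.1f}, p={p:.3g}")
```

A very small p-value is evidence against normality. A large one is not proof of normality, which is why the second check, a rational explanation for why the data should be normal, still matters.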
The concept of the bell curve is a highly seductive one. Once it gets into your mind, it is hard to get past it. Hence, be careful about its use.
The bell curve has a lot of uses and it should not be discarded completely. But it should be used judiciously or the consequences can be disastrous.