Using Confidence Intervals Confidently!!!

15 Dec 2014

By Gunnvant Singh, Jigsaw Academy Faculty

All of us have heard about confidence intervals. For some, the term is part of their regular vocabulary, while for many the very idea of a confidence interval is a little tricky to understand. In this post I will introduce the notion of confidence intervals.

Let us first see what a confidence interval means. Simply put, an X% confidence interval is a pair of numbers between which we are X% confident that the true population mean lies. Hmm! That sounds pedantic, so let us break things down. It is common knowledge that if I take samples and compute the mean of each one, the sample estimates will be a little different from the true population average. Most sample averages, though, will lie near the population mean.

If I draw a frequency plot of the different sample averages, I get a bell-shaped plot that peaks at the population mean.

As can be seen, most of the estimates of the mean from samples lie close to the population mean (0). Very few samples give estimates as far away from it as -0.25 or 0.25.
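
To check this for yourself, here is a minimal Python sketch that draws many samples and looks at where their means fall. It assumes a normal population with mean 0 and standard deviation 1 and samples of 100 observations each; the number of samples (2000), the seed and the variable names are illustrative choices, not part of the original analysis.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

POP_MEAN = 0.0      # assumed population mean
POP_SD = 1.0        # assumed population standard deviation
SAMPLE_SIZE = 100   # assumed number of observations per sample
NUM_SAMPLES = 2000  # illustrative number of samples to draw

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Share of sample means that land within 0.25 of the population mean.
share_close = sum(abs(m - POP_MEAN) <= 0.25 for m in sample_means) / NUM_SAMPLES

print(f"average of the sample means: {statistics.fmean(sample_means):.3f}")
print(f"spread (sd) of the sample means: {statistics.stdev(sample_means):.3f}")
print(f"share within 0.25 of the population mean: {share_close:.1%}")
```

Under these assumptions the spread of the sample means should come out near 0.1 (the population standard deviation divided by the square root of the sample size), which is why estimates beyond 0.25 in either direction are rare.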

Let us assume that I take a sample from a population whose mean is zero and the sample estimate of the mean comes out to be 1. Since in real life I can only observe a sample and have to rely on sample estimates, I can do one of two things:

I can simply accept that the computed sample mean is in fact equal to the population mean.
I can ask this question: if my computed sample mean (1) is in fact the population mean, what is the range of values that I would observe X% of the time (here X can be 90, 95 or 99)?

Clearly, the second option is much better. If you compute the range within which you expect X% of sample means to lie, assuming your population mean is equal to your sample mean, then the ranges computed this way will contain the true population mean X% of the time!
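
The range in question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population standard deviation is taken as known, sample means are taken to be normally distributed, and the function name mean_interval and its parameters are hypothetical. The range is the sample mean plus or minus z times the population standard deviation divided by the square root of the sample size.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(sample_mean, pop_sd, n, level=0.95):
    """Range expected to hold `level` of sample means if the population
    mean were equal to `sample_mean` (population sd assumed known)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    margin = z * pop_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Sample mean of 1 as in the example above; sd of 1 and n of 100 are assumed.
for level in (0.90, 0.95, 0.99):
    low, high = mean_interval(1.0, pop_sd=1.0, n=100, level=level)
    print(f"{level:.0%} interval: ({low:.3f}, {high:.3f})")
```

Note that the range widens as X goes up: being right more often costs you precision.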

Below I present a table of the 95% range of values for each sample mean estimate (assuming that the population mean is 0, the population standard deviation is 1 and the sample size is 100).
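
A table of this kind can be regenerated with a short simulation. The sketch below is a stand-in built from freshly simulated samples, not the original table: it assumes a normal population, uses the settings stated above (mean 0, standard deviation 1, sample size 100), and the choice of 10 rows and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(7)  # arbitrary seed so the table is repeatable

POP_MEAN, POP_SD, SAMPLE_SIZE = 0.0, 1.0, 100  # settings stated in the text
Z_95 = NormalDist().inv_cdf(0.975)              # about 1.96
MARGIN = Z_95 * POP_SD / sqrt(SAMPLE_SIZE)      # about 0.196

rows = []
for _ in range(10):  # 10 simulated samples, one table row each
    mean = fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    low, high = mean - MARGIN, mean + MARGIN
    rows.append((mean, low, high, low <= POP_MEAN <= high))

print(f"{'sample mean':>12} {'lower':>8} {'upper':>8}  contains 0?")
for mean, low, high, covers in rows:
    print(f"{mean:>12.3f} {low:>8.3f} {high:>8.3f}  {'yes' if covers else 'no'}")
```

Every row has the same width (about 0.392) because the population standard deviation and the sample size are fixed; only the centre of the range moves from sample to sample.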

As we can see, most of the time the true population mean (0) lies within the interval computed for each sample mean estimate. So, when we quote a 95% confidence interval, it essentially means that if we take many samples and compute a confidence interval for each of them, 95% of those intervals will contain the population average.
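
This "95% of intervals" reading is easy to check by brute force. The following is a rough sketch, again assuming a normal population with mean 0, standard deviation 1 and samples of size 100; the 5000 repetitions and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(2014)  # arbitrary seed so the run is repeatable

POP_MEAN, POP_SD, SAMPLE_SIZE = 0.0, 1.0, 100
NUM_SAMPLES = 5000  # illustrative number of repeated samples
margin = NormalDist().inv_cdf(0.975) * POP_SD / sqrt(SAMPLE_SIZE)

hits = 0
for _ in range(NUM_SAMPLES):
    mean = fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    # Count the interval as a hit when it contains the population mean.
    if mean - margin <= POP_MEAN <= mean + margin:
        hits += 1

coverage = hits / NUM_SAMPLES
print(f"{coverage:.1%} of the {NUM_SAMPLES} intervals contain the population mean")
```

The printed share should sit close to 95%, with a little wobble from run to run if you change the seed.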

So, next time you come across a statistician bragging about how certain he is of his confidence interval, you can always point out that only X% of intervals built this way capture the true population mean, and his may well be one of those that miss!
