The Central Limit Theorem, also known as the CLT, is an important and frequently used concept in statistics. It is fairly simple to understand and is a landmark result in the field. It forms the basis of inference with probability distributions and has significant implications for applied machine learning. The CLT uses the sampling distribution to generalize from samples to the population and to approximate parameters such as the mean and standard deviation.
In this article, we will define the Central Limit Theorem, look at its formula, work through a dice-rolling example, and discuss its implications for applied machine learning.
The Central Limit Theorem is a statistical concept stating that the distribution of the sample means of a random variable approaches a normal, or Gaussian, distribution as the sample size becomes large, irrespective of the shape of the original population distribution.
To understand it better, let’s define the terms: the population is the complete set of observations we want to draw conclusions about, a sample is a subset drawn from that population, the sample size n is the number of observations in the sample, and the sample mean is the average of those observations.
The rule of thumb, generally considered safe, is that the sample size should be at least 30 (n ≥ 30).
In simpler terms, the CLT states that if you repeatedly draw samples of 30 or more data points and compute the mean of each sample, those means will form a bell-shaped curve: most will lie near the centre, with a few at either extreme, and their centre approximates the mean of the entire population. The same idea applies to the standard deviation σ: the average standard deviation across the samples is representative of the standard deviation of the entire population.
One important assumption for this theorem to give correct statistical inference is that the sample be sufficiently large and drawn at random (unbiased) from the population. In addition, as we increase the sample size, the Gaussian approximation becomes more accurate.
The Central Limit Theorem formula can be represented as:
For a population in which the random variable X has finite mean μ and standard deviation σ, the CLT gives:

Mean of the sample means: μx̄ = μ

Standard deviation of the sample means (the standard error): σx̄ = σ / √n

Where,
μ = Population mean
σ = Population standard deviation
μx̄ = Mean of the sample means
σx̄ = Standard deviation of the sample means
n = Sample size
To sum up, the Central Limit Theorem states that for a large sample size n, the sample mean X̄ can be approximated by a normal distribution with mean μ and standard deviation σ/√n.
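The formula can be checked numerically. Below is a minimal simulation sketch (assuming NumPy is available; the exponential population, sample size, and number of repetitions are illustrative choices, not part of the article): it draws many samples from a skewed population and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
# Verify the CLT formulas by simulation on a skewed (exponential) population.
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 2.0, 2.0      # Exponential(scale=2) has mean 2 and standard deviation 2
n = 50                    # sample size, comfortably above the n >= 30 rule of thumb
num_samples = 10_000      # number of repeated samples

# Draw num_samples samples of size n and compute each sample's mean.
samples = rng.exponential(scale=2.0, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print("Mean of sample means:", sample_means.mean(), "expected:", mu)
print("SD of sample means:  ", sample_means.std(ddof=1), "expected:", sigma / np.sqrt(n))
```

Even though the underlying population is far from normal, the mean of the sample means lands close to μ and their spread close to σ/√n.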
We can understand the Central Limit Theorem with a worked example of rolling a die. A die has the numbers 1 to 6 on its faces, and each number has an equal probability of 1 in 6 of turning up on any roll. If we roll the die many times, say 500, we will see a more or less uniform distribution: every number appears about equally often.
To illustrate the CLT, we increase the sample size and plot the averages of those samples. We roll the die twice, compute the average of the pair, and repeat this process 500 times, plotting the averages on a graph. We then repeat the experiment rolling the die 5 times and 10 times per average. The histogram of each set of averages shows that as the number of rolls per average increases (that is, as the sample size increases), the distribution of the averages gets closer to a Gaussian distribution. Also, the variation of the sample means decreases as the sample size increases.
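Here is a minimal sketch of that experiment (assuming NumPy and Matplotlib are installed; the specific sample sizes 1, 2, 5, and 10 mirror the description above):

```python
# Dice experiment: average 1, 2, 5, and 10 rolls, repeat 500 times,
# and plot a histogram of the averages for each sample size.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
repeats = 500

fig, axes = plt.subplots(1, 4, figsize=(14, 3), sharey=True)
for ax, rolls_per_average in zip(axes, [1, 2, 5, 10]):
    # Each row is one experiment of `rolls_per_average` dice rolls.
    rolls = rng.integers(1, 7, size=(repeats, rolls_per_average))
    averages = rolls.mean(axis=1)
    ax.hist(averages, bins=11, edgecolor="black")
    ax.set_title(f"{rolls_per_average} roll(s) per average")
    ax.set_xlabel("Average of rolls")

axes[0].set_ylabel("Frequency")
plt.tight_layout()
plt.show()
```

With a single roll per average the histogram is roughly flat; with 5 or 10 rolls per average it takes on the familiar bell shape and becomes narrower, exactly as the CLT predicts.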
The implications of the Central Limit Theorem for applied machine learning are significant. It sits at the core of what machine learning does: making inferences from data. The theorem helps us quantify how far a sample mean is likely to deviate from the population mean without having to collect a new sample. It also means that an independent, random sample is a good representation of the complete population being observed, so we do not need to measure the whole population's attributes.
The concepts of significance testing and confidence intervals are also based on the CLT. Knowing that our sample mean belongs to a normal or Gaussian distribution, we can use the properties of that distribution to estimate the probability of observing a given sample mean for a given sample size, and to calculate an interval of the desired confidence around the skill of a machine learning model.
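As a concrete illustration, here is a minimal sketch of a 95% confidence interval around a model's accuracy (the accuracy of 0.87 and test-set size of 1,000 are assumed example numbers, not values from the article). Because accuracy is the mean of per-example 0/1 correctness values, the CLT justifies treating its sampling distribution as approximately normal for a large test set.

```python
# 95% confidence interval for a model's accuracy using the Gaussian approximation.
import math

accuracy = 0.87          # observed accuracy on the test set (assumed value)
n = 1_000                # number of test examples (assumed value)
z = 1.96                 # z-score for a 95% confidence level

# Standard error of a proportion: sqrt(p * (1 - p) / n)
standard_error = math.sqrt(accuracy * (1 - accuracy) / n)
lower = accuracy - z * standard_error
upper = accuracy + z * standard_error

print(f"95% confidence interval for accuracy: [{lower:.3f}, {upper:.3f}]")
```

A larger test set shrinks the standard error by a factor of √n, which is why evaluation on more data gives a tighter interval around the reported skill.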
To summarize, the Central Limit Theorem states that the distribution of sample means approaches a Gaussian distribution as the sample size grows, regardless of the shape of the population distribution; its formula gives the mean and standard error of the sample means; and it underpins significance testing and confidence intervals in applied machine learning.
There are no right or wrong ways of learning AI and ML technologies – the more, the better! These valuable resources can be the starting point for your journey on how to learn Artificial Intelligence and Machine Learning. Does pursuing AI and ML interest you? If you want to step into the world of emerging tech, you can accelerate your career with the Machine Learning and AI courses by Jigsaw Academy.