If you are aspiring to a career in data science, the three points below are sure to bring a smile to your face:
a) Harvard Business Review has declared data scientist the sexiest job of the 21st century.
b) IBM predicts that demand for data scientists will soar 28% by 2020.
c) Glassdoor lists data scientist as the #1 job in the U.S., with a median salary of around $108,000 and a satisfaction rate of 4.3 out of 5.
The career potential of data science is undoubtedly promising.
In the same context, let me ask an important question: apart from programming (Python, R, or SAS), which skill will you need the most to become a data scientist?
Yes! You got it right: it's math and statistics. (Don't worry if you struggle with math; there are plenty of creative tutorials available to upskill you.) Fundamentally, mathematics is the foundation of every contemporary scientific discipline, and data science is no exception. Almost all the techniques of modern data science, including machine learning, carry deep mathematical and statistical concepts as their supporting structure.
But what does it mean when we say mathematics and statistics are essential skills for a career in data science or AI? Should aspiring data scientists, to prepare for a data-driven career, be spending their days deep in the fundamentals of probability distributions, regression, and differential calculus?
No, not quite, but you do need a basic understanding of the underlying principles and statistical theorems that are useful in building data science and machine learning models.
In this blog post, we are going to discuss, in depth and with examples, three data science theorems every programmer should know to derive accurate results in an AI system.
Alright. This is going to be super interesting and fun!
1. Bayes' Theorem: In the face of the incredible powers of machine learning, we have drifted away from statistics, haven't we? But can you imagine a career in data science and ML without prior knowledge of statistics, especially probability theory?
Image credit: https://www.analyticsvidhya.com
That is why I chose to discuss Bayes' theorem in this blog post. So let's get started.
Think about it: there is almost no chance that you have never heard of this theorem before. Bayes' theorem is one of the most important rules of probability theory, and it has also found its way into AI and machine learning, where it forms the basis of one of the most celebrated ML algorithms, the Naïve Bayes algorithm.
This theorem gives us a way to compute the probability of an event based on prior knowledge of a related event. The following equation gives the basic representation of Bayes' theorem, where A and B are two related events:
P(A | B) = P(B | A) × P(A) / P(B)
Image credit: Wikipedia
Here,
P(A|B): the conditional probability, i.e., the probability of event A occurring given that B has occurred. This is also called the posterior probability.
P(B|A): the probability of event B occurring given that A has occurred.
P(A), P(B): the probabilities of events A and B occurring on their own. P(A) is also called the prior probability.
Let's take a simple example to get more insight into Bayes' theorem.
Suppose you are asked to pick a single card from a deck of playing cards. The probability that the card is a Jack is 4/52, as there are 4 Jacks in a deck of 52 playing cards. In other words, the prior probability is P(Jack) = 4/52 = 1/13.
But what if evidence is provided, say someone looks at the card and tells you that the picked card is a face card? In this case, the posterior probability, i.e. P(Jack | Face), can be calculated using Bayes' theorem:
P(Jack | Face) = (P(Face | Jack) / P(Face)) × P(Jack)
P(Face | Jack) is 1, because every Jack is also a face card.
P(Face), the probability of drawing a face card, is 12/52 = 3/13, because there are three face cards (Jack, Queen, and King) in each of the four suits.
From these values, the likelihood ratio P(Face | Jack) / P(Face) is 1 / (3/13) = 13/3.
Now, putting in all the values, Bayes' theorem gives P(Jack | Face) = (13/3) × (1/13) = 1/3.
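To make the arithmetic concrete, here is a minimal Python sketch of the same card example (the variable names are mine, chosen purely for illustration):

```python
# Bayes' theorem on the card example: P(Jack | Face) = P(Face | Jack) * P(Jack) / P(Face)
p_jack = 4 / 52          # prior: 4 Jacks in a 52-card deck
p_face = 12 / 52         # evidence: 12 face cards (Jack, Queen, King in each of 4 suits)
p_face_given_jack = 1.0  # every Jack is a face card

p_jack_given_face = p_face_given_jack * p_jack / p_face
print(p_jack_given_face)  # 0.3333... i.e. 1/3
```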
Still wondering how Bayes' theorem serves the purpose of machine learning? Well, take the simplest ML setting, where we want a model to learn from a given set of attributes and form a hypothesis relating them to a response variable. We then use this hypothesis to predict the response for a new, unseen instance. Bayes' theorem is what makes this kind of probabilistic reasoning possible in machine learning.
Moreover, if we talk about applications of Bayes' theorem, it is the basis of spam filtering.
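As a rough illustration of how that works in practice, here is a minimal Naïve Bayes spam-filter sketch using scikit-learn (the tiny training set and its labels are made up purely for demonstration):

```python
# A toy Naive Bayes spam filter: word counts as features, Bayes' rule under the hood.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",        # spam
    "limited offer claim cash",    # spam
    "meeting moved to 3 pm",       # ham
    "see you at lunch tomorrow",   # ham
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)   # bag-of-words counts

model = MultinomialNB()
model.fit(X, labels)

new_message = ["claim your free cash prize"]
print(model.predict(vectorizer.transform(new_message)))  # most likely ['spam']
```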
2. Central Limit Theorem: Although Abraham de Moivre, a French-born mathematician, suggested the central limit theorem (CLT) several centuries ago, it continues to be applied extensively, especially in data science and machine learning.
Nuts & Bolts of Central Limit Theorem:
Before diving into its formal definition, let's understand the CLT and how it works with the help of an example.
Suppose the total number of students in a school is 2,500 and your task is to calculate the average height of all the students. How can you do this?
The most obvious approach, the one we usually get from aspiring data scientists, is to simply calculate the average:
a) First, measure the heights of all 2,500 students.
b) Add up the heights.
c) Finally, divide the total sum of heights by the total number of students, and we have the average.
Don't you think measuring the heights of all the students is going to be a tiresome and lengthy process? So, is there an alternative approach? Yes, let's have a look:
a) First, draw a random group of students from the school and call it a sample. Draw multiple samples, each consisting of 30 or more students.
Image credit: https://research-methodology.net
b) Now, calculate the mean of each of these samples individually.
c) Next, calculate the mean of these sample means.
d) The value we get here is approximately the mean height of the students in the school.
e) Graphically, the distribution of these sample means will be a bell-shaped curve, i.e., a normal distribution.
To make a long story short, this is what the Central Limit Theorem is all about. Interesting? So, let's go further and give the CLT a formal definition:
The Central Limit Theorem, a key concept in probability, states that with a large enough sample size, the sampling distribution of the sample means approaches a normal distribution, no matter what the shape of the original population distribution is.
Simply put, as you take more samples, especially large ones, the graph of the sample means will take on a bell-shaped curve, i.e., it will look like a normal distribution, as shown below:
Image credit: https://www.thoughtco.com
The above fact holds especially well for sample sizes over 30.
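You can see this behaviour with a quick simulation. The sketch below is a minimal illustration; it uses an exponential population purely as an example of a skewed distribution and draws many samples of size 30, whose means end up clustered symmetrically around the population mean:

```python
# Simulate the CLT: means of many size-30 samples from a skewed population look normal.
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=100_000)  # heavily skewed population

sample_means = np.array([rng.choice(population, size=30).mean() for _ in range(5_000)])

print("population mean:        ", population.mean())
print("mean of sample means:   ", sample_means.mean())      # close to the population mean
print("median of sample means: ", np.median(sample_means))  # ~ mean => roughly symmetric, bell-shaped
```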
Mathematically, we can describe the CLT with the help of the following formulas:
µx̄ = µ
and
σx̄ = σ / √n
where,
µ = population mean
σ = population standard deviation
µx̄ = mean of the sample means
σx̄ = standard deviation of the sample means (the standard error)
n = sample size
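A quick numerical check of these two relations (again just a sketch, reusing NumPy with a made-up population of student heights):

```python
# Check that the mean of the sample means is close to µ and their spread is close to σ/√n.
import numpy as np

rng = np.random.default_rng(0)
population = rng.uniform(120, 190, size=50_000)  # hypothetical student heights in cm
mu, sigma, n = population.mean(), population.std(), 36

means = np.array([rng.choice(population, size=n).mean() for _ in range(10_000)])

print("population mean µ:", round(mu, 2), " vs  mean of sample means:", round(means.mean(), 2))
print("σ/√n:", round(sigma / np.sqrt(n), 2), " vs  std of sample means:", round(means.std(), 2))
```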
The most important implication of the CLT in machine learning is that it underpins the statistical reasoning used with linear algorithms such as linear regression, for example the significance tests and confidence intervals placed on their estimates.
3. No-Free-Lunch (NFL) Theorem: Can we have a machine learning algorithm that works well with any kind of data? The answer is no, we can't. The reason behind this is the No-Free-Lunch theorem.
It says that there is no single ML algorithm that works best for every problem. That is why, in machine learning, we try multiple models and choose the one that works best for the problem at hand.
Unfortunately, there is no such thing as a free lunch!
Image credit: https://www.thedailymeal.com
We can get more insight into the no-free-lunch theorem with the help of the following simple example. Suppose we are asked to predict the next number in the sequence below:
A = 1, 3, 9, …
Assuming that each term is three times the previous one, i.e. At = 3 × At-1, most of us would probably predict 27 as the next number in the sequence.
On the other hand, there is no real reason to reject the hypothesis that this sequence is simply the output of a random number generator. Put the other way around, we cannot disprove that hypothesis without seeing all the data points. Agreed?
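In practice, this is why we compare several candidate models on the data we actually have and keep the best performer. Here is a minimal scikit-learn sketch of that workflow (the dataset and the three candidate models are arbitrary choices for illustration):

```python
# "No free lunch" in practice: try multiple models and keep the best one for THIS problem.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

for name, model in candidates.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross-validated accuracy
    print(f"{name}: {score:.3f}")
# The ranking can change on a different dataset; that is exactly what NFL warns about.
```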
That's all for now, readers! We hope this blog post proves insightful for you. We would love to hear your thoughts too, so don't hesitate to leave your comments in the section below. If you would like a career in Data Science, then our comprehensive course with live sessions, assessments, and placement assistance might be your best bet.