Statistics for Machine Learning: A Basic Overview In 2021


Statistics is one of the foundational pillars of machine learning, and a deep understanding of the subject is essential for mastering machine learning applications. The two subjects are not mutually exclusive, as many think, and neither is machine learning merely a glorified, modern form of statistical methods. The two have much in common: one is a recent advancement made possible by computer science and big data, while the other is an age-old branch of mathematics. If you are aiming for a career in machine learning and its applications, statistics for machine learning is a foundational subject that you need to learn.

  1. Why do we need Statistics? 
  2. Importance Of Statistics For Machine Learning? 
  3. Mistakes 

1. Why do we need Statistics? 

In essence, statistics is a branch of science and mathematics centred on the methods used for collecting, analyzing, interpreting and presenting empirical data. The discipline has been around for centuries and mainly contains two groups of methods: descriptive statistics and inferential statistics. Descriptive statistics uses standard metrics such as the mean, mode, median and standard deviation to summarize information about data samples. It is the basis of exploratory data analysis, which is sometimes carried out when scaling projects. Inferential statistics draws conclusions about a larger population based on data from studied samples.
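The difference between the two groups of methods can be sketched in a few lines of Python using only the standard library. The sample values below are hypothetical, and the confidence interval uses a normal approximation (z = 1.96) that is only rough for a sample this small; treat it as an illustration rather than a recommended procedure.

```python
import math
import statistics

# Hypothetical sample: response times in milliseconds (illustrative values only).
sample = [12, 15, 11, 15, 18, 14, 15, 40]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: normal approximation with z = 1.96 for a rough 95% interval.
std_error = stdev / math.sqrt(len(sample))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

Note how the single large value (40) pulls the mean above the median, which is the kind of skew that descriptive statistics helps reveal.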

Machine learning practitioners with an understanding of statistical methods can ask relevant questions such as: 

  • What is the most common or expected observation from the data set?
  • Are there any observational limits that skew views?
  • What does the data look like? 
  • What are the most relevant variables? 
  • Are there any differences between two experiments? 
  • Are there real differences in data or is it caused by noise?

Answers to these questions are found through statistical techniques applied to machine learning. Having a thorough grasp of statistical methods can lead to effective decision making.
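As one example, the question of whether a difference between two experiments is real or caused by noise can be approached with a permutation test. The sketch below is a minimal version under stated assumptions: the function name `permutation_test`, the number of permutations, the seed and the accuracy scores are all hypothetical, and other tests (such as a t-test) may suit a given problem better.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Return an approximate two-sided p-value for the difference in means.

    The labels are shuffled repeatedly; the p-value is the share of shuffles
    whose difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical accuracy scores from two experiments (illustrative values only).
experiment_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80]
experiment_b = [0.90, 0.91, 0.89, 0.92, 0.88, 0.93, 0.90, 0.91]

p_value = permutation_test(experiment_a, experiment_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would rarely arise from random relabelling alone; a large one suggests it could plausibly be noise.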

2. Importance Of Statistics For Machine Learning? 

Machine learning, which is a relatively new discipline, draws many of its methods from statistics; examples include logistic regression and linear regression. Probability and statistics are used in machine learning for problem framing, data understanding, data cleaning, data selection, data preparation, model evaluation, model configuration, model selection, model presentation and model predictions, among other tasks. Today, data scientists and machine learning engineers who use packages such as scikit-learn in Python are often unaware of the underlying statistics that go into them.
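To make the statistical roots of such methods concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python, together with the R² score often used in model evaluation. The function names and the data are hypothetical and chosen for illustration; this is the textbook closed-form estimate for a single input variable, not a description of how any particular library implements its solver.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical noisy data (illustrative values only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"R^2={r_squared(xs, ys, slope, intercept):.4f}")
```

Seeing the slope written as a ratio of a covariance-like sum to a variance-like sum shows that the model is a statistical estimate and not only a library call.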

3. Mistakes 

There are three main mistakes that beginners make even after learning the importance of statistical modelling techniques: 

  • They don’t know Statistics

Many programmers and developers who move into machine learning don’t know statistics, and this can slow their growth. Statistics is not essential in programming or in areas such as software development, so software engineering courses typically do not include it as a subject. As a result, developers often do not appreciate the value of statistics. The problem surfaces when they enter machine learning, where statistical thinking and statistical methods are used heavily in data preparation, in the evaluation of models and in many other steps of predictive modelling.

  • They Study the Wrong Statistics

When machine learning practitioners do feel the need to fill the gap in their statistics knowledge, they often start with textbooks. They pick books from their college or undergraduate courses, which cover a wide range of topics that are unnecessary for the work at hand. The needs of machine learning practitioners might revolve around better interpretation of descriptive statistics or visualization of data, while at other times they work with more complex hypothesis tests. This means the statistics they need to learn is quite different from what undergraduate textbooks teach.

  • They Study Statistics the Wrong Way

When machine learning practitioners start learning from college course textbooks, they are also relying on material that was not written for them. Textbooks and course material aim to teach students only the basics, in just enough depth to pass their exams. These materials are poorly suited to helping practitioners learn the applicable methods they need. Practitioners instead need material that is designed to explain how to apply methods and how to interpret the results.

  • A Better Way into Statistics 

For those with a full-time job, learning statistics through a basic textbook is difficult, and many eventually give up because they cannot make progress through the book. The way forward is to pick statistics books written for machine learning professionals, which cover just the right concepts without unnecessary topics getting in the way.


Statistics is an old branch of mathematics, and machine learning is a newer field that emerged after computer science and big data. The two have more in common than most people realize, and studying statistics is essential for machine learning practitioners.


