Bayes Theorem In Machine Learning: An Important Guide (2021)


We live in the 21st century, a world driven by gadgets and technology. There are some fully established technologies and some that are still emerging.

Machine Learning is one of those technologies that has yet to be used to its full potential. Of the several elements that make it work, one of the most important is the Bayes Theorem. But before looking at how the Bayes Theorem is used in Machine Learning, it is essential to understand what exactly the Bayes Theorem is and how it works.

Table of Contents

  1. What is Bayes Theorem?
  2. Why is Bayes Theorem Used in Machine Learning?
  3. How is Bayes Theorem Applied in Machine Learning?
  4. Is it Useful to Use Bayes Theorem in Machine Learning?
  5. Examples of Bayes Theorem in Making Better Decisions
  6. Categories of Machine Learning Problems

Most people think that the Bayes Theorem is used mainly in the world’s financial sectors. That is not the whole picture. The medical sector has been using this theorem with ML to interpret test results more accurately. When studying a disease, doctors and scientists use this method to estimate how many people might catch it, or how quickly it might spread. Similarly, the aeronautical sector uses this theorem in more than one way to interpret test results.

1. What is Bayes Theorem?

To understand the Bayes Theorem in Machine Learning, it helps to first see that the theorem is a tool for estimating how likely something is once some evidence has been observed. To know how, let’s start from the very beginning. The Bayes Theorem is a result due to Thomas Bayes, an 18th-century mathematician from Britain. The formula he derived is used to work with conditional probability. Now, what exactly is conditional probability?

Conditional probability is the likelihood of an outcome given that some other event has already occurred. In a single statement, the Bayes Theorem is a method for revising existing predictions in the light of new evidence, so as to reduce the chances of making mistakes. That is how we state the Bayes Theorem in Machine Learning.

To explain this with an example, consider a test for the use of a drug that is 98% accurate: if a person uses the drug, the test is positive 98% of the time, and if a person does not use it, the test is negative 98% of the time. Next, let’s assume that only about 0.5% of people use this drug. If a person picked at random tests positive, then according to the Bayes Theorem the probability that he/she actually uses the drug is –

(0.98 x 0.005) / [(0.98 x 0.005) + ((1 – 0.98) x (1 – 0.005))] = 0.0049 / (0.0049 + 0.0199) = 19.76%
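As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `posterior` and its parameter names are illustrative, not part of the original example, and the sketch assumes, as the formula does, that the 98% figure applies to both positive and negative cases and that 0.5% is the prior probability.

```python
def posterior(prior, sensitivity, specificity):
    """Probability of being a true case given a positive result (Bayes Theorem)."""
    true_pos = sensitivity * prior                # P(positive / case) * P(case)
    false_pos = (1 - specificity) * (1 - prior)   # P(positive / non-case) * P(non-case)
    return true_pos / (true_pos + false_pos)


p = posterior(prior=0.005, sensitivity=0.98, specificity=0.98)
print(f"{p:.2%}")  # 19.76%
```

The result is far below 98% because the 0.5% prior is so small: the false positives coming from the large group of non-users outnumber the true positives.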

Machine Learning, a branch of Artificial Intelligence, is used in many processes to predict answers and possibilities. Thanks to the work of Thomas Bayes, his formula and the decision theory built on it are used in Machine Learning to make its decisions better and more precise. The theorem is popular because it is effective and simple. Many applications that involve classification tasks rely on the Bayes Theorem. This, in short, is how the question “what is Bayes Theorem in Machine Learning” can be answered.

Bayes Formula:

P(A/B) = P(A⋂B) / P(B) = [P(A) * P(B/A)] / P(B)

In this formula, according to Bayes Rule in Machine Learning –

P(A) denotes the probability of event A occurring

P(B) denotes the probability of event B occurring

P(A/B) denotes the conditional probability of A given that B has occurred

P(B/A) denotes the conditional probability of B given that A has occurred

P(A⋂B) denotes the probability of both events A and B occurring
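To see that the two forms of the formula agree, the following sketch computes P(A/B) both ways from a small table of counts. The counts are hypothetical, chosen only for illustration, and `Fraction` is used so that the comparison is exact.

```python
from fractions import Fraction

# Hypothetical counts over 100 trials (illustrative only, not from the article).
n_total = 100
n_a = 30          # trials where A occurred
n_b = 40          # trials where B occurred
n_a_and_b = 12    # trials where both A and B occurred

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
p_a_and_b = Fraction(n_a_and_b, n_total)
p_b_given_a = Fraction(n_a_and_b, n_a)    # P(B/A), counted among the A trials only

from_definition = p_a_and_b / p_b         # P(A⋂B) / P(B)
from_bayes = p_a * p_b_given_a / p_b      # [P(A) * P(B/A)] / P(B)

print(from_definition, from_bayes)  # 3/10 3/10
```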

2. Why is Bayes Theorem Used in Machine Learning?

There are many cases where precise answers and numbers are required to make a decision, especially in the financial world. This is when technology comes in handy. Machine Learning is one of the technologies that help make the right decision at such times, and the Bayes Theorem improves the decisions that rest on conditional probability.

The theorem takes events that have already occurred as evidence, and the prediction revised with that evidence acts as a cross-check on the original estimate. This helps immensely in getting a more accurate result. Therefore, whenever there is a conditional probability problem, the Bayes Theorem is used in Machine Learning. A direct consequence is that the more data you have, the more accurately the result can be determined.

Thus, conditional probability is a must for determining or predicting more accurately the chance of an event happening in Machine Learning.
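One way to picture how more data sharpens the result is to apply the theorem repeatedly, feeding each posterior back in as the next prior. The sketch below reuses the 0.5% and 98% figures from the example in section 1 and assumes, purely for illustration, that three positive results in a row are independent of each other; the function name `update` is hypothetical.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One Bayes update: revise the prior after seeing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


belief = 0.005            # starting prior
history = [belief]
for _ in range(3):        # three positive results, assumed independent
    belief = update(belief, 0.98, 0.02)
    history.append(belief)

# The belief rises with every additional positive result.
print([round(b, 4) for b in history])
```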

3. How is Bayes Theorem Applied in Machine Learning?

The Naive Bayes classifier is a simple application of the Bayes Theorem. It is a classification algorithm, valued for its accuracy and speed, that assigns data to classes. Let’s understand the usage of this theorem in Machine Learning with the help of an example.

Suppose there is a vector A that has x attributes. It means A = (A1, A2, A3, ……, Ax).

Now, suppose there are n classes represented as C1, C2, C3, ……, Cn.

These are the two things given to us, and our Machine Learning classifier has to predict the class that A belongs to. With x attributes and n classes, there are a great many possible combinations. To narrow down the answer, we need a function or a method to help us with precision. This is where the Bayes Theorem comes into play.

The first thing that our classifier has to do is choose the best possible class. So, with the help of this theorem and its formula, we can write for each class Ci:

P(Ci/A)= [ P(A/Ci) * P(Ci)] / P(A)

Here is the explanation of this formula and of how it helps in getting the best possible answer.

In this formula, P(A) does not depend on the class, which means it stays constant across all the classes. Therefore, to maximize P(Ci/A), the posterior probability we are after, we only have to maximize the value of P(A/Ci) * P(Ci).
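The point that P(A) can be ignored when picking the best class can be checked numerically. In this sketch the priors and likelihoods are hypothetical numbers chosen for illustration; dividing by P(A) rescales every score by the same amount, so the winning class does not change.

```python
# Hypothetical priors P(Ci) and likelihoods P(A/Ci) for three classes.
priors = {"C1": 0.5, "C2": 0.3, "C3": 0.2}
likelihoods = {"C1": 0.10, "C2": 0.40, "C3": 0.30}

scores = {c: likelihoods[c] * priors[c] for c in priors}    # P(A/Ci) * P(Ci)
evidence = sum(scores.values())                             # P(A), identical for every class
posteriors = {c: s / evidence for c, s in scores.items()}   # P(Ci/A)

best_by_score = max(scores, key=scores.get)
best_by_posterior = max(posteriors, key=posteriors.get)
print(best_by_score, best_by_posterior)  # C2 C2
```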

With n classes on the list, let’s assume that every class is equally likely to be the right answer. Considering this factor, we can say that –

P(C1) = P(C2) = P(C3) = …… = P(Cn)

Considering all the above, we can conclude that the only thing we need to maximize is P(A/Ci). If the data set were small, performing this process would be easy. Keeping in mind the usage of this technology by MNCs and big firms, however, the dataset is likely to be extensive, with numerous attributes.

Computing P(A/Ci) directly for every combination of attribute values would exhaust resources as well as time. This is where the assumption of class-conditional independence kicks in; it simplifies the problem and brings the computation cost to a minimum. Class-conditional independence means that, within a given class, the values of the attributes are assumed to be independent of each other.

This is how the Bayes Theorem plays an important role in Machine Learning. The Naive Bayes method simplifies conditional probability tasks, in many practical cases without a large hit on precision.

Therefore we can now conclude that –

P(A/Ci) = P(A1/Ci) * P(A2/Ci) * P(A3/Ci) * …… * P(Ax/Ci)

With the combination of the Bayes Theorem and Machine Learning, the probability of a complex event can thus be built up from the probabilities of smaller events.
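Under that independence assumption, a classifier can be sketched in a few lines of plain Python. This is an illustrative sketch, not the article's or any library's implementation: the toy weather-style data is made up, the names `log_score` and `predict` are hypothetical, and it adds two things the text does not mention, namely add-one (Laplace) smoothing so that unseen values do not give a zero probability, and sums of logarithms in place of the product to avoid numerical underflow.

```python
from collections import Counter, defaultdict
import math

# Toy, made-up training data: (attributes, class). Purely illustrative.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
    (("overcast", "cool"), "yes"),
]

class_counts = Counter(label for _, label in data)

# value_counts[(class, k)][value] = how often attribute k took that value in the class
value_counts = defaultdict(Counter)
for attrs, label in data:
    for k, value in enumerate(attrs):
        value_counts[(label, k)][value] += 1

# Number of distinct values per attribute position, needed for smoothing.
n_values = [len({attrs[k] for attrs, _ in data}) for k in range(2)]


def log_score(attrs, label):
    """log P(Ci) + sum over k of log P(Ak/Ci), with add-one smoothing."""
    total = math.log(class_counts[label] / len(data))
    for k, value in enumerate(attrs):
        count = value_counts[(label, k)][value] + 1
        total += math.log(count / (class_counts[label] + n_values[k]))
    return total


def predict(attrs):
    """Return the class with the highest score for the given attributes."""
    return max(class_counts, key=lambda label: log_score(attrs, label))


print(predict(("rainy", "cool")))  # yes
print(predict(("sunny", "hot")))   # no
```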

4. Is it Useful to Use Bayes Theorem in Machine Learning?

With all the claims, proofs, and depictions above, it can be concluded that using the Bayes Theorem in Machine Learning is worthwhile. Algorithms based on the Bayes Theorem hold up well when compared with other algorithms, and the Bayes method has long been considered reliable, simple, and precise. Before using the Naive Bayes method, though, one should check the assumption that the attributes are independent of each other given the class.

5. Examples of Bayes Theorem in Making Better Decisions

Whenever a classifier is built and tested, the computer follows certain steps. Here are the four basic steps that help in determining the probability:

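  • Importing the Libraries and Loading the Data

from sklearn.datasets import load_iris

from sklearn.metrics import confusion_matrix

from sklearn.metrics import classification_report

from sklearn.model_selection import train_test_split

# Assumption: the article does not show how iris is loaded; here it is taken from scikit-learn's built-in copy as a DataFrame whose last column is the class label

iris = load_iris(as_frame=True).frame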

  • Segregating the Independent and Dependent Variables

X = iris.iloc[:, :-1].values

y = iris.iloc[:, -1].values

  • Splitting the Dataset into Training and Test Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  • Using the Classifier to Get the Output

# Naive Bayes

from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()

# Fit the model on the training set

classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

# Summary of the predictions made by the classifier

print(classification_report(y_test, y_pred))

print(confusion_matrix(y_test, y_pred))

# Accuracy score

from sklearn.metrics import accuracy_score

print('accuracy is', accuracy_score(y_test, y_pred))
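To connect the classifier back to the formula from section 3, the fitted model can be asked for its estimated priors and posteriors. The sketch below is self-contained and assumes scikit-learn's built-in iris data (the article does not show how its own `iris` was loaded); `class_prior_` and `predict_proba` are an attribute and a method of scikit-learn's `GaussianNB`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: scikit-learn's bundled iris data stands in for the article's dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GaussianNB().fit(X_train, y_train)

# Estimated P(Ci) for each class, taken from the training set frequencies.
print(model.class_prior_)

# Estimated P(Ci/A) for the first three test samples; each row sums to 1.
proba = model.predict_proba(X_test[:3])
print(proba.round(3))
```

The class that `predict` returns for a sample is the one with the largest value in the corresponding row of `predict_proba`.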

6. Categories of Machine Learning Problems

When it comes to conditional probability, Machine Learning problems can be divided into two categories: Regression and Classification.

Regression: This is the case where we need the computer to predict a numeric value from some related data.

Classification: This is the case where we assign a data point to a category.

The graph above gives a good idea of how regression with the Bayes Theorem works. Consider a small and simple data set with the temperature of a village on each day of a year on the x-axis, and the number of bottles sold that day by one of the village shops on the y-axis. The graph shows how these two quantities relate according to the Bayes Rule in Machine Learning.

This helps in making better predictions: using the Bayes Theorem, one can estimate the number of bottles that the shop sells on average every month. The shopkeeper can also determine the months when the most bottles are sold and keep enough stock during those times. Also, the shopkeeper can check whether these two quantities are related to each other or not.

A common way to use this theorem in regression is to estimate the parameters of a linear model. It also helps explain how and why ML draws on the Bayes Theorem.
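As a rough illustration of estimating the parameters of a linear model in a Bayesian way, the sketch below fits scikit-learn's `BayesianRidge` to temperature and bottle-sales data. The data is synthetic, generated here only for illustration (the article's own data set is not available), and the slope of 2 bottles per degree and the noise level are assumptions.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Synthetic data, for illustration only: 365 daily temperatures (degrees C)
# and bottle sales that rise by about 2 bottles per degree, plus random noise.
temperature = rng.uniform(5, 35, size=365)
bottles = 10 + 2.0 * temperature + rng.normal(0, 5, size=365)

model = BayesianRidge()
model.fit(temperature.reshape(-1, 1), bottles)

# Predict sales for a 30-degree day, with the model's predictive standard deviation.
mean, std = model.predict(np.array([[30.0]]), return_std=True)
print(f"slope: {model.coef_[0]:.2f}")
print(f"prediction at 30 degrees: {mean[0]:.1f} +/- {std[0]:.1f}")
```

Unlike an ordinary least-squares fit, this model also reports how uncertain each prediction is, which is what the shopkeeper would want when deciding how much stock to keep.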

On the other hand, with the help of the Bayes optimal classifier, one can use the method for classification as well. Both these processes are used at a large scale in several big firms.


Though we live in a fancy world that runs on various technologies, this article makes it apparent that these emerging technologies are incomplete without existing theorems and knowledge. Much like the Bayes Theorem in Machine Learning, several other established ideas drive emerging technologies such as Machine Learning, Artificial Intelligence, RPA, AR, VR, and others. Therefore, we can conclude that ML relies heavily on the Bayes Theorem to get a precise answer or prediction for an event.

There are no right or wrong ways of learning AI and ML technologies – the more, the better! These valuable resources can be the starting point for your journey on how to learn Artificial Intelligence and Machine Learning. Does pursuing AI and ML interest you? If you want to step into the world of emerging tech, you can accelerate your career with the Machine Learning And AI Courses by Jigsaw Academy.

