Naive Bayes is one of the most basic machine learning techniques used for classification.
Let’s look at how this algorithm works with an example. In the credit card industry, lenders are always looking for methods to classify customers as good or bad; a good customer is one who does not default on payments. Consider two features, say the income and credit limit of a person, that help us distinguish between good and bad customers, and plot a scatter plot of them to classify the customers.
In the following graph, the red data points represent bad customers and the blue points represent good customers. You can consider this the training data.
How does the algorithm work?
The task of this algorithm is to classify new data points (in this case, new customers) as good or bad.
First, it calculates the probability that the customer is good or bad by combining the prior probability of each class with the likelihood of the observed features. Let’s look at what this means.
What is prior probability?
If you take a close look at the data points,
number of good customers = 2 × (number of bad customers)
So a customer who does not have a classification yet is twice as likely to be a “good” customer as a “bad” one. In Bayesian analysis, this is called the prior probability: the prior probability of a class is simply the percentage of the total population that belongs to that class.
So let’s formulate the prior probabilities: P(good) = 2/3 ≈ 0.67 and P(bad) = 1/3 ≈ 0.33.
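As a minimal sketch, the priors fall straight out of the class counts. The 40/20 split below is hypothetical, chosen only to match the 2:1 good-to-bad ratio described above:

```python
# Hypothetical class counts matching the 2:1 good-to-bad ratio above.
n_good, n_bad = 40, 20
total = n_good + n_bad

# Prior probability = share of each class in the total population.
prior_good = n_good / total  # 2/3
prior_bad = n_bad / total    # 1/3
```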
What is the likelihood of a new customer?
Objective: In the above graph, we see a new green point (which is a new customer) whom we have to classify as good or bad.
Since the classes are well separated (we can see a clear distinction between good and bad customers), we can see that more blue points lie close to the new customer than red points, so it’s more likely that this customer’s class will be “good”. To measure this likelihood, we draw a circle around the new point that encompasses a number of points (to be chosen a priori), irrespective of their class labels. We then count how many points inside the circle belong to each class label, and from this we calculate the likelihood:
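The circle step above can be sketched as follows. The training points, the new customer’s coordinates, and the circle size K are all hypothetical values invented for illustration, not taken from the original graph:

```python
import math

# Hypothetical (income, credit_limit, label) training points.
points = [
    (30, 2, "bad"), (32, 3, "bad"), (57, 7, "bad"),
    (58, 8, "good"), (60, 8, "good"), (61, 10, "good"),
    (62, 9, "good"), (63, 9, "good"), (65, 7, "good"),
]
new_point = (59, 8)  # the new (green) customer to classify
K = 4                # circle size: number of points it must enclose

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# The K points nearest to the new customer are the ones "inside the circle".
in_circle = sorted(points, key=lambda p: dist(p, new_point))[:K]
in_circle_good = sum(1 for p in in_circle if p[2] == "good")
in_circle_bad = K - in_circle_good

# Likelihood of each class = points of that class inside the circle,
# divided by the total number of points of that class.
n_good = sum(1 for p in points if p[2] == "good")
n_bad = len(points) - n_good
likelihood_good = in_circle_good / n_good
likelihood_bad = in_circle_bad / n_bad
```

With this data the circle encloses 3 good points and 1 bad point, giving likelihoods of 3/6 for good and 1/3 for bad.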
What is Posterior Probability?
Using Bayes’ rule, we calculate a new probability called the posterior probability, which combines the prior probability and the likelihood. For each class we find P(good | x) ∝ P(good) × likelihood(good) and P(bad | x) ∝ P(bad) × likelihood(bad).
Finally, whichever posterior probability is higher, the customer is given that classification. Since the posterior probability that the new customer is good is higher, Naive Bayes classifies this data point as a “good” customer.
The above is just a simple example of how Naive Bayes works. It can handle a large number of independent variables, whether categorical or continuous. You should also remember that a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. One of the major advantages of this algorithm is that it requires only a small amount of training data to estimate the parameters necessary for classification.
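To tie the pieces together, here is a minimal from-scratch sketch of Gaussian Naive Bayes for continuous features. The training data is hypothetical, and per-feature Gaussians with a class prior stand in for the circle-based likelihood used in the walkthrough above:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate a class prior plus per-feature mean/variance (Gaussian NB)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    params = {}
    for c, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        params[c] = (n / len(X), means, variances)
    return params

def predict_gnb(params, x):
    """Pick the class maximising log prior + sum of per-feature log densities."""
    best, best_score = None, -math.inf
    for c, (prior, means, variances) in params.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density for this feature, given class c.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical (income, credit_limit) training data.
X = [(30, 2), (32, 3), (35, 2), (60, 8), (62, 9), (65, 7)]
y = ["bad", "bad", "bad", "good", "good", "good"]
model = fit_gnb(X, y)
```

A new customer near the blue cluster, e.g. `predict_gnb(model, (61, 8))`, comes out as "good", while one near the red cluster comes out as "bad". Summing per-feature log densities is exactly the "naive" independence assumption mentioned above.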