Let us look into building Random Forest models in Python. Random Forest is a supervised, flexible, and easy-to-use learning algorithm based on Ensemble Learning. Ensemble Learning is a Machine Learning method that combines several instances of the same or different algorithms to form a more powerful prediction model; in a Random Forest, the ensemble consists of multiple Decision Trees. Random Forest builds Decision Trees on randomly selected data samples, gets a prediction from each tree, and chooses the best answer by voting. The more trees a Random Forest has, the more robust it is. Random Forest can be used for both Classification and Regression.
In this article, we will see how the Random Forest algorithm works, what its drawbacks are, and how to build a Random Forest classifier in Python.
The Random Forest algorithm works in the following way:
1. Pick N random samples (with replacement) from the dataset.
2. Build a Decision Tree on these samples.
3. Repeat steps 1 and 2 for as many trees as you want in your forest.
4. For classification, each tree votes for a class and the class with the most votes wins; for regression, the predictions of all trees are averaged.
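The final voting step can be sketched in a few lines of plain Python. The `majority_vote` helper below is illustrative, not part of any library:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Aggregate per-tree class predictions by majority vote,
    which is how a Random Forest classifier picks its final label."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Three hypothetical trees vote on a single sample:
print(majority_vote([1, 0, 1]))  # -> 1
```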
Like any other algorithm, Random Forest also has a few drawbacks: a large number of trees makes predictions slow, which can be a problem for real-time applications, and an ensemble of many trees is harder to interpret than a single Decision Tree.
For example, consider predicting whether a bank currency note is authentic or not based on four attributes: the variance, skewness, kurtosis, and entropy of the wavelet-transformed image of the note.
This task is a binary classification problem, and we will solve it with a Random Forest classifier in Python. The steps are as follows:
The dataset for this task can be downloaded from the following link:
Import the dataset using the following code.
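A minimal sketch of the import step, assuming the downloaded file is named `bill_authentication.csv` (the file name is an assumption). For illustration, a tiny in-memory stand-in with the same four attributes and label is used so the snippet runs on its own:

```python
import io
import pandas as pd

# Tiny in-memory stand-in with the banknote dataset's columns.
csv_data = io.StringIO(
    "Variance,Skewness,Kurtosis,Entropy,Class\n"
    "3.62,8.66,-2.81,-0.45,0\n"
    "-1.39,-4.88,6.48,0.34,1\n"
    "4.55,8.16,-2.46,-1.46,0\n"
)
dataset = pd.read_csv(csv_data)
# Real usage (assumed file name):
# dataset = pd.read_csv("bill_authentication.csv")
print(dataset.shape)
```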
For getting a high-level view of the dataset, execute the following command:
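A sketch of the inspection step using pandas, shown on a small stand-in DataFrame with the dataset's columns:

```python
import pandas as pd

# Stand-in DataFrame with the same columns as the banknote dataset.
dataset = pd.DataFrame({
    "Variance": [3.62, -1.39, 4.55],
    "Skewness": [8.66, -4.88, 8.16],
    "Kurtosis": [-2.81, 6.48, -2.46],
    "Entropy": [-0.45, 0.34, -1.46],
    "Class": [0, 1, 0],
})
print(dataset.head())      # first rows of the dataset
print(dataset.describe())  # count, mean, std, min/max per column
```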
Use the following code to divide data into attributes and labels:
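A sketch of the split into attributes (X) and labels (y), again on a small stand-in DataFrame so it runs on its own; in the tutorial `dataset` would come from the CSV file:

```python
import pandas as pd

dataset = pd.DataFrame({
    "Variance": [3.62, -1.39, 4.55],
    "Skewness": [8.66, -4.88, 8.16],
    "Kurtosis": [-2.81, 6.48, -2.46],
    "Entropy": [-0.45, 0.34, -1.46],
    "Class": [0, 1, 0],
})

# X: the four attribute columns; y: the Class label column.
X = dataset.drop("Class", axis=1).values
y = dataset["Class"].values
print(X.shape, y.shape)
```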
Use the following code to divide data into training and testing sets:
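A sketch of the train/test split with scikit-learn's `train_test_split`, using an 80/20 split (the split ratio is an assumption) on synthetic stand-in arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the four attributes and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```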
Use the following code for feature scaling:
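A sketch of feature scaling with `StandardScaler`. The scaler is fitted on the training data only and then reused on the test data, so no test-set information leaks into the scaling statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the already-split attribute arrays.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(80, 4))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 4))

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit on training data only
X_test = sc.transform(X_test)        # reuse the training statistics
```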
After scaling the dataset, we train our Random Forest classifier to solve this classification problem. To do so, execute the following code.
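A sketch of the training step with `RandomForestClassifier` and 20 trees (matching the `n_estimators` value discussed later). Synthetic data stands in for the scaled banknote features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the banknote data: 4 features, binary label.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train a forest of 20 Decision Trees.
classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)

# Predict labels for the held-out test set.
y_pred = classifier.predict(X_test)
```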
For a classification problem solved with a Random Forest classifier in Python, the metrics used to evaluate the algorithm are accuracy, the confusion matrix, precision, recall, and the F1 score. Use the following script to find these values:
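A sketch of the evaluation step with scikit-learn's metrics. Small hand-made label arrays stand in for the true and predicted labels that the trained classifier would produce:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Stand-in labels; in the tutorial these come from the test split
# and the trained classifier's predictions.
y_test = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0])

print(confusion_matrix(y_test, y_pred))        # per-class error counts
print(classification_report(y_test, y_pred))   # precision, recall, F1
print(accuracy_score(y_test, y_pred))          # overall accuracy
```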
The output will look like this:
The Random Forest classifier with 20 trees achieves an accuracy of 98.90%, which is good for this task. Changing the number of estimators did not significantly improve the results, as the following chart shows: the X-axis is the number of estimators and the Y-axis is the accuracy.
In this article, we have demonstrated how a Random Forest in Python is built, how it works, its advantages, and its disadvantages. Random Forests have various applications, such as Recommendation Engines, Image Classification, and Feature Selection.
If you wish to learn more about Data Science tools and Machine Learning algorithms, take a look at our 11-month in-person Postgraduate Diploma in Data Science (PGD-DS). You can learn more about this placement-guaranteed program by clicking here.