Machine learning algorithms are increasingly used in data science, and they can often produce more accurate results than traditional statistical methods. Two distinct schools of thought have emerged in analytics: the statistical school, which places a lot of importance on model explainability, and the machine learning school, which values accuracy over explainability.
The best way to explain this is with a simple example, say a response model. Traditionally, the variables that made it into the final response model were as important as the accuracy of the model itself. If, for example, you saw a result saying that cat owners have a higher response rate to credit card campaigns, you would find it difficult to explain, and you would very likely leave it out of your final model.
In a machine learning model, it wouldn’t matter, because it is a pattern that has been identified in the data, and you don’t really look at the variables themselves, just the output.
That is also why so many Kaggle competition datasets have attributes named X1-X100, with no indication of what those attributes represent.
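The contrast between the two schools can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real response model: the data is simulated, the effect sizes for `owns_cat` and `high_income` are invented, and the helper names (`make_rows`, `fit_logistic`, `log_loss`) are hypothetical. A plain logistic regression stands in for both views; the only thing that changes is how the modeller decides whether the cat-owner variable stays in.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_rows(n, seed):
    """Simulate customers: [owns_cat, high_income] -> responded (0 or 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        owns_cat = rng.randint(0, 1)
        high_income = rng.randint(0, 1)
        # Invented "true" effects, chosen only for illustration.
        p = sigmoid(-1.0 + 1.0 * owns_cat + 1.2 * high_income)
        rows.append(([owns_cat, high_income], 1 if rng.random() < p else 0))
    return rows


def fit_logistic(rows, cols, lr=1.0, epochs=300):
    """Fit logistic regression on the chosen columns by batch gradient descent."""
    w, b, n = [0.0] * len(cols), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(cols), 0.0
        for x, y in rows:
            err = sigmoid(b + sum(wi * x[c] for wi, c in zip(w, cols))) - y
            grad_b += err
            for i, c in enumerate(cols):
                grad_w[i] += err * x[c]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def log_loss(rows, cols, w, b):
    """Average negative log-likelihood; lower means better predictions."""
    total = 0.0
    for x, y in rows:
        p = sigmoid(b + sum(wi * x[c] for wi, c in zip(w, cols)))
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(rows)


train, holdout = make_rows(2000, seed=1), make_rows(2000, seed=2)

# Statistical view: inspect each coefficient and ask whether it makes business sense.
w_full, b_full = fit_logistic(train, cols=[0, 1])
print("owns_cat coefficient:", round(w_full[0], 2))
print("high_income coefficient:", round(w_full[1], 2))

# ML view: keep whichever feature set predicts best on held-out data.
w_small, b_small = fit_logistic(train, cols=[1])
loss_full = log_loss(holdout, [0, 1], w_full, b_full)
loss_small = log_loss(holdout, [1], w_small, b_small)
print("holdout log loss with owns_cat:", round(loss_full, 4))
print("holdout log loss without owns_cat:", round(loss_small, 4))
```

A modeller from the statistical school would look at the `owns_cat` coefficient and might drop the variable because there is no story behind it. A modeller from the machine learning school would compare the two holdout losses and keep whichever feature set scores better, whatever the variable happens to be called.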
Given the direction in which business decision-making is moving (faster, closer to real time, more micro-targeted), the adoption of machine learning algorithms is going to keep increasing. The computing power required to implement these algorithms is becoming increasingly attainable, and because machine learning by definition requires less human intervention, it is easier to get results faster. It is also a lot easier to tackle problems involving unstructured and image data with ML algorithms: text analytics, image analytics, and video analytics all use ML approaches.
Therefore, it is becoming imperative for all current and future data scientists to learn ML techniques, including those who subscribe to the more traditional statistical modelling approach.
This does not mean, however, that there is no value in the statistical approach or in human intervention. At the end of the day, the role of a data scientist is to solve specific business problems using the most efficient and appropriate techniques, and that requires a solid understanding of both the statistical and the mathematical techniques that form the foundation of data science. Some problems are well suited to statistical approaches, and the additional value generated by applying ML techniques may be minimal. Other problems are well suited to ML approaches, particularly where the assumptions that statistical models require may not be met.
If you are currently in data science or are planning a career in analytics, you will need to know both sets of techniques, statistical and ML, because many companies still overwhelmingly use statistical models while also exploring newer approaches to newer problems that use previously unavailable data and data types.