Data science is a field that has grown by leaps and bounds in the past decade or so. The rapid growth, much like in any other field, has led to the birth of a few myths. One of the most striking is the often-repeated idea that an understanding of statistics is not mandatory for understanding data science.
I interact with aspirants regularly and have found that many among them have somehow got this idea. I have tried to understand the reasons at the basic level and here’s what I could think of:
There can be other reasons as well, but these, in my opinion, broadly account for the majority of cases.
Let’s try to underline how knowing the subject can make an aspirant confident and help him/her become a ‘data scientist’.
When aspirants learn the basics of data science, they often encounter predictive models at an early stage. Sophisticated tools have made life easier for us and, even without knowledge of the underlying assumptions or how to check them, it is usually straightforward to build these models. Thus, it is not even necessary to know the null hypothesis one tests while building a simple regression model.
As long as things work well.
And until issues crop up.
An understanding of the theory and framework becomes useful when things don’t go according to plan. A severe multicollinearity or heteroscedasticity problem can be dealt with most effectively if it is known why it happened. To know this ‘why’, it is important to understand the statistics involved.
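To make the multicollinearity point concrete, here is a minimal sketch of one common diagnostic, the variance inflation factor (VIF). It assumes a regression with exactly two predictors, in which case the VIF reduces to 1 / (1 - r²), where r is the correlation between the two predictors. The data and variable names are hypothetical and exist only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    With only two predictors, the R^2 from regressing one on the other
    equals the squared correlation between them.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors for a house-price regression (illustrative values only).
area_sqft = [1000, 1200, 1500, 1800, 2000, 2400]
rooms = [2, 3, 3, 4, 4, 5]           # moves almost in step with area
age_years = [12, 3, 25, 8, 30, 15]   # only loosely related to area

vif_area_rooms = vif_two_predictors(area_sqft, rooms)
vif_area_age = vif_two_predictors(area_sqft, age_years)

print(f"VIF with area and rooms as predictors: {vif_area_rooms:.2f}")
print(f"VIF with area and age as predictors:   {vif_area_age:.2f}")
```

A frequently quoted rule of thumb treats a VIF above roughly 5 or 10 as a warning sign, although the exact cut-off is a matter of judgment. The statistics explain the ‘why’: when two predictors carry almost the same information, the model cannot separate their individual effects, so the coefficient estimates become unstable.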
The other advantage is that it can help separate the excellent data scientists from the merely good ones. Knowledge of the subject can be used to judge whether an analysis is on the right track. Otherwise, one may well realise, after spending hours, that the effort has been futile.
Machine learning and deep learning are attractive terms, but there is a hierarchy which must be followed. If the basics of statistics and predictive modelling are not learned properly, it is usually difficult to comprehend the advanced topics.
Building a second and third floor on a fragile ground floor is never a great idea…