Analytics is one of the most sought-after industries of the 21st century. Many people try to make their mark in it without knowing which skills they need to master. Most of us confuse the job profiles of data analyst, business analyst, data wrangler, data engineer, and data scientist, and often end up thinking they are identical. In this blog, we will see, in a clear and concise manner, who a data analyst is and which skills are required to become a complete data analyst.
To be precise, a data analyst is a person with the right blend of the expertise found across all of these profiles. Data analysis involves many roles and responsibilities, and we've broken them down here for better comprehension.
Diagnose Business Problems:
This is the first skill required of a data analyst. A business might propose a problem statement and a desired outcome. The data analyst should have enough domain knowledge about the business to convert the business problem into a statistical/mathematical problem (by proposing hypotheses for it).
Identify Data Sources:
The next skill of a data analyst is the ability to identify all the data sources related to the problem statement. In most open data sets and competitions, the data sources are text/CSV files. In real-world problems, however, the data sources could be SQL databases, online data (APIs), and NoSQL databases (like HBase, Cassandra, and MongoDB). A data analyst should be capable of handling data from all of these sources.
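To make the idea of combining sources concrete, here is a minimal sketch that uses only the Python standard library. The sample CSV text, the in-memory SQLite table `orders`, the JSON payload, and all function names are hypothetical stand-ins for a real file, a production database, and a real API response.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for a CSV file and an API response.
CSV_TEXT = "customer_id,region\n1,North\n2,South\n"
API_PAYLOAD = '[{"customer_id": 1, "visits": 12}, {"customer_id": 2, "visits": 7}]'


def load_csv(text):
    """Read CSV text into a list of dicts, converting the id to int."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"customer_id": int(r["customer_id"]), "region": r["region"]} for r in rows]


def load_sql():
    """Query an in-memory SQLite table that stands in for a SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 250.0), (1, 100.0), (2, 75.5)],
    )
    cur = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    totals = {cid: total for cid, total in cur}
    conn.close()
    return totals


def load_api(payload):
    """Parse a JSON string shaped like a typical API response."""
    return {rec["customer_id"]: rec["visits"] for rec in json.loads(payload)}


def combine():
    """Join the three sources on customer_id into one record per customer."""
    totals = load_sql()
    visits = load_api(API_PAYLOAD)
    combined = []
    for row in load_csv(CSV_TEXT):
        cid = row["customer_id"]
        combined.append(
            {**row, "total_spend": totals.get(cid), "visits": visits.get(cid)}
        )
    return combined


if __name__ == "__main__":
    for record in combine():
        print(record)
```

In practice the join key and the loading code will differ for each source, but the pattern of reading each source into a common shape and then merging on a shared identifier tends to carry over.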
Data Cleaning:
This is one of the most important skills required for an analyst. The challenges an analyst faces range from inconsistent data (formatting differences, misspellings, and misfielded values) to missing values. Some standard techniques are missing value treatment, outlier detection, and data transformation (normalizing the data, binning).
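The standard techniques named above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: missing values are represented as `None`, the median is used for imputation, outliers are flagged with the common 1.5 x IQR rule, and the function names and bin edges are made up for the example.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_normalize(values):
    """Rescale values to the range [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    labels must have one more entry than edges; the last label
    covers everything above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


if __name__ == "__main__":
    ages = [23, 25, None, 31, 29, 27, 120]  # hypothetical column with a gap
    filled = impute_median(ages)
    print("imputed:", filled)
    print("outliers:", iqr_outliers(filled))
    print("bins:", [bin_value(a, [25, 40], ["young", "adult", "senior"]) for a in filled])
```

Which treatment is appropriate depends on the data: a flagged outlier may be an entry error worth removing or a genuine extreme value worth keeping.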
Exploratory Data Analysis:
Exploratory Data Analysis (EDA) provides a simple way to obtain a bird's-eye view of the data. Simple plots such as bar plots, box plots, and histograms, along with statistical techniques such as correlation, the chi-squared test, and dimensionality reduction, are used to get a big-picture look at the data. It is a very important skill, but many data analysts tend to skip or overlook this step.
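As a rough illustration of two of the statistical techniques mentioned above, the sketch below computes a Pearson correlation coefficient and a chi-squared statistic by hand, plus a text-only stand-in for a bar plot. The function names are hypothetical, and in real work a plotting or statistics library would usually do this for you.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def text_bar_plot(values):
    """Count category frequencies and render each as a bar of '#' marks."""
    counts = Counter(values)
    return {category: "#" * count for category, count in sorted(counts.items())}


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
    print(chi_squared([[20, 0], [0, 20]]))
    for category, bar in text_bar_plot(["a", "b", "a", "c", "a"]).items():
        print(category, bar)
```

A correlation near 1 or -1 suggests a strong linear relationship between two numeric columns, while a large chi-squared statistic suggests that two categorical columns are not independent.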
Model Building:
This is the first skill a person aspires to master when they enter the world of analytics. Business requirements play a major role in deciding whether to use simple, easily explained models such as linear regression, logistic regression, and decision trees, or more sophisticated models such as random forests, gradient boosting, neural networks, and SVMs.
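To show why simple models are attractive when the business needs an explanation, here is a minimal sketch of a one-variable linear regression fitted by ordinary least squares in plain Python. The function names and the toy numbers are assumptions made for illustration; a real project would normally rely on an established modeling library.

```python
def fit_simple_linear_regression(x, y):
    """Fit y = slope * x + intercept by ordinary least squares.

    Assumes x contains at least two distinct values.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to new x values."""
    return [slope * a + intercept for a in x]


if __name__ == "__main__":
    # Hypothetical data: advertising spend versus units sold.
    spend = [1, 2, 3, 4]
    units = [3, 5, 7, 9]
    slope, intercept = fit_simple_linear_regression(spend, units)
    print("slope:", slope, "intercept:", intercept)
    print("forecast:", predict([5, 6], slope, intercept))
```

The fitted slope can be read directly as "each extra unit of spend is associated with this many extra units sold", which is the kind of statement a more sophisticated model rarely gives so directly.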
Model Validation:
This skill goes hand in hand with model building. We might validate our model using metrics such as Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE) if the target variable is numeric and continuous. Metrics such as accuracy, Area Under the Curve (AUC), and F1-score are used if the target variable is categorical.
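The metrics above are simple enough to write out by hand, which can help in understanding what each one measures. The following is a sketch with hypothetical function names; MAPE here assumes that no actual value is zero, F1 is computed for a single positive class, and AUC is left out because it needs predicted scores rather than labels.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; assumes no zero in y_true."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Share of predictions that match the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical regression results.
    actual, predicted = [100, 200, 300], [110, 190, 330]
    print("RMSE:", rmse(actual, predicted))
    print("MAE:", mae(actual, predicted))
    print("MAPE:", mape(actual, predicted))
    # Hypothetical classification results.
    labels, guesses = [1, 0, 1, 1], [1, 0, 0, 1]
    print("accuracy:", accuracy(labels, guesses))
    print("F1:", f1_score(labels, guesses))
```

Note that RMSE and MAE are in the units of the target variable, while MAPE is a percentage, so MAPE is often easier to explain to a business audience.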
Visualization of results:
This skill helps us communicate insights to our business counterparts using visually impactful representations such as charts and graphs. It often determines whether our analysis is implemented by the business or remains yet another document/presentation.
We will try to understand more about each of the skills in detail in our subsequent blogs.
This is a guest post by Vignesh, who works as a senior data analyst at Customer Analytics LLC. He previously worked as a data analyst at Infosys. He has more than 3 years of experience in the field of analytics and has worked across multiple verticals, including retail, telecom, financial services, and manufacturing. He loves solving business problems using various statistical and machine learning techniques. He has also participated in and won contests conducted by premium data science platforms like Kaggle and CrowdANALYTIX.