Business analytics (BA) is a set of skills, technologies, and methods used to study a company’s information and performance in order to gain insights and make data-driven decisions. BA helps determine which datasets are useful and can increase revenue, productivity, and efficiency. Almost 33% of large-sized enterprises will use decision intelligence by 2023. Businesses benefit from data analytics by making decisions 5x faster.
Many types of datasets contain outliers and missing values. Outliers significantly affect statistical estimation by causing underestimated or overestimated results. Missing values reduce the available data and may make the results less reliable. Hence, learning how to handle missing data and deal with outliers is important. Business analysts use specific methodologies to deal with both issues. Here’s how to deal with outliers and missing values in a given dataset.
An outlier is an observation in a random population sample that lies at an abnormal distance from other values. In other words, it deviates substantially from the normal values in the population. The analyst decides what to consider abnormal, but before doing so, the analyst must first characterize what normal observations look like. Outliers are found in all types of datasets.
For example, consider a dataset consisting of the values 1, 500, 550, 560, and 575. Which of these is the outlier? The answer is 1, since it differs greatly from the other observations.
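One common way to make "differs greatly" concrete is the interquartile range (IQR) rule. The article does not name a specific detection rule, so the 1.5 × IQR threshold below is an assumption chosen for illustration, applied to the same five values:

```python
import pandas as pd

# The example dataset from the text.
values = pd.Series([1, 500, 550, 560, 575])

# Quartiles and interquartile range.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Assumption: the conventional 1.5 * IQR fences; other multipliers are possible.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep only the observations that fall outside the fences.
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers.tolist())  # [1]
```

Here the fences come out at 410 and 650, so only the value 1 is flagged.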
Three types of outliers are found in various types of datasets.
Global outliers are observations that deviate significantly from normal observations, and they lie farthest away from the normal data points. For example, the number of flights operating during the lockdown is a global outlier when compared to other times.
Collective outliers are observations that cannot be individually considered outliers, but collectively they deviate significantly from the rest of the data points. They are considered outliers only when aggregated.
Contextual outliers deviate significantly from normal observations, but only when the observations are viewed in a specific context. For example, a temperature may be considered abnormally low during summer but will be normal during winter.
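The temperature example can be sketched in pandas. The readings, the column names `season` and `temp_c`, and the 1.5 × IQR rule are all made up for illustration; the point is only that the same value can pass a global check and fail a per-season check:

```python
import pandas as pd

# Hypothetical readings: one summer day at 12 °C among otherwise warm days.
readings = pd.DataFrame({
    "season": ["summer"] * 6 + ["winter"] * 6,
    "temp_c": [30, 31, 32, 29, 30, 12, 10, 12, 11, 13, 12, 9],
})

# Global check: IQR fences computed over all readings at once.
q1 = readings["temp_c"].quantile(0.25)
q3 = readings["temp_c"].quantile(0.75)
iqr = q3 - q1
global_flags = (readings["temp_c"] < q1 - 1.5 * iqr) | (readings["temp_c"] > q3 + 1.5 * iqr)

# Contextual check: the same fences, but computed separately within each season.
by_season = readings.groupby("season")["temp_c"]
season_q1 = by_season.transform(lambda s: s.quantile(0.25))
season_q3 = by_season.transform(lambda s: s.quantile(0.75))
season_iqr = season_q3 - season_q1
contextual_flags = (readings["temp_c"] < season_q1 - 1.5 * season_iqr) | (
    readings["temp_c"] > season_q3 + 1.5 * season_iqr
)

print(int(global_flags.sum()))             # 0
print(readings[contextual_flags])          # the 12 °C summer reading
```

Across the whole dataset, 12 °C sits comfortably among the winter readings, so nothing is flagged globally. Within the summer group it falls well below the lower fence.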
Outliers are abnormally different from normal observations, and their presence in a dataset may introduce bias in statistical estimates. There are many reasons why it is important to treat outliers.
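As a small illustration of that bias, the sketch below reuses the earlier example values (1, 500, 550, 560, 575) and compares the mean and median with and without the outlier. It uses only Python's standard `statistics` module:

```python
import statistics

values = [1, 500, 550, 560, 575]
without_outlier = [v for v in values if v != 1]

# The mean is pulled down sharply by the single outlier.
mean_with = statistics.mean(values)
mean_without = statistics.mean(without_outlier)

# The median barely moves, which is why it is often called a robust statistic.
median_with = statistics.median(values)
median_without = statistics.median(without_outlier)

print(mean_with, mean_without)      # 437.2 546.25
print(median_with, median_without)  # 550 555.0
```

A single abnormal value shifts the mean by more than 100, while the median changes by only 5.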
Missing values occur in datasets when no value is recorded for a variable in an observation. Missing values can be present due to many reasons, such as non-response from the study participant, loss of data while recording, errors from the recorder, etc. Missing values are found in all types of datasets, and they may significantly affect the accuracy of the results. You may have wondered what happens when a dataset includes records with missing data. Missing values may cause bias in the results, lead to incorrect results, and make machine learning models fail. Hence it is essential to tackle missing values properly.
The first step towards dealing with missing values is to find them. There are several ways to detect missing values in Python. With pandas, the isnull() function is most commonly used. dataframe.isnull().values.any() tells you whether the dataset contains any missing values at all, and dataframe.isnull().sum() gives the number of missing values in each column.
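A minimal sketch of these checks on a small, made-up DataFrame (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing city.
dataframe = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Pune", "Delhi", None, "Mumbai"],
    "income": [50000, 62000, 58000, 71000],
})

# True if at least one cell anywhere in the DataFrame is missing.
has_missing = dataframe.isnull().values.any()

# Count of missing cells per column.
missing_per_column = dataframe.isnull().sum()

# The rows that contain at least one missing cell.
rows_with_missing = dataframe[dataframe.isnull().any(axis=1)]

print(has_missing)
print(missing_per_column)
print(rows_with_missing)
```

Note that `isnull()` treats both `NaN` and `None` as missing, so numeric and text columns are covered by the same call.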
How can missing data and outliers be handled in different types of datasets? Can they be deleted? Can they be replaced? Several strategies are used for handling missing values and outliers. You need to analyze the dataset properly to decide which strategy is best suited to it. Some of the commonly used strategies are described below.
Deleting the observations that contain missing values is a quick way to eliminate missing values from a dataset. However, it is not a recommended strategy. You may lose important data by deleting these observations, which may lead to incorrect conclusions and misinterpretations.
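The cost of deletion is easy to see on a small example. The sketch below uses pandas `dropna()` on made-up data, where each incomplete row is missing only one of its two values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the four rows have a single missing cell.
dataframe = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, 71000],
})

# Drop every row that has at least one missing value.
complete_rows = dataframe.dropna()

rows_lost = len(dataframe) - len(complete_rows)
print(rows_lost)      # 2
print(complete_rows)
```

Half of the rows are discarded here, including the valid income of 62000 and the valid age of 31 that sat next to a missing cell.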
Under imputation, missing values are replaced with substitute values before working on the data. Several imputation methods can be used.
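The article does not list specific imputation methods here, so the sketch below assumes two simple and widely used ones: filling a numeric column with its median and a categorical column with its most frequent value. The data and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing number and one missing category.
dataframe = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Pune", "Delhi", None, "Pune"],
})

imputed = dataframe.copy()

# Numeric column: replace missing values with the column median.
imputed["age"] = imputed["age"].fillna(imputed["age"].median())

# Categorical column: replace missing values with the most frequent value (mode).
imputed["city"] = imputed["city"].fillna(imputed["city"].mode().iloc[0])

print(imputed)
```

Unlike deletion, every row is kept. Which substitute is appropriate depends on the data, so the median and mode used here should be read as one reasonable choice rather than a rule.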
Business analytics deals with analyzing various types of datasets, resolving issues like missing values or outliers, and making informed decisions. If you find business analytics interesting, you can enroll in the Integrated Program in Business Analytics by UNext. It will enable you to learn data visualization, statistical modeling, and other advanced data analytics topics.