An outlier is an observation that lies at an abnormal distance from the other values in a random sample from a population. It is up to the analyst to decide what counts as abnormal. This article covers the types of outliers.
An outlier is a data point that differs significantly from the other observations. It may be caused by variability in the measurement, or it may be the result of an experimental error. In the latter case, the point is sometimes excluded from the data set. Outliers can seriously distort a statistical analysis.
Outliers can occur by chance in any distribution, but they often indicate either a measurement error or a population with a heavy-tailed distribution. In the former case, one will want to discard them or use statistics that are robust to outliers. In the latter case, they indicate that the distribution is highly skewed and that one should be cautious when using tools that assume a normal distribution.
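To illustrate what "robust to outliers" means, here is a minimal Python sketch that compares the mean with the median on a small sample. The readings are made up for illustration, and the helper name summarize is hypothetical; the median is only one example of a robust statistic.

```python
import statistics

def summarize(values):
    """Return (mean, median) of a list of numbers."""
    return statistics.mean(values), statistics.median(values)

# Made-up readings; 55.0 stands in for a measurement error.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
with_outlier = clean + [55.0]

mean_clean, median_clean = summarize(clean)
mean_bad, median_bad = summarize(with_outlier)

# The mean is pulled towards the bad reading; the median is not.
print(f"mean:   {mean_clean:.2f} -> {mean_bad:.2f}")
print(f"median: {median_clean:.2f} -> {median_bad:.2f}")
```

With this sample, the single bad reading moves the mean from about 10.0 to about 16.4, while the median stays at 10.0.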
Another cause of outliers is a mixture of two distributions. These may be two distinct subpopulations, or they may correspond to correct trials versus trials with a measurement error. Such data can be described with a mixture model.
Outlier points can indicate faulty data or an erroneous procedure. They can also mark an area where a certain theory might not be valid. In a large sample, however, a small number of outliers is to be expected.
Because outliers are extreme observations, they may include the sample minimum, the sample maximum, or both. However, the sample minimum and maximum are not always outliers, because they may not be unusually far from the other observations.
Now that you know what an outlier is, here are the three main types of outliers: global, contextual, and collective.
A data point is considered a global outlier if its value lies far outside the entire data set in which it is found. In other words, a global outlier is a measured sample point with a very high or very low value relative to all the other values in the data set.
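One common way to flag global outliers is the interquartile range (IQR) rule, sketched below in Python. This rule is not prescribed by the article; it is a widely used convention, and the multiplier k=1.5, the function name global_outliers, and the sample values are all assumptions made for illustration.

```python
import statistics

def global_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences.

    Assumption: a point is 'very far away' if it lies more than
    k * IQR below the first quartile or above the third quartile.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sample: 95 is far from every other value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
print(global_outliers(sample))
```

For this sample only the value 95 is flagged. A larger k makes the rule more tolerant, which fits the point that the final judgement rests with the analyst.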
If a data point is unusual only in a specific context, and not otherwise, it is called a contextual outlier. To detect one, the attributes of the data objects are divided into two groups. Contextual attributes define the context, for example a time or a location. Behavioural attributes are the characteristics of the object that are used to evaluate whether it is an outlier within that context. It is difficult to spot a contextual outlier without background information.
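The idea can be sketched in Python by grouping records by a contextual attribute and scoring the behavioural attribute within each group. Everything here is an assumption for illustration: the season is the contextual attribute, the temperature is the behavioural attribute, the data is made up, and a z-score with a threshold of 2.0 is just one possible scoring rule.

```python
import statistics
from collections import defaultdict

def contextual_outliers(records, threshold=2.0):
    """Flag (context, value) pairs that are unusual within their own context.

    Each value is compared with the mean and population standard
    deviation of the other values that share its context.
    """
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        group = by_context[context]
        mean = statistics.mean(group)
        spread = statistics.pstdev(group)
        if spread > 0 and abs(value - mean) / spread > threshold:
            flagged.append((context, value))
    return flagged

# Made-up temperatures in degrees Celsius, tagged with a season.
summer = [("summer", t) for t in [28, 30, 29, 31, 30, 29, 30, 31]]
winter = [("winter", t) for t in [2, 4, 3, 5, 4, 3, 4, 30]]
print(contextual_outliers(summer + winter))
```

A reading of 30 is ordinary in summer, so it is not extreme for the data set as a whole, but the same reading is flagged when its context is winter.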
If a collection of data points, taken together, deviates significantly from the rest of the data set, that collection is a collective outlier. The individual values in the subset need not be outliers in either a global or a contextual sense. It is their occurrence together as a group that is unusual.
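As a rough Python sketch of a collective outlier, consider a signal that normally alternates between 1 and 5 but contains a flat run of 3s. Each 3 lies inside the normal range, yet the run as a group is unusual. The heuristic below, a sliding-window spread check, is only one possible approach; the function name, the window width, the ratio, and the data are all assumptions for illustration.

```python
import statistics

def low_variation_windows(series, width=4, ratio=0.25):
    """Return start indices of windows whose spread is unusually small.

    A window is flagged when its population standard deviation is
    below ratio * (median spread over all windows).
    """
    spreads = [
        statistics.pstdev(series[i:i + width])
        for i in range(len(series) - width + 1)
    ]
    typical = statistics.median(spreads)
    return [i for i, s in enumerate(spreads) if s < ratio * typical]

# Made-up signal: alternating 1 and 5, with a flat run of six 3s.
signal = [1, 5] * 4 + [3] * 6 + [1, 5] * 4
print(low_variation_windows(signal))
```

For this signal the flagged windows start at positions 8, 9 and 10, which together cover the flat run, even though no single value is below the minimum or above the maximum of the rest of the signal.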
It is important to investigate outliers carefully, because they may carry information about the process under investigation. Before you consider eliminating an outlier, first try to understand why it appeared in the data. In many cases an outlier is simply a bad data point. Unfortunately, there is no strict statistical rule for identifying outliers. The decision is therefore subjective and depends on the analyst's knowledge and on how the data was collected.