Descriptive Statistics: A Comprehensive 6 Step Guide

17 Dec 2020

Introduction

Quantitative analysis of a large collection of data relies on numerical computations that summarize the nature of the data and make its trends easier to interpret. Descriptive statistics and inferential statistics are the two approaches used for this purpose.

In this article let us look at:

What is Descriptive Statistics?
Types of Descriptive Statistics
Methods Used in Descriptive Statistics
Examples of Descriptive Statistics
Important Tools in Descriptive Statistics
Importance of Descriptive Statistics

1. What is Descriptive Statistics?

Descriptive statistics describes or summarizes the basic features or characteristics of the data. It assigns numerical values to describe the trend of the samples collected. It condenses large volumes of data into a simpler, more meaningful format that is easier to understand and interpret. Paired with graphs and tables, descriptive statistics offer a clear summary of the complete data set.

The primary purpose of descriptive statistics is interpretation of the data at hand, while inferential statistics uses the descriptive values obtained from a sample to draw conclusions or make predictions about a larger set of data. Hence, descriptive statistics form the first step and the basis of quantitative data analysis.

2. Types of Descriptive Statistics

There are four major types of descriptive statistics used to measure a given set of data characteristics.

A) Measures of Frequency

This measures how often a particular value or response occurs in the distribution. It can be expressed as a count or as a percentage.
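As a small illustration of a frequency measure, the sketch below counts how often each response occurs and converts the counts to percentages. The survey responses are invented for the example, and Python's standard `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

counts = Counter(responses)   # absolute frequency of each response
total = len(responses)

# Relative frequency of each response, as a percentage of all responses.
percentages = {value: 100 * n / total for value, n in counts.items()}

for value, n in counts.most_common():
    print(f"{value}: {n} ({percentages[value]:.1f}%)")
```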

B) Measures of Central Tendency

Measures of central tendency indicate the average or most typical value in the data set. The three common measures are the mean, the median, and the mode.

C) Measures of Variation or Dispersion

This shows how spread out the responses in the data set are. It helps identify the gap between the highest and lowest values and how far apart individual values are from the mean or the average. Measures of variation are calculated using the range, standard deviation, and variance.

D) Measures of Position

This measures how individual values are positioned relative to one another. This method of calculation relies on a standardized value. Percentiles and quartile ranks are the common measures of position.
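To make measures of position concrete, here is a minimal sketch that computes quartiles and a standardized value (a z-score, used here as one common choice of standardized value). The scores are invented, and `statistics.quantiles` is called with its default "exclusive" method, so other tools may report slightly different quartile values for the same data.

```python
import statistics

# Hypothetical test scores, invented for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Standardized value (z-score) of one score: how many sample standard
# deviations it lies above or below the mean.
mean = statistics.mean(scores)
std_dev = statistics.stdev(scores)
z_score = (95 - mean) / std_dev

print(f"Q1={q1}, Q2={q2}, Q3={q3}")
print(f"z-score of 95: {z_score:.2f}")
```

The second quartile is the median, so a score above Q3 sits in the top quarter of this data set.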

3. Methods Used in Descriptive Statistics

The various descriptive statistics methods used to arrive at the characteristics of the data set include:

A) Mean

Mean is the average of all the values and can be calculated by adding up all the values and dividing the total sum by the number of values.

Mean = Sum of values/Number of values

B) Median

The median is the value at the exact center of the set once the values are arranged in order. If there are two values at the center (that is, the set has an even number of values), their mean is taken as the median.

C) Mode

The mode is the value that appears most frequently in the set. Arranging the values in order from lowest to highest helps identify the mode. Any data set can have no mode, one mode, or multiple modes.
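The three measures above can be sketched with Python's standard `statistics` module. The values are invented for illustration, and `multimode` is used so that data sets with more than one mode are handled as well.

```python
import statistics

# Hypothetical data set, invented for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)         # sum of values / number of values = 30 / 6
median = statistics.median(values)     # even count: mean of the two middle values, 3 and 5
modes = statistics.multimode(values)   # list of the most frequent value(s)

print(mean, median, modes)
```

For this data the mean is 5, the median is 4.0, and the only mode is 3.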

D) Range

The range is the difference between the highest value of the data set and the lowest value. It can be calculated by subtracting the lowest value from the highest value. The range indicates how far apart the values are.
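A quick sketch of the calculation, using invented daily temperatures:

```python
# Hypothetical daily high temperatures, invented for illustration.
temperatures = [18, 21, 25, 19, 30, 22]

# Range = highest value - lowest value.
data_range = max(temperatures) - min(temperatures)

print(data_range)  # 30 - 18 = 12
```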

E) Standard Deviation

Standard deviation measures the average variability of the values in the data set, that is, how far individual values typically lie from the mean. A large standard deviation indicates high variability. The sample standard deviation is calculated in six steps:

Calculate the mean of the values.
Subtract the mean from each individual value to get its deviation from the mean.
Square each deviation.
Find the sum of the squared deviations.
Divide the sum of the squared deviations by N-1 (where N is the total number of values).
Find the square root of the number obtained.
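The six steps can be followed one at a time in code. This is a minimal sketch on invented values; the final line cross-checks the result against `statistics.stdev` from the Python standard library, which also divides by N-1.

```python
import math
import statistics

# Hypothetical data set, invented for illustration.
values = [4, 8, 6, 5, 3, 7]

mean = sum(values) / len(values)                        # step 1: mean = 5.5
deviations = [x - mean for x in values]                 # step 2: deviation of each value
squared_deviations = [d ** 2 for d in deviations]       # step 3: square each deviation
sum_squared = sum(squared_deviations)                   # step 4: sum of squares = 17.5
sample_variance = sum_squared / (len(values) - 1)       # step 5: divide by N-1 -> 3.5
std_dev = math.sqrt(sample_variance)                    # step 6: square root

print(round(std_dev, 4))
print(math.isclose(std_dev, statistics.stdev(values)))  # cross-check with the library
```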

F) Variance

Variance measures the degree of spread in the data set and is the average of the squared deviations from the mean (for a sample, the sum of squared deviations divided by N-1). Squaring the standard deviation gives the variance.
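The sketch below, on invented values, shows the relationship between variance and standard deviation. It also contrasts the two divisors in common use: `statistics.variance` divides by N-1 (sample variance), while `statistics.pvariance` divides by N (population variance).

```python
import statistics

# Hypothetical data set, invented for illustration.
values = [4, 8, 6, 5, 3, 7]

sample_variance = statistics.variance(values)        # sum of squared deviations / (N-1)
population_variance = statistics.pvariance(values)   # sum of squared deviations / N
std_dev = statistics.stdev(values)                   # sample standard deviation

print(sample_variance, population_variance)
print(abs(std_dev ** 2 - sample_variance) < 1e-9)    # variance is the squared standard deviation
```

Which divisor applies depends on whether the data is treated as a sample or as the whole population; the difference shrinks as N grows.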

These methods can be used for univariate analysis, bivariate analysis, or multivariate analysis as needed.

Univariate analysis considers only one variable at a time. This allows the examination of each variable in the data set using different measures of frequency, variation, and central tendency.

Bivariate analysis looks for a relationship between two different variables. The frequency and variability of the two variables are measured together to see if they vary together. Measures of central tendency can also be taken during bivariate analysis.

Multivariate analysis is similar to bivariate analysis, with the exception that it takes more than two variables into account to identify any relationship between them.
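For the bivariate case, one common way to check whether two variables vary together is the Pearson correlation coefficient. The sketch below computes it by hand on invented data; the variable names and numbers are assumptions for illustration only.

```python
import math

# Hypothetical paired observations, invented for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 78]

n = len(hours_studied)
mean_x = sum(hours_studied) / n
mean_y = sum(exam_scores) / n

# Sum of the products of paired deviations from each mean.
co_deviation = sum((x - mean_x) * (y - mean_y)
                   for x, y in zip(hours_studied, exam_scores))

# Sums of squared deviations for each variable.
ss_x = sum((x - mean_x) ** 2 for x in hours_studied)
ss_y = sum((y - mean_y) ** 2 for y in exam_scores)

# Pearson r ranges from -1 to 1; values near 1 mean the variables rise together.
r = co_deviation / math.sqrt(ss_x * ss_y)

print(round(r, 3))
```

A value of r close to 1 here indicates that higher study hours go with higher scores in this invented sample; it says nothing about cause.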

4. Examples of Descriptive Statistics

The most important reason for the wide use of descriptive statistics is that it makes a complex set of data easier to interpret by giving a convenient summary. Here are some examples where descriptive statistics help:

Indicate the overall performance of a player in a sport such as baseball. A batting average is the number of hits by the batter divided by the total number of at-bats.
Summarize a student's performance at school. A GPA, or grade point average, indicates overall performance across multiple tests and courses throughout the year.
Identify the distribution of college students using different variables like year of study, gender, course, etc.
Determine the demographics of a certain population in a city, state, or country. Descriptive statistics can identify the distribution of the population in terms of gender or occupation, the variance in income levels, etc.
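The first two examples reduce to simple averages. The sketch below uses invented numbers, and the credit-weighted GPA formula is an assumption, since schools differ in how they weight courses.

```python
# Hypothetical season totals for one batter, invented for illustration.
hits = 27
at_bats = 90
batting_average = hits / at_bats   # hits divided by at-bats

# Hypothetical (grade points, credits) pairs for one student.
# Weighting by credits is an assumed convention, not a universal rule.
courses = [(4.0, 3), (3.0, 4), (3.7, 3)]
total_points = sum(points * credits for points, credits in courses)
total_credits = sum(credits for _, credits in courses)
gpa = total_points / total_credits

print(f"Batting average: {batting_average:.3f}")
print(f"GPA: {gpa:.2f}")
```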

5. Important Tools in Descriptive Statistics

Various descriptive statistics tools can be called on for specific scenarios. Choosing the right tool depends entirely on the objective of the analysis and the type and number of variables at hand.

There are two categories of tools in descriptive statistics:

Numerical Tools: These include the various methods of calculation:
Mean
Median
Mode
Standard deviation
Variance
Range
Coefficient of variation
Skewness and kurtosis coefficients
Quartiles
Percentiles
Contingency tables
Frequency tables
Correlation
RV coefficient

Graphic Tools: These allow the representation of various data points as graphs or tables:
Box plots
Scatter plots
Whisker plots
Bar chart
Pie chart
Histogram
Ternary diagram
Correlation map
Probability plot
Strip plot

6. Importance of Descriptive Statistics

Descriptive statistics is the basis of any quantitative data analysis process. It gives a simplified picture of the data set, no matter how large or complex the data, and enables easy interpretation. It is the first step in describing the data and its features. Its importance lies in being foundational: the measures and values obtained through descriptive statistics are essential inputs for any advanced statistical analysis.

Conclusion

Descriptive statistics forms the foundation of quantitative analysis of any set of data. While a single indicator for a large set of data may obscure the specifics of individual values, it still delivers a convenient and usable summary that indicates the relationship between the variables and allows for essential comparisons.

