What Is Summary Statistics: Definition and Examples

Introduction to Summary Statistics

What are summary statistics? Summary statistics give information about the data in a sample and help make sense of its values. They may include the total number of values, the minimum and maximum values, the mean, and the standard deviation of a data collection. With these, you can understand the trends, outliers, and distribution of values in a data set. This is especially useful when dealing with large amounts of data. The information can be used to steer the rest of the analysis and derive more information about a data set. Summary statistics are calculated from the sample data and do not go beyond the data on hand.

What are Summary Statistics?

By definition, the summary statistics sum up the features of a data sample. They describe the values and provide related measurements. These work as a basis for understanding the values recorded during a study.

Descriptive statistics can show where the mean of a set of values lies. They can also help to understand whether the data is skewed. Descriptive or summary statistics include:

1. Description of the sample size (usually denoted by N)
2. Description of the center of the data or values (Mean value)
3. Description of how the values are spread
4. Plotted graphs and charts that help understand the distribution of values.

A Few Examples of Summary Statistics

What is the meaning of summary statistics? It can be better understood with the help of the following illustrations:

• Calculation of mean value: Assume a data set with 5 numbers – 20, 30, 40, 50, and 60. The sum of all these numbers is 200. 200 divided by 5 would give the mean value, which is 40.
• Calculation of the Grade Point Average: Many universities use this score to evaluate students’ performance over the duration of their degree programs. The university records how much a student scores in various courses. More often than not, a course carries a certain number of credits, which is also a numerical value. Letter grades A, B, and C assigned to students correspond to point values such as 4.0, 3.0, and 2.0, respectively. Each course’s point value is typically multiplied by its credits; these products are added up (for a semester, term, or year) and divided by the total number of corresponding credits. The resultant value is the Grade Point Average or the GPA (for that semester, term, or year).
Thus, the GPA pulls together several data points created across grades, courses, and examinations and then calculates the average. This average value helps “summarize” a student’s mean academic performance. The final number shows the typical score of a student. While this numeric value helps track the student’s progress, it is also useful for comparing the student’s performance against the standards of the program or university.
It is important to note that the GPA is a straightforward calculation that is based on the data collected. It does not predict future performance or draw any conclusion. Usually, the summary statistics are presented in the form of a chart or a graph.
• As-Is Report in a Pie Chart: If a 500-member audience in a particular theater were to be asked if they liked a play (yes) or disliked it (no), their responses could be captured in a data set. Also, the summary of their replies, that is, the total number of ‘yes’ and ‘no’ responses, could be represented in a pie chart. This would be another example of summary statistics as it is an as-is report of the findings of the study and does not draw upon any other conclusions.
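The GPA example above can be sketched in a few lines of Python. This is a minimal illustration that assumes the common credit-weighted scheme (grade points multiplied by credits, then divided by total credits); the course names, credits, and grades are hypothetical, and only the A/B/C point values come from the text.

```python
# Point values for letter grades, as given in the text.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0}


def gpa(courses):
    """Credit-weighted GPA: sum(points * credits) / sum(credits).

    `courses` is a list of (course_name, credits, letter_grade) tuples.
    """
    total_credits = sum(credits for _, credits, _ in courses)
    if total_credits == 0:
        raise ValueError("no credits recorded")
    quality_points = sum(
        GRADE_POINTS[grade] * credits for _, credits, grade in courses
    )
    return quality_points / total_credits


# Hypothetical semester, for illustration only.
semester = [("Statistics", 4, "A"), ("Economics", 3, "B"), ("Writing", 3, "C")]
print(round(gpa(semester), 2))  # (16 + 9 + 6) / 10 = 3.1
```

Like any summary statistic, the result only describes the grades already recorded; it says nothing about future terms.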

Every summary statistics example quoted above focuses on one of the important aspects: the mean, the variability, or the data distribution.

Categories Of Summary Statistics

The summary or descriptive statistics can be broken down into different types, measures, or features. The description or summary can focus on any of three main categories: 1) the measure of the average value; 2) the frequency of each value; or 3) the spread of the values.

Summary Statistics: Measures of location

Also referred to as central tendency, this summary shows or describes a data set’s center or average. This is measured by the calculated values of the mean, median, and mode.

Mean: This is the most common method of calculating the average value and is usually represented by ‘M’. The mean can be found by adding the values of the responses and then dividing this sum by the total number of responses (denoted by N). Consider a person who wants to find out the average number of hours they work per day in one week. The data set would include the hours clocked on each day of that week: 8, 10, 7, 9, 8, 6, and 4. The sum of all these entries is 52, and the total number of responses is 7. 52 divided by 7 gives the value of M, which is approximately 7.4.

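Median: This is defined as the exact central value in the data set. By arranging the work-hour values from the lowest to the highest (4, 6, 7, 8, 8, 9, 10), we get 8 as the median, with 3 values to its left and 3 values to its right. When a data set has an even number of values, the median is the average of the two middle values.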

Mode: This represents the most frequent value in a data set. A given data set may have one mode, several modes, or no mode at all (when no value repeats). The mode can be found by arranging the values in a data set in ascending order and then looking for values that are repeated. In the example of work hours per week, by arranging the values from the lowest (4) to the highest (10), we can see that only the value 8 is repeated. Thus 8 is the mode.
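The three measures of location can be checked against the work-hours example with Python's standard `statistics` module. This is a small sketch; the variable names are illustrative.

```python
import statistics

# Hours worked on each day of the week, from the example above.
hours = [8, 10, 7, 9, 8, 6, 4]

mean_hours = statistics.mean(hours)      # 52 / 7
median_hours = statistics.median(hours)  # middle value of the sorted data
modes = statistics.multimode(hours)      # list of the most frequent value(s)

print(round(mean_hours, 1), median_hours, modes)  # 7.4 8 [8]
```

`multimode` returns a list, which makes it straightforward to see when a data set has more than one mode.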

The measure of spread is also referred to as dispersion or variability. This measure helps us understand how the responses are spread out. The three aspects of spread are range, standard deviation (SD), and variance. Variance is based on the squared deviations from the mean, and the standard deviation is its square root. Let us examine these to understand what the summary statistics mean:

Range: This can be used to understand how far apart the highest and lowest values in a data set lie. It can be found by subtracting the lowest value from the highest (i.e., highest − lowest). In the earlier example of working hours, the highest entry was 10 and the lowest 4. The range would be 10 − 4 = 6.

Standard Deviation: This is an indication of the average variability of the data set. It shows how far, on average, the values lie from M, the mean. The higher the value, the greater the variability. There are several steps to arrive at the SD:

1. Tabulate the values along with the mean value
2. Subtract the mean from each score to find the individual deviations
3. Square each of the resultant values
4. Find the sum of all the values squared in Step 3
5. Divide the sum found in Step 4 by (N-1), where N represents the total number of responses (this result is the sample variance)
6. Find the square root of the resultant value from Step 5
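The steps above can be sketched directly in Python and applied to the work-hours example. The function name `sample_sd` is illustrative; the result is compared with the standard library's `statistics.stdev`, which also divides by N-1.

```python
import math
import statistics

# Hours worked on each day of the week, from the earlier example.
hours = [8, 10, 7, 9, 8, 6, 4]


def sample_sd(values):
    """Sample standard deviation, following the steps in the text."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n                    # mean of the values
    deviations = [v - mean for v in values]   # subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # square each deviation
    total = sum(squared)                      # sum of squared deviations
    variance = total / (n - 1)                # divide by N-1 (sample variance)
    return math.sqrt(variance)                # square root gives the SD


value_range = max(hours) - min(hours)         # highest minus lowest

print(value_range)                            # 6
print(round(sample_sd(hours), 2))             # 1.99
print(math.isclose(sample_sd(hours), statistics.stdev(hours)))  # True
```

A standard deviation of about 2 hours around a mean of about 7.4 hours suggests that most days fall within a couple of hours of the average.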

Summary Statistics: Graphs and Charts

Values of a data set and related observations can be represented graphically in many ways. Common graph and chart types include histograms, bar charts, box plots, frequency distribution tables, scatter plots, and pie charts. Each of these comes with its own benefits and can be chosen based on how well it represents the data and how easily a person can understand the meaning of the summary statistics from the representation.
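As a rough illustration of a frequency distribution table, the sketch below counts how often each value occurs in the earlier work-hours example and prints a simple text bar for each. It uses only the standard library; a plotting library would be the usual choice for real charts.

```python
from collections import Counter

# Hours worked on each day of the week, from the earlier example.
hours = [8, 10, 7, 9, 8, 6, 4]

# Count how many times each value occurs.
frequency = Counter(hours)

# Print one row per distinct value, in ascending order,
# with one '#' per occurrence.
for value in sorted(frequency):
    print(f"{value:>2} | {'#' * frequency[value]}")
```

The row for 8 shows two marks while every other row shows one, which matches the mode found earlier.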

Applications Of Summary Statistics

The applications are far and wide and span an assortment of fields and professions, from academics to finance and investments, and even government organizations. Economic interests may lie in data pertaining to consumer spending, inflation, changes in the GDP, and more.

Analysts involved in the Finance domain could be interested in companies and industries, market information with a focus on volumes and prices, consumer sentiment regarding a product or service, and many more variables.

Conclusion

Due to their focus on the collected data, descriptive and summary statistics may seem limited at first glance. However, they aid an analyst in quantifying the data set on hand and help chalk out its basic characteristics. Also, because they describe data that has already been collected, they involve no uncertainty, and they work well for cleaning up large amounts of data. Along with organized and simplified data, the descriptions or summary statistics thus obtained set the stage for further data analysis.

According to the US Bureau of Labor Statistics, the scope for the Data Science field and related jobs will continue to look up in the coming decade (2021 to 2031). With a 36% job outlook, it is considered a field with much faster growth than many others.

With more organizations making data-driven decisions, the prospect of a role related to statistics and data analytics has never seemed brighter than it is now. According to a 2022 glassdoor.com report, a Data Analyst can expect a salary of INR 6 lakhs per annum, and for a Data Scientist, this can go up to INR 11 lakhs per annum. Equip yourself with the necessary skills to take on an organization’s data analytics role; explore UNext Jigsaw’s highly recommended Integrated Program in Business Analytics. It comes with a blend of key management skills and real-world scenarios related to Data Science.
