Hypothesis testing for difference between means: an application in Cricket

11 Jan 2011

Often, in business and elsewhere, people need to evaluate whether the parameters of two populations are alike or different. A pharmaceutical company may want to know whether a new drug causes one reaction in one group and a different reaction in another. A credit-card company may want to know whether the response rate to a mail campaign varies across different test groups. Or a Cricket team selection committee may want to know whether one type of bowler performs better than another under certain conditions.

The much-awaited Indian squad for the World Cup was announced recently and, as expected, some of the selections proved controversial. Much of the debate this time centred on the selectors’ decision to include three spinners in the team. The selectors justified their decision on the basis of the bowlers’ performance in Indian conditions, where pitches are known to generate turn.

We decided to do a statistical analysis to test the selectors’ claim that spin bowling works better in India.

Performance criteria

We used two measures of bowling performance.

1. Economy rate (ER): runs conceded per over

2. Strike rate (SR): runs conceded per wicket

The first measures how well a bowler contains runs; the second measures how effectively a bowler takes wickets. For both measures, lower is better.
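A minimal Python sketch of the two measures, following this post's definitions (economy rate = runs per over, strike rate = runs per wicket). The function name `bowling_measures` and the sample figures are hypothetical. It assumes overs are given in the usual scorecard notation, where 9.3 means 9 complete overs plus 3 balls of a six-ball over. Note that many scorecards call runs per wicket the bowling average and reserve "strike rate" for balls per wicket.

```python
def bowling_measures(overs: float, runs: int, wickets: int) -> tuple[float, float]:
    """Return (economy rate, strike rate) as defined in this post.

    Assumption: `overs` uses scorecard notation, so 9.3 means 9 overs + 3 balls.
    Strike rate here is runs conceded per wicket; it is infinite when no
    wickets were taken.
    """
    complete_overs = int(overs)
    extra_balls = round((overs - complete_overs) * 10)  # digit after the point = balls
    balls = complete_overs * 6 + extra_balls
    economy_rate = runs / (balls / 6)  # runs per (six-ball) over
    strike_rate = runs / wickets if wickets else float("inf")  # runs per wicket
    return economy_rate, strike_rate


# Hypothetical figures: 10 overs, 48 runs, 2 wickets
er, sr = bowling_measures(10, 48, 2)
print(f"ER = {er:.2f}, SR = {sr:.1f}")  # ER = 4.80, SR = 24.0
```

Converting overs to balls first avoids the common mistake of treating 9.3 overs as 9.3 decimal overs.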

Analysis period

We chose the period since the last World Cup as our analysis period and included all of India’s home games since then, a total of 35 games. The data were compiled from www.howstat.com, an excellent site for Cricket data. We identified 12 bowlers (8 pacers and 4 spinners) who played a significant role in this period and performed the analysis on the bowling figures of these 12 bowlers only.

Analysis

We first looked at the average (mean) of the two performance measures by bowling type (i.e. pace vs. spin).

Bowling type	Mean economy rate (runs/over)	Mean strike rate (runs/wicket)
Pace	5.96	39.9
Spin	5.05	40.8

On the face of it, the numbers give a split verdict. Spinners have a lower economy rate, i.e. they concede fewer runs per over than pacers. Pacers, however, are slightly better wicket-takers: they concede fewer runs per wicket.

Next, we test whether the differences between the means of the two groups (pacers and spinners) are statistically significant or could plausibly be due to chance.

Statistics

We performed a standard two-tailed (large-sample Z) test for the difference between means at the 0.05 significance level, starting with economy rate. We first define our hypotheses.

Null hypothesis (H0): Mean ER (spin) = Mean ER (pace)

Alternative hypothesis (H1): Mean ER (spin) ≠ Mean ER (pace)

Level of significance: 0.05 (we will conclude that the mean economy rate of spinners differs from that of pacers only if the observed difference is so large that, were the null hypothesis true, a difference at least this large would arise by chance less than 5% of the time; for a two-tailed test this puts 2.5% in each tail, giving critical values of Z = ±1.96)
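The ±1.96 cut-offs follow directly from the 0.05 significance level: a two-tailed test leaves 2.5% of the standard normal distribution in each tail. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper names `critical_z` and `reject_null` are hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha: float = 0.05) -> float:
    """Two-tailed critical value: alpha/2 of the standard normal in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the standardized difference falls outside +/- the critical value."""
    return abs(z) > critical_z(alpha)


print(round(critical_z(0.05), 2))  # 1.96
```

Because the test is two-tailed, only the magnitude of Z matters, which is why `reject_null` compares `abs(z)` with the critical value.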

Next, we calculated the standard deviation and sample size for each group.

Bowling type	Mean economy rate	Standard deviation	Sample size (n)
Pace	5.96	1.63	99
Spin	5.05	1.54	50
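A sketch of how such a summary table could be produced from individual bowling performances. The records below are hypothetical (not the post's data), each assumed to be one bowler's economy rate in one match. The sketch uses the sample standard deviation (n − 1 denominator); the post does not say which variant it used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Group (bowling_type, value) pairs and return {type: (mean, sample SD, n)}."""
    groups = defaultdict(list)
    for bowling_type, value in records:
        groups[bowling_type].append(value)
    # stdev() is the sample standard deviation (divides by n - 1)
    return {t: (mean(v), stdev(v), len(v)) for t, v in groups.items()}


# Hypothetical per-match economy rates, for illustration only
records = [("Pace", 6.2), ("Pace", 4.8), ("Pace", 7.0),
           ("Spin", 4.5), ("Spin", 5.5), ("Spin", 5.0)]

for bowling_type, (m, s, n) in summarize(records).items():
    print(f"{bowling_type}: mean={m:.2f}, sd={s:.2f}, n={n}")
```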

Now the estimated standard error of the difference between the means can be determined by

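$$\sigma_{\text{pace}-\text{spin}} = \sqrt{\frac{\sigma_{\text{pace}}^{2}}{n_{\text{pace}}} + \frac{\sigma_{\text{spin}}^{2}}{n_{\text{spin}}}} \approx 0.28$$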
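The standard-error formula above can be checked numerically. A minimal sketch, plugging in the rounded values from the economy-rate table; the function name `se_diff` is hypothetical.

```python
from math import sqrt


def se_diff(sd_1: float, n_1: int, sd_2: float, n_2: int) -> float:
    """Estimated standard error of the difference between two independent means."""
    return sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)


# Rounded values from the economy-rate table: pace (1.63, 99), spin (1.54, 50)
se = se_diff(1.63, 99, 1.54, 50)
print(f"{se:.4f}")  # 0.2725
```

With the rounded table values this gives about 0.27 rather than the 0.28 reported; the gap is presumably due to rounding in the published table (an assumption, since the raw data are not shown).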

Then we standardize the difference of means.

$$Z = \frac{\text{Mean ER (spin)} - \text{Mean ER (pace)}}{\sigma_{\text{pace}-\text{spin}}} \approx -3.28$$
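A sketch of the standardization step, also reporting the two-sided p-value under the same large-sample normal approximation. It computes spin minus pace, which matches the sign of the reported Z. The function name `two_sample_z` is hypothetical, and the inputs are the rounded table values.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Z statistic for (mean_a - mean_b) and its two-sided p-value (normal approximation)."""
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Economy rate: spin (5.05, 1.54, 50) minus pace (5.96, 1.63, 99)
z, p = two_sample_z(5.05, 1.54, 50, 5.96, 1.63, 99)
print(f"Z = {z:.2f}, p = {p:.4f}")
```

With these rounded inputs the sketch gives Z ≈ −3.34 rather than the post's −3.28; either way |Z| is well beyond 1.96, and the sketch's two-sided p-value is below 0.001.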

We mark the standardized difference on a sketch of the sampling distribution and compare it with the critical values for our significance level (±1.96).

[Figure: Economy Rate Comparison]

Since the standardized difference (Z ≈ −3.28) lies outside our acceptance region, we reject the null hypothesis of no difference and conclude that economy rate differs by bowling type.

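Having established a significant difference in economy rates between pacers and spinners, we can take it one step further and ask: for which hypothesized differences (pace minus spin) would the standardized difference lie within our acceptance region, i.e. between −1.96 and +1.96? The answer is any difference between 0.36 and 1.45 runs per over. This range is the 95% confidence interval for the difference in mean economy rates (approximately 0.91 ± 1.96 × 0.28).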

Multiplying by 10 to convert to a ten-over spell, we can be 95% confident that, on average, pacers concede between 3.6 and 14.5 runs (roughly 4 to 15) more than spinners. Strictly speaking, this is a statement about the difference in mean economy rates rather than a probability for any individual spell. Because the test is two-tailed, the 5% excluded by the interval is split evenly between the two tails, 2.5% on each side.
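The interval and its ten-over version can be reproduced in a few lines. A minimal sketch, assuming the normal-approximation interval (difference ± critical Z × standard error) and the post's rounded difference of 0.91 and standard error of 0.28; the function name `confidence_interval` is hypothetical.

```python
from statistics import NormalDist


def confidence_interval(diff: float, se: float, level: float = 0.95):
    """Two-sided normal-approximation confidence interval for a difference in means."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.96 for 95%
    return diff - z_crit * se, diff + z_crit * se


# Pace minus spin economy rate, using the post's standard error of 0.28
low, high = confidence_interval(5.96 - 5.05, 0.28)
print(f"per over:     {low:.2f} to {high:.2f}")
print(f"per 10 overs: {10 * low:.1f} to {10 * high:.1f}")
```

This gives about 0.36 to 1.46 runs per over (3.6 to 14.6 per ten overs), matching the post's figures up to rounding.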

Thus, spinners are more economical than pacers on Indian pitches, and the difference is statistically significant.

Now let us examine the strike rates.

Bowling type	Mean strike rate	Standard deviation	Sample size (n)
Pace	39.9	18.2	99
Spin	40.8	21.3	50

Applying the same test to strike rate (with Mean SR in place of Mean ER in the hypotheses), the standardized difference is Z ≈ 0.245.

[Figure: Strike Rate Comparison]

The Z value lies within our acceptance region, so we do not reject the null hypothesis. We find no statistically significant difference between the strike rates of pacers and spinners.
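The strike-rate test can be sketched the same way, combining the standard error, the Z statistic and the ±1.96 decision rule in one hypothetical helper, `z_test_means`. The inputs are the rounded table values, computed as spin minus pace.

```python
from math import sqrt
from statistics import NormalDist


def z_test_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b, alpha=0.05):
    """Two-tailed large-sample z-test for (mean_a - mean_b); returns (z, reject_h0)."""
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > z_crit


# Strike rate: spin (40.8, 21.3, 50) minus pace (39.9, 18.2, 99)
z, reject = z_test_means(40.8, 21.3, 50, 39.9, 18.2, 99)
print(f"Z = {z:.3f}, reject H0: {reject}")  # Z = 0.255, reject H0: False
```

The rounded inputs give Z ≈ 0.26 (the post reports 0.245), comfortably inside ±1.96.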

Conclusion

In conclusion, statistically speaking, spinners are more economical in India, conceding roughly 4 to 15 fewer runs than pacers, on average, in a ten-over spell. In terms of wicket-taking ability, the difference was not statistically significant.

In this case, statistics back up the claims of our national selectors. Spinners are indeed more suited to Indian conditions, being more economical than our pacers.
