Statistics with R: Discrete Distributions

23 Mar 2015

In today’s post we will look at two discrete distributions, the Binomial and the Poisson, and one continuous distribution, the Normal, with some examples.

Binomial distribution

A probability distribution for the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes, such as head or tail, success or failure.
If the probability of success in any given trial is known, the binomial distribution can be used to compute the probability of a given number of successes in a given number of trials.

There are four basic functions for the binomial distribution in R. The distribution functions are:
Probability Mass Function: dbinom(k,n,p)
This returns the value of the probability mass function, P(X = k), where k is the number of successes, n is the number of trials, and p is the probability of success.

Distribution Function: pbinom(k,n,p)
This returns the cumulative distribution function, P(X ≤ k), where n and p are the same parameters as above.

Generating Random Variables: rbinom(m,n,p)
This generates random numbers from the binomial distribution, so it can also be used to simulate a sample. Here m is the number of random values to generate (the number of experiments), n is the number of trials per experiment, and p is the probability of success.

Quantiles: qbinom(q,n,p)
This returns the inverse cumulative distribution function (quantiles). Here q is a probability between 0 and 1, and the result is the smallest k for which P(X ≤ k) ≥ q.
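
As a quick check on how the four functions relate to each other, here is a minimal sketch. The parameters n = 10 and p = 0.5 are illustrative values assumed for this sketch only, and the seed is fixed just to make the random draws repeatable.

```r
# Illustrative parameters (assumed for this sketch, not taken from the post)
n <- 10   # number of trials
p <- 0.5  # probability of success in each trial

# dbinom: P(X = k) for every possible k; the probabilities sum to 1
pmf <- dbinom(0:n, n, p)
total <- sum(pmf)

# pbinom: P(X <= k) equals the sum of the PMF from 0 up to k
cdf_at_3 <- pbinom(3, n, p)
cdf_by_sum <- sum(dbinom(0:3, n, p))

# qbinom: smallest k such that P(X <= k) >= 0.5 (the median)
median_k <- qbinom(0.5, n, p)

# rbinom: 5 random draws, each the number of successes in n trials
set.seed(1)  # fixed seed so the draws are reproducible
draws <- rbinom(5, n, p)

print(c(total = total, cdf_at_3 = cdf_at_3, cdf_by_sum = cdf_by_sum, median_k = median_k))
print(draws)
```

The two cumulative values agree because pbinom is simply the running sum of dbinom, and qbinom inverts pbinom.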

Functions in R

Syntax:
myfun <- function(arg1, arg2, ...) {
  function_body
  return(value)
}

The value returned by a function is the value passed to return(). If return() is not called, the function returns the value of the last expression evaluated in its body.
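
A minimal sketch of the two ways a function can return a value. The function names square_implicit and square_explicit are hypothetical, made up for this illustration.

```r
# Hypothetical helper: the last evaluated expression is returned implicitly
square_implicit <- function(x) {
  x^2
}

# Same behaviour, but with an explicit return()
square_explicit <- function(x) {
  result <- x^2
  return(result)
}

print(square_implicit(4))
print(square_explicit(4))
```

Both calls give the same result; return() is mainly useful when a function needs to exit early or when you want the returned value to be obvious to the reader.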

Example 1: Suppose there are twelve multiple choice questions in an English class quiz. Each question has five possible answers, and only one of them is correct. Suppose a student answers every question at random:

i) Find the probability of having four or fewer correct answers, P(x<=4).
ii) Find P(x=3)
iii) Find P(2<=x<=4)
iv) Generate 3 random numbers from this distribution
v) Find P(x>=4)
vi) What is the median (the 0.5 quantile)?

Solution:

The probability of answering a question correctly at random is 1/5 = 0.2, i.e. p = 0.2, and the number of questions is n = 12.

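R code:
n=12                       # number of questions (trials)
b1=dbinom(3,n,0.2)         # ii) P(x=3)
b2=pbinom(4,n,0.2)         # i) P(x<=4)
b3=sum(dbinom(2:4,n,0.2))  # iii) P(2<=x<=4)
b4=rbinom(3,n,0.2)         # iv) 3 random numbers
b5=1-pbinom(3,n,0.2)       # v) P(x>=4) = 1 - P(x<=3)
b6=qbinom(0.5,n,0.2)       # vi) median (0.5 quantile)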

Alternate:

Here we create a function called Binomial using function(). It runs the same code for any number of trials n and hands back the results with return(), combined into a single vector with c().

Binomial=function(n){
  b1=dbinom(3,n,0.2)
  b2=pbinom(4,n,0.2)
  b3=sum(dbinom(2:4,n,0.2))
  b4=rbinom(3,n,0.2)
  b5=1-pbinom(3,n,0.2)
  b6=qbinom(0.5,n,0.2)

  return(c(b1,b2,b3,b4,b5,b6))
}

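Result:
Calling Binomial(12), i.e. with n=12 questions, returns a vector of eight values: b1, b2, b3, the three random numbers in b4 (which will differ from run to run), b5 and b6.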

> Binomial(12)
[1] 0.2362232 0.9274445 0.6525666 2.0000000 2.0000000 2.0000000 0.2054311 2.0000000

Note: In sum(dbinom(2:4,n,0.2)), the sum function adds the values of the binomial probability mass function at k = 2, 3 and 4.
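
The note above can be checked numerically. The sketch below, using the quiz values n = 12 and p = 0.2, shows that summing the mass function over 2 to 4 gives the same answer as a difference of two cumulative probabilities, and that the upper tail can also be requested directly with lower.tail=FALSE.

```r
n <- 12   # number of questions
p <- 0.2  # probability of a correct answer

# P(2 <= x <= 4) by summing the mass function at k = 2, 3, 4
by_sum <- sum(dbinom(2:4, n, p))

# The same probability as P(x <= 4) - P(x <= 1)
by_cdf <- pbinom(4, n, p) - pbinom(1, n, p)

# P(x >= 4) written two equivalent ways
upper_a <- 1 - pbinom(3, n, p)
upper_b <- pbinom(3, n, p, lower.tail = FALSE)

print(c(by_sum = by_sum, by_cdf = by_cdf))
print(c(upper_a = upper_a, upper_b = upper_b))
```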

Poisson distribution
A distribution that represents the number of events occurring randomly and independently in a fixed time interval, at an average rate λ.

For the Poisson distribution with parameter lambda, probabilities and cumulative probabilities are given in R by Pr(X = x) = dpois(x,lambda) and Pr(X ≤ x) = ppois(x,lambda).

Quantiles of the Poisson distribution can be found by qpois(p,lambda), where p is a vector of probabilities.
Random deviates can be generated by rpois(n,lambda), where n is the number of random values to generate.

If an element of x is not an integer, dpois returns zero for that element, with a warning.

Note: By default ppois gives the lower (left) tail probability, Pr(X ≤ x). To get the upper (right) tail probability, Pr(X > x), include lower.tail=FALSE, as in ppois(x,lambda,lower.tail=FALSE).

Poisson examples follow the same pattern as the binomial example above, with the single parameter lambda in place of n and p.
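
As a minimal sketch of such an example, assume events arrive at an average rate of lambda = 3 per interval. This value is made up for illustration and does not come from the post.

```r
lambda <- 3  # assumed average number of events per interval

p_exact   <- dpois(2, lambda)                      # P(X = 2)
p_at_most <- ppois(2, lambda)                      # P(X <= 2), lower tail
p_more    <- ppois(2, lambda, lower.tail = FALSE)  # P(X > 2), upper tail
median_x  <- qpois(0.5, lambda)                    # smallest x with P(X <= x) >= 0.5

set.seed(1)                # fixed seed so the draws are reproducible
draws <- rpois(5, lambda)  # 5 random counts

print(c(p_exact = p_exact, p_at_most = p_at_most, p_more = p_more, median_x = median_x))
print(draws)
```

The lower and upper tail probabilities add up to 1, which is a handy check when deciding which tail a question is asking for.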

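Note: For the binomial distribution, the probability of success is the same in every trial. The Poisson distribution likewise assumes that the average rate stays constant over the interval.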

Normal distribution

A continuous distribution whose density is a symmetrical bell-shaped curve. It describes the distribution of many random variables.

To apply the Normal distribution we need the value of interest (for example a sample mean), the population mean and the population standard deviation.
Probability Density Function: dnorm(x,mean,sd)
Returns the value of the probability density function, where x is a value (for example a sample mean) or a vector of values, mean is the mean and sd is the standard deviation.

Distribution Function: pnorm(x,mean,sd)

Returns the cumulative distribution function, Pr(X ≤ x).

Generating Random Variables: rnorm(n,mean,sd)

rnorm is used to generate random deviates, where n is the number of observations.
If mean or sd are not specified, they take the default values of 0 and 1, respectively.
Note: By default pnorm gives the lower (left) tail probability, Pr(X ≤ x). To get the upper (right) tail probability, Pr(X > x), include lower.tail=FALSE, as in pnorm(x,mean,sd,lower.tail=FALSE).
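
A minimal sketch of these functions in use, assuming a population mean of 100 and a standard deviation of 15. Both values are hypothetical and chosen only for illustration.

```r
mu    <- 100  # assumed population mean
sigma <- 15   # assumed population standard deviation

dens  <- dnorm(100, mu, sigma)                      # density at the mean
lower <- pnorm(115, mu, sigma)                      # P(X <= 115), about 0.84
upper <- pnorm(115, mu, sigma, lower.tail = FALSE)  # P(X > 115)

# Probability of falling within one standard deviation of the mean
within_1sd <- pnorm(115, mu, sigma) - pnorm(85, mu, sigma)

set.seed(1)               # fixed seed so the draws are reproducible
x <- rnorm(5, mu, sigma)  # 5 draws with the assumed mean and sd
z <- rnorm(5)             # 5 draws using the defaults mean = 0, sd = 1

print(c(dens = dens, lower = lower, upper = upper, within_1sd = within_1sd))
print(x)
print(z)
```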

Descriptive statistics in R:

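Basic statistics like the mean, median, min, max and quartiles can be found using the function summary(), and the standard deviation can be calculated with sd(). Note that mode() does not compute the statistical mode: it returns the storage mode of an object (for example "numeric"). The most frequent value can instead be found from a frequency table built with table().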
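
A short sketch on a made-up sample (the data vector is purely illustrative), including one common way to find the most frequent value with table():

```r
x <- c(2, 4, 4, 5, 7, 9, 4, 7)  # made-up sample for illustration

print(summary(x))  # min, quartiles, median, mean, max
print(sd(x))       # sample standard deviation
print(mode(x))     # storage mode of the object, not the statistical mode

# Statistical mode: the value(s) with the highest frequency
counts <- table(x)
stat_mode <- as.numeric(names(counts)[counts == max(counts)])
print(stat_mode)
```

If several values tie for the highest frequency, stat_mode holds all of them.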
