For those starting out in analytics, the difference between the terms ‘Correlation’ and ‘Dependency’ can be confusing. In statistics, dependency refers to any statistical relationship between two random variables or two sets of data. Correlation, on the other hand, refers to any of a broad class of statistical relationships involving dependence; in practice it most often means the degree to which two variables are linearly related. Let us define the two terms further:
Dependency: A variable is dependent when its value depends on the value assigned to another variable (the independent variable).
Correlation: A measure of the relationship between two or more variables. The (Pearson) correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct.
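To make the definition concrete, here is a minimal sketch of how the Pearson coefficient can be computed by hand in R. The helper name `pearson_r` and the vectors `a` and `b` are made up for illustration; the built-in `cor()` is only used to cross-check the result.

```r
# Pearson's r from its definition: the sum of products of deviations,
# divided by the square root of the product of the summed squared deviations.
# pearson_r is a hypothetical helper, not part of base R.
pearson_r <- function(a, b) {
  da <- a - mean(a)   # deviations of a from its mean
  db <- b - mean(b)   # deviations of b from its mean
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

a <- c(1, 2, 3, 4, 5)
b <- 2 * a + 1                  # an exactly linear relationship

r_manual  <- pearson_r(a, b)    # close to 1 for a perfect increasing line
r_builtin <- cor(a, b)          # cor() uses the Pearson method by default
print(c(manual = r_manual, builtin = r_builtin))
```

Because only deviations from a straight-line trend enter the formula, a relationship that is strong but curved can still produce a small coefficient.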
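Example: Let’s consider the unit circle, a non-linear relation.
We can write the unit circle as x^2 + y^2 = 1, which on the upper half gives y = sqrt(1 - x^2).
Now we can say that y is a dependent variable: its value is fully determined by x.
Consider the following values for the x variable, since points on the unit circle have x between -1 and 1.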
> x=c(-1,-0.8,-0.6,-0.4,-0.2,0,0.2,0.4,0.6,0.8,1)
Define y as a function of x:
> y=function(x){sqrt(1-x^2)}
The value of the dependent variable at each point of x is
> Y=y(x)
> Y
[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
> cor(x,Y)
[1] 0
Even though Y is completely determined by x, we arrive at zero correlation. This means, “A pair of variables which are perfectly dependent on each other can also give you a zero correlation.”
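One way to see where the zero comes from is to split the covariance numerator into the part contributed by negative x and the part contributed by positive x. The sketch below does this; the names `contrib`, `left` and `right` are introduced here for illustration only.

```r
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)   # upper half of the unit circle

# Each point's contribution to the numerator of the correlation coefficient.
contrib <- (x - mean(x)) * (Y - mean(Y))

left  <- sum(contrib[x < 0])   # total contribution from negative x
right <- sum(contrib[x > 0])   # total contribution from positive x

# The two halves have the same magnitude and opposite signs,
# so they cancel (up to floating-point rounding).
print(c(left = left, right = right, total = left + right))
```

The cancellation is a consequence of the symmetry of the points around x = 0, not a sign that x and Y are unrelated.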
When we select only the negative points for the x variable:
> x1=c(-1,-0.8,-0.6,-0.4,-0.2)
> y1=y(x1)
> y1
[1] 0.0000000 0.6000000 0.8000000 0.9165151 0.9797959
> cor(x1,y1)
[1] 0.9090862
It gives a positive correlation between x1 and y1.
When we select only the non-negative points for the x variable:
> x2=c(0,0.2,0.4,0.6,0.8,1)
> y2=y(x2)
> y2
[1] 1.0000000 0.9797959 0.9165151 0.8000000 0.6000000 0.0000000
> cor(x2,y2)
[1] -0.8789944
It gives a negative correlation between x2 and y2.
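The same comparison can be written more compactly by subsetting the full vectors with a logical mask instead of retyping the points. This is only a sketch; the names `neg`, `r_neg`, `r_nonneg` and `r_full` are assumptions made for the example.

```r
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)

neg <- x < 0   # TRUE for the five negative points, FALSE for the other six

r_neg    <- cor(x[neg],  Y[neg])    # left half: Y rises as x rises
r_nonneg <- cor(x[!neg], Y[!neg])   # right half: Y falls as x rises
r_full   <- cor(x, Y)               # both halves together

print(round(c(negative = r_neg, non_negative = r_nonneg, full = r_full), 7))
```

Each half on its own shows a strong trend, while the full set of points shows none, because the two trends point in opposite directions.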
Thus we can conclude by saying that:
Correlation can be used to quantify the linear dependency of two variables. It cannot capture a non-linear relationship between variables.
Independent variables have zero correlation, r=0.
The converse does not hold: r=0 indicates zero correlation but not independence; the variables can still be dependent.
In other words, variables which are perfectly dependent on each other can also give you a zero correlation.
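As an additional illustration that is not part of the original example, the dependency hidden behind r=0 becomes visible once the variables are transformed so that the relationship is linear. On the unit circle, Y^2 = 1 - x^2, so Y^2 is an exact linear function of x^2. The names `r_raw` and `r_squared_vars` below are made up for this sketch.

```r
x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Y <- sqrt(1 - x^2)

r_raw          <- cor(x, Y)       # essentially zero: no linear trend in (x, Y)
r_squared_vars <- cor(x^2, Y^2)   # approximately -1: Y^2 = 1 - x^2 is a straight line in x^2

print(c(raw = r_raw, squared = r_squared_vars))
```

The data are identical in both calls; only the choice of variables changes, which is why a zero coefficient by itself says nothing about whether a dependency exists.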
If you found this article interesting and want to further understand correlation, take a look at the article Explaining Correlation to a Newbie to Data Analytics.