Cluster Analysis: Scaling

In geometry, all dimensions are equally important. A distance of 2 units along the X axis counts the same as a distance of 2 units along the Y axis. The unit of measurement does not matter, as long as it's the same for both X and Y.

But what if X is measured in yards, Y in centimeters, and Z in nautical miles? A difference of 1 in Z is now equivalent to a difference of 185,200 in Y, or about 2,025 in X. Clearly, all three must be converted to a common scale before distances make any sense.
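As a quick check of the arithmetic above, the sketch below converts every axis to metres using the exact international definitions (1 yard = 0.9144 m, 1 nautical mile = 1852 m) and compares Euclidean distances before and after conversion. The points `a`, `b`, `c` and the helper names `to_metres` and `euclidean` are hypothetical, chosen only for illustration.

```python
import math

# Exact international definitions of each unit, expressed in metres.
METRES_PER_YARD = 0.9144
METRES_PER_NAUTICAL_MILE = 1852.0
CM_PER_METRE = 100.0


def to_metres(x_yards, y_cm, z_nmi):
    """Convert a point (X in yards, Y in centimetres, Z in nautical miles) to metres on every axis."""
    return (
        x_yards * METRES_PER_YARD,
        y_cm / CM_PER_METRE,
        z_nmi * METRES_PER_NAUTICAL_MILE,
    )


def euclidean(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# How much of Y and X a single nautical mile of Z corresponds to.
cm_per_nmi = METRES_PER_NAUTICAL_MILE * CM_PER_METRE        # 185200.0
yards_per_nmi = METRES_PER_NAUTICAL_MILE / METRES_PER_YARD  # roughly 2025.37

# Hypothetical points: b differs from a by 1 nautical mile in Z,
# c differs from a by 1,000 centimetres (10 m) in Y.
a = (0.0, 0.0, 0.0)
b = (0.0, 0.0, 1.0)
c = (0.0, 1000.0, 0.0)

# Using the raw numbers, c looks 1,000 times farther from a than b does.
print(euclidean(a, b), euclidean(a, c))  # 1.0 1000.0

# After converting to metres, b is the far point: 1852.0 m versus 10.0 m.
print(euclidean(to_metres(*a), to_metres(*b)), euclidean(to_metres(*a), to_metres(*c)))  # 1852.0 10.0
```

The ranking of "near" and "far" flips once every axis shares one unit, which is exactly why mixed units make raw distances meaningless.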

Now take an example that is closer to the reality of business analytics. Suppose we have 3 variables: Income, Number of cars owned, and Age. These variables measure very different things and therefore have very different scales; income may run into tens of thousands, while the number of cars is usually a small integer. If we perform cluster analysis on this data as-is, differences in income will most likely dominate the other 2 variables simply because of their scale. In most practical cases, all of these variables need to be converted to a common scale before the analysis is meaningful.
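The effect can be seen with a small sketch. The customer records below are invented for illustration, and the scaling method shown, z-scoring (subtract each column's mean, divide by its standard deviation), is assumed here as one common choice rather than the specific method the video uses. The helper names `euclidean` and `standardize` are hypothetical.

```python
import math
import statistics

# Hypothetical customer records: (income, number of cars owned, age).
# The values are invented purely for illustration.
customers = [
    (52000, 1, 25),
    (53000, 3, 60),
    (90000, 1, 27),
    (31000, 2, 44),
]


def euclidean(p, q):
    """Straight-line distance between two records."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def standardize(rows):
    """Z-score every column: subtract its mean and divide by its population standard deviation.

    A constant column (standard deviation 0) is only centred, to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


scaled = standardize(customers)

# Raw data: only income registers, so customer 0 appears closest to customer 1
# (income differs by 1,000) even though they differ by 2 cars and 35 years of age.
raw_01 = euclidean(customers[0], customers[1])
raw_02 = euclidean(customers[0], customers[2])

# Standardized data: all three variables contribute, and customer 0
# (same number of cars, similar age) is now closer to customer 2.
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"scaled: d(0,1)={scaled_01:.2f}  d(0,2)={scaled_02:.2f}")
```

Using the sample standard deviation (`statistics.stdev`) instead of the population one would multiply every z-score by the same constant, so the nearest-neighbour ordering, and hence the clusters, would not change. Other scaling methods, such as min-max rescaling to the range [0, 1], follow the same idea of putting every variable on a comparable footing.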

This video talks about the concept of scaling and various methods of scaling used in analytics.
