Fun fact: did you know that the first published picture of a regression line, illustrating the effect described below, came from a lecture presented by Sir Francis Galton in 1877? It is said that he was also the first to coin the term linear regression. Galton was a pioneer in applying statistical methods to measurements in many branches of science. He spent years studying data on the relative sizes of parents and their offspring in various species of plants and animals. One of his most famous observations was that a larger-than-average parent tends to produce a larger-than-average child, but the child is likely to be less large than the parent in terms of its relative position within its own generation.
Today, linear regression models are widely used by data scientists for all kinds of problems. In this blog post I am going to share a few quick tips that you can use to improve your linear regression models.
First, build simple models. Using many independent variables does not necessarily make your model good. Next, try building several regression models with different combinations of variables. You can then take an ensemble of all these models, which might help you arrive at a good model.
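As a rough illustration of this tip, here is a minimal sketch that fits one simple model per predictor and averages their predictions. The data, the variable names (`ad_spend`, `store_visits`, `sales`) and the choice of a plain average as the ensemble are all assumptions made for the example, and one-variable models stand in for the more general "different combinations of variables".

```python
from statistics import mean


def fit_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to new x values."""
    intercept, slope = model
    return [intercept + slope * xi for xi in x]


def rmse(pred, actual):
    """Root mean squared error between predictions and observations."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual)) ** 0.5


# Hypothetical toy data: two candidate predictors and one response.
predictors = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "store_visits": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
}
sales = [3.1, 3.9, 6.2, 6.8, 9.1, 9.9]

# Fit one simple model per predictor.
models = {name: fit_simple(x, sales) for name, x in predictors.items()}

# Ensemble: average the predictions of the individual models.
per_model = [predict(models[name], predictors[name]) for name in predictors]
ensemble_pred = [mean(p) for p in zip(*per_model)]

for name, pred in zip(predictors, per_model):
    print(name, "RMSE:", round(rmse(pred, sales), 3))
print("ensemble RMSE:", round(rmse(ensemble_pred, sales), 3))
```

The errors here are measured on the same data the models were fit to, so they only show the mechanics. In practice you would compare the individual models and the ensemble on held-out data.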
The key step to getting a good model is exploratory data analysis.
Are you sure you really want to make those quantile-quantile plots, influence diagrams, and all the other things that spew out of a statistical regression package? What are you going to do with all that? Just forget about it and focus on the simple plots that help us understand a model.
Consider transforming every variable in sight.
Apart from transformations, creating new variables out of existing ones is also very helpful. For example, for a retailer, given marketing cost and in-store costs, you can create total cost = marketing cost + in-store costs.
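Here is a small sketch of both ideas using the retailer example. The numbers are made up, and the post does not say which transformations to use: the logarithm and standardization shown here are assumptions, picked only because they are common choices for positive, skewed quantities such as costs.

```python
import math
from statistics import mean, stdev

# Hypothetical cost figures for five stores.
marketing_cost = [120.0, 340.0, 90.0, 800.0, 260.0]
in_store_cost = [80.0, 160.0, 60.0, 400.0, 140.0]

# New variable built from existing ones.
total_cost = [m + s for m, s in zip(marketing_cost, in_store_cost)]

# Assumed transformation 1: logarithm (only valid for positive values).
log_total_cost = [math.log(c) for c in total_cost]


def standardize(values):
    """Assumed transformation 2: subtract the mean, divide by the sample SD."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


z_total_cost = standardize(total_cost)

print("total cost:", total_cost)
print("log total cost:", [round(v, 3) for v in log_total_cost])
print("standardized total cost:", [round(v, 3) for v in z_total_cost])
```

Any of these columns can then be tried as a predictor in place of, or alongside, the raw variables.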
The goal is to create models that could make sense (and can then be fit and compared to data) and that include all relevant information.
Don’t get hung up on whether a coefficient “should” vary by group. Just allow it to vary in the model, and then, if the estimated scale of variation is small, maybe you can ignore it if that would be more convenient.
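One simple way to look at this is to fit the same model separately in each group and check how much the coefficient moves around. The sketch below does that with made-up data for three hypothetical regions. Note that separate per-group fits are a simplified stand-in chosen for this example, not a full multilevel model, which would estimate the scale of variation directly.

```python
from statistics import mean, pstdev


def fit_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: (x, y) observations for three groups.
groups = {
    "north": ([1.0, 2.0, 3.0, 4.0], [2.1, 4.0, 6.1, 7.9]),
    "south": ([1.0, 2.0, 3.0, 4.0], [1.9, 4.1, 5.9, 8.2]),
    "west": ([1.0, 2.0, 3.0, 4.0], [2.0, 3.9, 6.0, 8.1]),
}

# Let the slope vary: one fit per group.
slopes = {name: fit_simple(x, y)[1] for name, (x, y) in groups.items()}

# Scale of variation of the slope across groups.
slope_spread = pstdev(list(slopes.values()))

# Single pooled fit that ignores the grouping, for comparison.
all_x = [xi for x, _ in groups.values() for xi in x]
all_y = [yi for _, y in groups.values() for yi in y]
pooled_slope = fit_simple(all_x, all_y)[1]

for name, slope in slopes.items():
    print(name, "slope:", round(slope, 3))
print("spread of slopes:", round(slope_spread, 3))
print("pooled slope:", round(pooled_slope, 3))
```

If the spread is small compared with the slopes themselves, as it is for this toy data, using a single pooled coefficient may be the more convenient choice.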
Suggested Reads:
Popular Applications of Linear Regression for Businesses
Regression Modeling
Image courtesy Photobucket