Ridge Regression: An Interesting Overview In 2021


Ridge regression is a model-tuning technique used to analyse data that suffers from multicollinearity. It applies L2 regularization. When multicollinearity is present, the least-squares estimates remain unbiased but their variances are large, so predicted values can end up far from the actual values.

The ridge regression cost function is given below, where lambda (λ) is the penalty term. In scikit-learn's Ridge it is set through the alpha parameter.

Min(||Y - X(theta)||^2 + λ||theta||^2)

As alpha gets larger, the penalty grows and the magnitude of the coefficients shrinks. Shrinking the parameters in this way mitigates the effects of multicollinearity and also reduces model complexity.
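The shrinkage effect can be seen in a small experiment. The sketch below is only an illustration on synthetic data (the two collinear predictors and the alpha values are assumptions, not taken from the restaurant data set): it fits scikit-learn's Ridge for increasing alpha and records the L2 norm of the coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with two nearly identical (collinear) predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=200)

# Fit ridge regression for increasing penalty strengths.
alphas = [0.001, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: coefficients={model.coef_}, L2 norm={coef_norms[-1]:.3f}")
```

The printed norm should get smaller from one alpha to the next, which is the parameter shrinkage described above.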

In this article let us look at:

  1. Ridge Regression Models
  2. Standardization
  3. Assumptions of Ridge Regressions
  4. Linear Regression Model
  5. Regularization

1. Ridge Regression Models

For machine learning models, the ridge regression formula is given by

Y = XB + e

Here Y is the dependent variable, X the independent variables, B the regression coefficients to be estimated, and e the residual errors. Once the penalty λ has been added to this model to apply L2 regularization, the next step is to standardize the data.
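For readers who want to see how B is obtained, ridge regression has a well-known closed-form solution, B = (X'X + λI)^-1 X'Y. The sketch below is a minimal illustration on synthetic data (the data, the value of lam and the variable names are assumptions). It computes that solution with NumPy and compares it with scikit-learn's Ridge, with the intercept switched off so both minimise the same cost function.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_B = np.array([1.5, -2.0, 0.5])
Y = X @ true_B + 0.1 * rng.normal(size=100)

lam = 2.0  # penalty strength (lambda in the text, alpha in scikit-learn)

# Closed-form ridge estimate: solve (X'X + lambda*I) B = X'Y.
n_features = X.shape[1]
B_closed = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

# Same model via scikit-learn; the intercept is disabled so that both
# approaches minimise exactly the same objective.
B_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, Y).coef_

print(B_closed)
print(B_sklearn)
```

Adding λ to the diagonal of X'X is what keeps the system well conditioned even when the columns of X are highly correlated.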

2. Standardization

Ridge regression uses standardized variables. To standardize the independent and dependent variables, subtract each variable's mean and divide by its standard deviation. One needs to note whether the variables have been standardized and ensure that the regression coefficients finally reported are converted back to the original scale. The ridge trace, however, is always shown on a standardized scale.
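Converting coefficients back to the original scale is simple arithmetic. The sketch below uses hypothetical data and, as a simplifying assumption, standardizes only the predictors (not the dependent variable). Each coefficient is divided by the standard deviation of its predictor and the intercept is adjusted accordingly.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Two predictors on very different scales (hypothetical data).
X = np.column_stack([rng.normal(50, 10, size=150), rng.normal(0.5, 0.1, size=150)])
y = 0.3 * X[:, 0] + 20.0 * X[:, 1] + rng.normal(size=150)

# Standardize the predictors: subtract the mean, divide by the standard deviation.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

model = Ridge(alpha=1.0).fit(X_std, y)

# Convert the coefficients back to the original scale of each predictor.
coef_original = model.coef_ / scaler.scale_
intercept_original = model.intercept_ - np.sum(coef_original * scaler.mean_)

# Predictions agree whether we use the standardized or the original scale.
pred_std = model.predict(X_std)
pred_original = X @ coef_original + intercept_original
```

Both sets of predictions are identical, so the back-transformed coefficients describe the same fitted model in the original units.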

Bias and variance trade-off:

When building a ridge regression model on an actual dataset, there is a trade-off between bias and variance, governed by λ as follows.

  • If λ increases, then bias also increases.
  • If λ increases, then the variance decreases.
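The two trends can be checked with a small simulation. This is a sketch under stated assumptions: the design matrix, true coefficients, noise level and λ values are all made up for illustration, and the closed-form ridge estimate is used directly. The response is redrawn 500 times and the spread and average of the estimates are measured for each λ.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 50, 1.0
true_beta = np.array([1.0, 2.0, -1.0])

# Fixed design matrix with two nearly collinear columns.
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n), rng.normal(size=n)])

lambdas = [0.0, 1.0, 10.0, 100.0]
estimates = {lam: [] for lam in lambdas}

# Refit on many noisy copies of the response to see how the estimates spread.
for _ in range(500):
    y = X @ true_beta + sigma * rng.normal(size=n)
    for lam in lambdas:
        beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
        estimates[lam].append(beta_hat)

squared_bias, variance = [], []
for lam in lambdas:
    est = np.array(estimates[lam])
    squared_bias.append(float(np.sum((est.mean(axis=0) - true_beta) ** 2)))
    variance.append(float(np.sum(est.var(axis=0))))
    print(f"lambda={lam:>6}: squared bias={squared_bias[-1]:.4f}, variance={variance[-1]:.4f}")
```

With λ = 0 (ordinary least squares) the estimates are nearly unbiased but vary widely because of the collinear columns. Larger λ values trade some bias for a much smaller variance.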

3. Assumptions of Ridge Regressions

Ridge regression makes the same assumptions as linear regression: linearity, constant variance and independence. However, because ridge regression does not provide confidence limits, the errors need not be assumed to be normally distributed.

4. Linear Regression Model

When should ridge regression be used? Consider the problem below, which first fits a linear regression model and then applies ridge regularization to it.

The data comes from restaurants in a particular region, and the aim is to find the combination of food items that best increases sales.

The first step is to load the required libraries and read the data:

import numpy as np

import pandas as pd

import os

import seaborn as sns

import matplotlib.pyplot as plt

import warnings

from sklearn.linear_model import LinearRegression

plt.style.use('classic')

warnings.filterwarnings('ignore')

df = pd.read_excel('food.xlsx')

Once missing values have been handled and the exploratory data analysis (EDA) is complete, dummy variables are created, because the data set used for modelling should not contain categorical variables. If cat holds the names of the data set's categorical columns, we have

df = pd.get_dummies(df, columns=cat, drop_first=True)

The encoded data set is then standardized and used for the Linear Regression method.
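To make the dummy-variable step concrete, here is a minimal sketch on a tiny, made-up data frame (the rows and the name df_small are hypothetical; only the column naming follows the article).

```python
import pandas as pd

# Hypothetical miniature version of the restaurant data.
df_small = pd.DataFrame({
    "final_price": [250.0, 120.0, 180.0],
    "food_category": ["Pizza", "Soup", "Pasta"],
})

cat = ["food_category"]  # names of the categorical columns

# drop_first=True drops one level per variable (here "Pasta") to avoid
# perfectly collinear dummy columns.
encoded = pd.get_dummies(df_small, columns=cat, drop_first=True)
print(encoded.columns.tolist())
```

The dropped level becomes the baseline: the coefficients of the remaining dummy columns are interpreted relative to it.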

The next step is to scale the variables, since the continuous variables are measured on different scales and would otherwise carry different weights. Scaling converts each of these attributes to z-scores. Start with

from sklearn.preprocessing import StandardScaler

std_scale = StandardScaler()

# Scale the continuous columns

df['final_price'] = std_scale.fit_transform(df[['final_price']])

df['week'] = std_scale.fit_transform(df[['week']])

df['area_range'] = std_scale.fit_transform(df[['area_range']])
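The three columns can also be scaled in a single call, since StandardScaler learns a separate mean and standard deviation for every column it is given. The sketch below uses made-up values for the three continuous columns (df_small and num_cols are hypothetical names) and checks that each scaled column ends up with mean 0 and standard deviation 1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical values for the three continuous columns named in the text.
df_small = pd.DataFrame({
    "final_price": [250.0, 120.0, 180.0, 300.0],
    "week": [1.0, 2.0, 3.0, 4.0],
    "area_range": [2.0, 3.5, 5.0, 4.0],
})

num_cols = ["final_price", "week", "area_range"]
std_scale = StandardScaler()

# One fit_transform call learns a separate mean and standard deviation per column.
df_small[num_cols] = std_scale.fit_transform(df_small[num_cols])

print(df_small.mean().round(6))
print(df_small.std(ddof=0).round(6))
```

Note that StandardScaler divides by the population standard deviation, which is why ddof=0 is passed to std() in the check.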

The third step is to execute a Train-Test Split accomplished by the operations below.

# Copy the predictor variables into dataframe X

X = df.drop('orders', axis=1)

# Copy the target into dataframe y; the target variable is converted to log values

y = np.log(df[['orders']])

# Split X and y into training and test sets in a 75:25 ratio

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
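The order of the four values returned by train_test_split is easy to get wrong, so a quick shape check is worthwhile. The following sketch runs the split on hypothetical data with 100 rows (the columns and values are assumptions made for illustration).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, two predictors and a positive target.
rng = np.random.default_rng(4)
X = pd.DataFrame({"final_price": rng.normal(size=100), "week": rng.normal(size=100)})
y = np.log(pd.DataFrame({"orders": rng.integers(1, 500, size=100)}))

# The return order is fixed: train/test for X first, then train/test for y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

With test_size=0.25, the 100 rows are divided into 75 training rows and 25 test rows, and the rows of X and y stay aligned.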

The final step is applying the Linear Regression Model.

# Invoke LinearRegression and find the best-fit model on the training data

regression_model = LinearRegression()

regression_model.fit(X_train, y_train)

# To explore each independent attribute's coefficient, use the loop below

for idx, col_name in enumerate(X_train.columns):
    print("The coefficient for {} is {}".format(col_name, regression_model.coef_[0][idx]))
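The same fit-and-inspect pattern can be tried on data where the true coefficients are known. The sketch below is an illustration only: the two predictors, the log-scale target and its coefficients of -0.4 and 1.0 are invented for the example. It pairs each column name with its estimated coefficient using zip.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
X_train = pd.DataFrame({
    "final_price": rng.normal(size=200),
    "home_delivery": rng.integers(0, 2, size=200).astype(float),
})
# Hypothetical log-scale target with known coefficients -0.4 and 1.0.
y_train = pd.DataFrame({
    "orders": 4.5 - 0.4 * X_train["final_price"] + 1.0 * X_train["home_delivery"]
    + 0.05 * rng.normal(size=200)
})

regression_model = LinearRegression().fit(X_train, y_train)

# y_train is two-dimensional, so coef_ has shape (1, n_features).
coefs = dict(zip(X_train.columns, regression_model.coef_[0]))
for col_name, value in coefs.items():
    print(f"The coefficient for {col_name} is {value:.3f}")
```

Because the target is passed as a one-column data frame, coef_ is two-dimensional, which is why the article indexes it as coef_[0][idx].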

The coefficients can be represented as 

final_price -0.40354286519747384

week -0.0041068045722690814

area_range 0.16906454326841025

website_homepage_mention_1.0 0.44689072858872664

food_category_Desert 0.5722054451619581

food_category_Biryani -0.10369818094671146

food_category_Extras -0.22769824296095417

food_category_Other Snacks -0.44682163212660775

food_category_Pasta -0.7352610382529601

food_category_Rice Bowl 1.640603292571774

food_category_Pizza 0.499963614474803

food_category_Salad 0.22723622749570868

food_category_Seafood -0.07845778484039663

food_category_Starters -0.3782239478810047

food_category_Sandwich 0.3733070983152591

food_category_Soup -1.0586633401722432

cuisine_Italian -0.03927567006223066

cuisine_Indian -1.1335822602848094

center_type_Noida 0.0501474731039986

center_type_Gurgaon -0.16528108967295807

night_service_1 0.0038398863634691582

home_delivery_1.0 1.026400462237632

Now, to check the magnitude of the coefficients, use

from pandas import Series, DataFrame

predictors = X_train.columns

coef = Series(regression_model.coef_.flatten(), predictors).sort_values()

plt.figure(figsize=(10,8))

coef.plot(kind='bar', title='Model Coefficients')


From the diagram, the variables with positive coefficients, such as area_range, food_category_Salad, food_category_Desert, food_category_Pizza, food_category_Rice Bowl, home_delivery_1.0, website_homepage_mention_1.0 and food_category_Sandwich, are the factors that influence the model most.

Since a higher beta coefficient in the regression equation means a higher impact, dishes like Pizza, Rice Bowl and Desert, together with website_homepage_mention and home delivery, emerge as important factors behind a high number of orders. The variables with negative coefficients, which predict fewer restaurant orders, include food_category_Pasta, cuisine_Indian, food_category_Soup and food_category_Other Snacks.

final_price is seen to have a negative effect on orders. Dishes like Pasta, Soup and Other Snacks, and the Indian cuisine category, also reduce the predicted number of orders when all other predictors are held constant. Variables like night_service and week have no appreciable impact on order frequency in the model's predictions. One may therefore conclude that, in this model, the continuous variables are less significant than the categorical (object-type) variables.

5. Regularization

Regularization here means tuning the ridge hyperparameter alpha. Its value is not learned automatically by the ridge regression algorithm, so it has to be set manually. A grid search over candidate alpha values with GridSearchCV finds the optimum:

from sklearn.linear_model import Ridge

from sklearn.model_selection import GridSearchCV

ridge = Ridge()

parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40, 45, 50, 55, 100]}

ridge_regressor = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=5)

ridge_regressor.fit(X, y)

print(ridge_regressor.best_params_)

print(ridge_regressor.best_score_)

This gives {'alpha': 0.01} with a best score of -0.3751867421112124. The score is negative because the scoring metric is neg_mean_squared_error: scikit-learn treats higher scores as better, so it reports the negated mean squared error. The cross-validated mean squared error is therefore about 0.375.
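The sign convention of the grid search score is easy to verify on a toy problem. The sketch below uses synthetic data and a shortened alpha grid (both are assumptions made for illustration) and flips the sign of best_score_ to recover the mean squared error.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + 0.5 * rng.normal(size=120)

parameters = {"alpha": [1e-3, 1e-2, 1, 5, 10, 20, 50, 100]}
ridge_regressor = GridSearchCV(
    Ridge(), parameters, scoring="neg_mean_squared_error", cv=5
)
ridge_regressor.fit(X, y)

best_alpha = ridge_regressor.best_params_["alpha"]
best_mse = -ridge_regressor.best_score_   # flip the sign to recover the MSE

print(f"best alpha: {best_alpha}, cross-validated MSE: {best_mse:.4f}")
```

With scoring='neg_mean_squared_error', scikit-learn maximises the negated error, so the best score is the one closest to zero.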

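predictors = X_train.columns

# ridgeReg is taken here to be the ridge model refitted by the grid search with the best alpha

ridgeReg = ridge_regressor.best_estimator_

coef = Series(ridgeReg.coef_.flatten(), predictors).sort_values()

coef.plot(kind='bar', title='Model Coefficients')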



The final ridge regression model gives the prediction equation below. Since the target was converted to log values earlier, the equation is on the log scale of orders.

Orders = 4.65 + 1.02*home_delivery_1.0 + 0.46*website_homepage_mention_1.0 + (-0.40*final_price) + 0.17*area_range + 0.57*food_category_Desert + (-0.22*food_category_Extras) + (-0.73*food_category_Pasta) + 0.49*food_category_Pizza + 1.6*food_category_Rice_Bowl + 0.22*food_category_Salad + 0.37*food_category_Sandwich + (-1.05*food_category_Soup) + (-0.37*food_category_Starters) + (-1.13*cuisine_Indian) + (-0.16*center_type_Gurgaon)
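As a worked example of using the equation, the sketch below plugs in a hypothetical dish. It assumes the equation is on the log scale (the target was log-transformed in the train-test split step), so the result is exponentiated to get back to a number of orders. The dish itself and the choice to set all other variables to 0 are illustrative assumptions.

```python
import math

# Intercept and a few coefficients copied from the equation in the text.
intercept = 4.65
coefficients = {
    "home_delivery_1.0": 1.02,
    "final_price": -0.40,
    "food_category_Pizza": 0.49,
}

# Hypothetical dish: a pizza with home delivery, priced at the average
# (a standardized final_price of 0). All other variables are taken as 0.
dish = {"home_delivery_1.0": 1, "final_price": 0.0, "food_category_Pizza": 1}

log_orders = intercept + sum(coefficients[name] * value for name, value in dish.items())
orders = math.exp(log_orders)   # undo the log transform of the target

print(f"predicted log(orders) = {log_orders:.2f}, predicted orders = {orders:.0f}")
```

Here the linear part adds up to 4.65 + 1.02 + 0.49 = 6.16 on the log scale before the exponential is applied.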

Here the top 5 influencing variables of the ridge regression model are:

  1. home_delivery_1.0
  2. food_category_Rice Bowl
  3. food_category_Desert
  4. food_category_Pizza
  5. website_homepage_mention_1
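The list above can be reproduced from the rounded coefficients in the prediction equation. This sketch copies those values into a pandas Series (the dictionary is typed in by hand from the equation, so it is only as accurate as that rounding) and picks the five largest positive coefficients.

```python
import pandas as pd

# Coefficients copied from the prediction equation in the text.
coef = pd.Series({
    "home_delivery_1.0": 1.02,
    "website_homepage_mention_1.0": 0.46,
    "final_price": -0.40,
    "area_range": 0.17,
    "food_category_Desert": 0.57,
    "food_category_Extras": -0.22,
    "food_category_Pasta": -0.73,
    "food_category_Pizza": 0.49,
    "food_category_Rice_Bowl": 1.60,
    "food_category_Salad": 0.22,
    "food_category_Sandwich": 0.37,
    "food_category_Soup": -1.05,
    "food_category_Starters": -0.37,
    "cuisine_Indian": -1.13,
    "center_type_Gurgaon": -0.16,
})

# Five largest positive coefficients, largest first.
top5 = coef[coef > 0].sort_values(ascending=False).head(5)
print(top5)
```

This selects the same five variables, ordered by coefficient size. Note that the ranking covers positive effects only: some negative coefficients, such as cuisine_Indian and food_category_Soup, are just as large in absolute value.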


This also answers the question of why to use ridge regression: the higher a predictor's beta coefficient, the more significant that predictor is. Once tuned, the model can help identify the variables that matter most for the business problem.


