Let’s load our goodies and create a theme
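The setup code isn’t shown here, but a plausible version might look like this (the package list and theme settings are my assumptions):

```r
# Assumed package list for this post
library(glmnet)   # glmnet() and cv.glmnet() for Ridge Regression
library(dplyr)    # data wrangling
library(ggplot2)  # plotting

# A simple custom ggplot2 theme (an illustrative choice)
my_theme <- theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"))
```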

We will use the mtcars data set to demonstrate the power of Ridge Regression. First, we need to prep our data and check for missing values.
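A quick sketch of that check:

```r
data(mtcars)

# Count missing values per column and in total
colSums(is.na(mtcars))
sum(is.na(mtcars))  # 0: no missing values
```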

Good, there are no missing values. We will use the glmnet() function to perform Ridge Regression. Unlike lm(), glmnet() cannot take a data frame, so we need two additional steps. First, we create a matrix of predictors.
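One way to do this is with `model.matrix()` (the variable name `x` is my choice):

```r
# model.matrix() expands the formula into a numeric design matrix;
# [, -1] drops the intercept column, since glmnet() adds its own
x <- model.matrix(mpg ~ ., data = mtcars)[, -1]
dim(x)  # 32 observations, 10 predictors
```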

Next, we separate out the response (dependent) variable.
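Here that is `mpg` (the name `y` is assumed):

```r
y <- mtcars$mpg  # the response we will predict
length(y)        # 32
```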

Recall that Ridge Regression adds a penalty term, controlled by the parameter \(\lambda\), to the least-squares objective:
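\[
\hat{\beta}^{\text{ridge}} = \underset{\beta}{\arg\min}\; \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}
\]

where the second term is the \(\ell_2\) penalty and \(\lambda \ge 0\) controls the amount of shrinkage.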


As we need to specify the value of \(\lambda\), we can either choose it manually or apply a cross-validation technique such as k-fold CV or LOOCV. But before either approach, we need to divide the data into a training set and a test set.
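A common split, using the `x` and `y` created earlier (the 50/50 proportion and the seed are assumptions):

```r
set.seed(1)                              # arbitrary seed for reproducibility
train <- sample(1:nrow(x), nrow(x) / 2)  # half the rows for training
test  <- setdiff(1:nrow(x), train)       # the rest for testing
```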

Choosing \(\lambda\) Manually

As glmnet() cannot take a data frame, we will use an index to subset the data. Then we fit the Ridge Regression model.
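Setting `alpha = 0` selects the Ridge penalty (`alpha = 1` would give the Lasso); this continues from the `x`, `y`, and `train` objects above:

```r
# Fit the Ridge path on the training rows only
ridge_mod <- glmnet(x[train, ], y[train], alpha = 0)
```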

By default, the function generates 100 values of \(\lambda\). We can write a loop to calculate the test MSE for each value.
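One way to sketch that loop, evaluating each \(\lambda\) on the path against the test rows:

```r
mse <- numeric(length(ridge_mod$lambda))
for (i in seq_along(ridge_mod$lambda)) {
  # Predict on the test rows at the i-th lambda
  pred   <- predict(ridge_mod, s = ridge_mod$lambda[i], newx = x[test, ])
  mse[i] <- mean((pred - y[test])^2)   # test MSE at this lambda
}
```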

Then we visualize.
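For example, plotting the test MSE against the \(\lambda\) index (the data frame name `mse_df` is my choice):

```r
mse_df <- data.frame(index  = seq_along(mse),
                     lambda = ridge_mod$lambda,
                     mse    = mse)

ggplot(mse_df, aes(x = index, y = mse)) +
  geom_line() +
  labs(x = "lambda index", y = "test MSE")
```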

Now we can see that the lowest MSE is somewhere between 75 and 78. dplyr can give us a better view.
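Sorting the same data frame by MSE:

```r
mse_df %>%
  arrange(mse) %>%  # smallest test MSE first
  head(5)
```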

We can get the coefficients through coef().
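Extracting them at the MSE-minimizing \(\lambda\) from the grid (variable names assumed):

```r
best_lambda <- mse_df$lambda[which.min(mse_df$mse)]
coef(ridge_mod, s = best_lambda)  # coefficient estimates at that lambda
```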

But we need to calculate the L2 norm manually.
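That is, the \(\ell_2\) norm of the coefficient vector, excluding the intercept:

```r
ridge_coef <- coef(ridge_mod, s = best_lambda)[-1]  # drop the intercept
sqrt(sum(ridge_coef^2))                             # L2 norm
```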

Choosing \(\lambda\) Through k-Fold CV

The glmnet library has a cv.glmnet() function that performs k-fold cross-validation automatically.
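By default it uses 10 folds; since the folds are random, a seed (an assumption here) keeps the result reproducible:

```r
set.seed(1)                                           # CV folds are random
cv_out <- cv.glmnet(x[train, ], y[train], alpha = 0)  # 10-fold CV by default
plot(cv_out)                                          # CV error across log(lambda)
```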

The \(\lambda\) with the lowest associated MSE is stored in ‘lambda.min’, which we can access through subsetting.
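Accessing it with the `$` operator:

```r
best_lambda_cv <- cv_out$lambda.min  # lambda with the lowest CV error
best_lambda_cv
```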

We then plug this \(\lambda\) into predict().
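Computing the test MSE at `lambda.min`:

```r
ridge_pred <- predict(ridge_mod, s = best_lambda_cv, newx = x[test, ])
mean((ridge_pred - y[test])^2)  # test MSE at the CV-chosen lambda
```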

The two methods give slightly different values of \(\lambda\): the first selected \(\lambda\) by minimizing the test-set MSE directly, while the second used k-fold cross-validation within the training set. Now, is an MSE of 6 considered low? Let’s compare with a plain lm() fit.
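Fitting OLS on the same training rows and scoring on the test rows:

```r
lm_fit  <- lm(mpg ~ ., data = mtcars, subset = train)
lm_pred <- predict(lm_fit, newdata = mtcars[test, ])
mean((lm_pred - mtcars$mpg[test])^2)  # OLS test MSE, for comparison
```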

Yep, an MSE of 6 is indeed low and can be considered an improvement over ordinary least squares.