Prerequisites: L2 and L1 regularization
This article implements L2 (Ridge) and L1 (Lasso) regularization for linear regression using the Ridge and Lasso modules of Python's scikit-learn library. Dataset: House Prices dataset.
Step 1: Importing the required libraries
Python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
from statistics import mean
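Before working with the real dataset, here is a quick, self-contained illustration (on synthetic data, not the house-price data) of what the two penalties do: the L2 penalty shrinks the coefficient vector, while the L1 penalty can set irrelevant coefficients exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data: y depends mainly on the first of five features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty: can zero them out

# Ridge coefficients have a smaller norm than plain least squares
print(np.linalg.norm(ridge.coef_) < np.linalg.norm(ols.coef_))
# Lasso drives the irrelevant coefficients to exactly zero
print(np.count_nonzero(lasso.coef_) < np.count_nonzero(ols.coef_))
```

The alpha values here are illustrative only; the article selects alpha for the real dataset by cross-validation below.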
Step 2: Loading and cleaning the Data
Python3
# Load the dataset
# (adjust the path so kc_house_data.csv is reachable, e.g. from
# C:\Users\Dev\Desktop\Kaggle\House Prices)
data = pd.read_csv('kc_house_data.csv')

# Drop columns that are not useful for prediction
dropColumns = ['id', 'date', 'zipcode']
data = data.drop(dropColumns, axis=1)

# Separate the target variable from the features
y = data['price']
X = data.drop('price', axis=1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
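Note that `train_test_split` shuffles the rows randomly, so each run produces a different split (and slightly different scores). Passing `random_state` makes the split reproducible; a minimal check on toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# The same random_state always yields the same split
a_tr, a_te, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
b_tr, b_te, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(np.array_equal(a_te, b_te))  # True
```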
Step 3: Building and evaluating the different models
a) Linear Regression:
Python3
# Fit a plain linear regression model and report its test-set score
linearModel = LinearRegression()
linearModel.fit(X_train, y_train)
print(linearModel.score(X_test, y_test))
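For regression estimators, `score` returns the coefficient of determination R², which is why the three models can be compared on a common scale later. A quick sanity check (on synthetic data) that it matches `r2_score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X_syn = rng.normal(size=(50, 2))
y_syn = X_syn @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X_syn, y_syn)
# score() is the R^2 of the model's predictions on (X_syn, y_syn)
print(np.isclose(model.score(X_syn, y_syn),
                 r2_score(y_syn, model.predict(X_syn))))  # True
```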
b) Ridge (L2) Regression:
Python3
# Try a range of alpha values and record the cross-validation score for each
cross_val_scores_ridge = []
alpha = []
for i in range(1, 9):
    ridgeModel = Ridge(alpha=i * 0.25)
    ridgeModel.fit(X_train, y_train)
    scores = cross_val_score(ridgeModel, X, y, cv=10)
    avg_cross_val_score = mean(scores) * 100
    cross_val_scores_ridge.append(avg_cross_val_score)
    alpha.append(i * 0.25)

# Print the average cross-validation score for each alpha
for i in range(0, len(alpha)):
    print(str(alpha[i]) + ' : ' + str(cross_val_scores_ridge[i]))
From the above output, we can conclude that the best value of alpha for the data is 2.
Python3
# Build and fit the Ridge model with the chosen alpha
ridgeModelChosen = Ridge(alpha=2)
ridgeModelChosen.fit(X_train, y_train)
print(ridgeModelChosen.score(X_test, y_test))
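The manual alpha search above can also be delegated to scikit-learn's built-in `RidgeCV`, which cross-validates over a candidate grid and stores the winner in `alpha_`. A minimal sketch over the same grid, shown on synthetic data so it runs standalone:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 4))
y_syn = X_syn @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.2, size=200)

# Same candidate grid as the manual loop: 0.25, 0.50, ..., 2.0
alphas = [i * 0.25 for i in range(1, 9)]
ridge_cv = RidgeCV(alphas=alphas, cv=10).fit(X_syn, y_syn)
print(ridge_cv.alpha_)  # the grid value with the best CV score
```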
c) Lasso (L1) Regression:
Python3
# Try a range of lambda values and record the cross-validation score for each
cross_val_scores_lasso = []
Lambda = []
for i in range(1, 9):
    lassoModel = Lasso(alpha=i * 0.25, tol=0.0925)
    lassoModel.fit(X_train, y_train)
    scores = cross_val_score(lassoModel, X, y, cv=10)
    avg_cross_val_score = mean(scores) * 100
    cross_val_scores_lasso.append(avg_cross_val_score)
    Lambda.append(i * 0.25)

# Print the average cross-validation score for each lambda
for i in range(0, len(Lambda)):
    print(str(Lambda[i]) + ' : ' + str(cross_val_scores_lasso[i]))
From the above output, we can conclude that the best value of lambda is 2.
Python3
# Build and fit the Lasso model with the chosen lambda
lassoModelChosen = Lasso(alpha=2, tol=0.0925)
lassoModelChosen.fit(X_train, y_train)
print(lassoModelChosen.score(X_test, y_test))
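As with Ridge, scikit-learn also offers `LassoCV` to automate the lambda search; the selected value ends up in `alpha_`. A minimal sketch over the same grid, again on synthetic data so it runs standalone:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
X_syn = rng.normal(size=(200, 4))
y_syn = X_syn @ np.array([3.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.2, size=200)

# Same candidate grid as the manual loop: 0.25, 0.50, ..., 2.0
alphas = [i * 0.25 for i in range(1, 9)]
lasso_cv = LassoCV(alphas=alphas, cv=10).fit(X_syn, y_syn)
print(lasso_cv.alpha_)  # the grid value with the best CV score
```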
Step 4: Comparing and Visualizing the results
Python3
# Collect the test-set scores of all three models
models = ['Linear Regression', 'Ridge Regression', 'Lasso Regression']
scores = [linearModel.score(X_test, y_test),
          ridgeModelChosen.score(X_test, y_test),
          lassoModelChosen.score(X_test, y_test)]

# Map each model name to its score and print the results
mapping = dict(zip(models, scores))
for key, val in mapping.items():
    print(str(key) + ' : ' + str(val))
Python3
# Plot the scores as a bar chart
plt.bar(models, scores)
plt.xlabel('Regression Models')
plt.ylabel('Score')
plt.show()