Linear Regression in Python using Statsmodels

27 July 2024

In this article, we will discuss how to perform linear regression in Python using statsmodels.

Linear regression analysis is a statistical technique for predicting the value of one variable (the dependent variable) from the value of one or more other variables (the independent variables). The dependent variable is the one we want to predict or forecast; the independent variables are the ones used to make that prediction. Simple linear regression uses a single independent variable, while multiple linear regression uses more than one. In statsmodels, linear regression is performed with the statsmodels.regression.linear_model.OLS class. A simple linear regression model has the form y = b0 + b1*x + e, where b0 is the intercept, b1 is the slope, and e is the error term.
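As a minimal sketch of what "least squares" means for a single independent variable, the slope and intercept of the fitted line can be computed in closed form with NumPy. The data below are synthetic and the names x, y, slope, and intercept are assumptions made for illustration; they are not taken from the dataset used later in this article.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1, size=200)

# Closed-form least-squares estimates for one predictor:
# slope = covariance(x, y) / variance(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The estimates should land close to the values used to generate the data (3.0 and 2.0). statsmodels computes the same quantities and adds standard errors, p-values, and other diagnostics.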

Syntax: statsmodels.regression.linear_model.OLS(endog, exog=None, missing='none', hasconst=None, **kwargs)

Parameters:

endog: array-like. The dependent (response) variable, one-dimensional.

exog: array-like. The independent variables, as an array with one row per observation and one column per regressor. An intercept is not included by default and must be added by the user, for example with statsmodels.api.add_constant().

missing: str. The available options are 'none', 'drop', and 'raise'. With 'none', no NaN checking is performed. With 'drop', any observations containing NaNs are dropped. With 'raise', an error is raised. The default is 'none'.

hasconst: None or bool. Indicates whether the right-hand side includes a user-supplied constant. If True, a constant is not checked for, k_constant is set to 1, and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.

**kwargs: Extra arguments used to set model properties when using the formula interface.

Return: An OLS model object. Calling its fit() method estimates the parameters and returns the regression results.

Installation

pip install numpy
pip install pandas
pip install statsmodels

Stepwise Implementation

Step 1: Import packages.

Importing the required packages is the first step of modeling. We import the pandas, NumPy, and statsmodels packages.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf  # formula interface used in Step 4

Step 2: Loading data.

The dataset is stored in the CSV file headbrain1.csv, which is read using the pandas.read_csv() method. The head() method returns the first five rows of the dataset. The columns are head size and brain weight.

Python3

df = pd.read_csv('headbrain1.csv')
df.head()

The head of the data frame looks like this:

Visualizing the data:

We visualize the data using the matplotlib and seaborn packages. The sns.regplot() function draws a scatter plot of the data together with a fitted regression line.

Python3

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
 
df = pd.read_csv('headbrain1.csv')
# x and y must be passed as keyword arguments in recent seaborn versions
sns.regplot(x='Head Size(cm^3)', y='Brain Weight(grams)', data=df)
 
plt.show()

Output:

[Figure: regression plot of Brain Weight(grams) against Head Size(cm^3)]

Step 3: Setting a hypothesis.

Null hypothesis (H0): There is no relationship between head size and brain weight.
Alternative hypothesis (Ha): There is a relationship between head size and brain weight.

Step 4: Fitting the model

The smf.ols() function is the formula interface to statsmodels.regression.linear_model.OLS. It builds an ordinary least squares model from a formula and a data frame, and the fit() method fits the model to the data. We specify the dependent and independent columns in this format:

dependent_column ~ independent_columns

The left side of the ~ operator contains the name of the dependent variable (the column being predicted), and the right side contains the independent variables. Here, brain weight is predicted from head size.

Python3

# rename the columns so they are valid names in a formula
df.columns = ['Head_size', 'Brain_weight']
# regress brain weight (dependent) on head size (independent)
model = smf.ols(formula='Brain_weight ~ Head_size', data=df).fit()

Step 5: Summary of the model.

The model.summary() method returns the summary statistics of the fitted linear regression model, including the coefficients, their p-values, the R-squared value, and the F-statistic. The summary itself does not produce predictions; those are obtained with the model.predict() method.

Python3

print(model.summary())
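Besides the printed table, individual statistics can be read directly from the fitted results object. The following is a minimal sketch on synthetic data; the column names x and y and the data frame demo are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumed), with hypothetical column names x and y.
rng = np.random.default_rng(3)
demo = pd.DataFrame({'x': rng.uniform(0, 10, size=80)})
demo['y'] = 4.0 + 1.2 * demo['x'] + rng.normal(0, 1, size=80)

fit = smf.ols('y ~ x', data=demo).fit()

print(fit.params)      # estimated intercept and slope
print(fit.rsquared)    # R-squared
print(fit.pvalues)     # p-value of each coefficient
print(fit.f_pvalue)    # p-value of the overall F-test

# Predictions for new values of the independent variable.
new_x = pd.DataFrame({'x': [2.0, 5.0]})
predicted = fit.predict(new_x)
print(predicted)
```

Reading values from attributes such as params and pvalues is convenient when the results feed into further code, while summary() is better suited to inspection by eye.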

Code Implementation:

Python3

# import packages
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
 
# loading the csv file
df = pd.read_csv('headbrain1.csv')
print(df.head())
 
# fitting the model: brain weight (dependent) ~ head size (independent)
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Brain_weight ~ Head_size', data=df).fit()
 
# model summary
print(model.summary())

Output:

Description of some of the terms in the table:

R-squared: The R-squared value ranges between 0 and 1 and measures the proportion of the variance in the dependent variable that is explained by the independent variable(s). An R-squared of 1 (100 percent) indicates a perfect fit. In our example, the R-squared value is 0.638.
F-statistic: The F-statistic tests whether the independent variables, taken together, explain a significant share of the variation in the dependent variable. In simplest terms, reject the null hypothesis if the p-value of the F-statistic is less than your alpha level.
coef: The estimated coefficients of the intercept and the independent variables in the regression equation.

Our predictions:

If we take our significance level (alpha) to be 0.05, we reject the null hypothesis in favor of the alternative hypothesis because p < 0.05. So, we can say that there is a relationship between head size and brain weight.
