In this article, we will cover saving a classifier to disk in scikit-learn using Python.
We regularly train models, whether they are classifiers, regressors, etc., with the scikit-learn library, and training can take a considerable amount of time. By saving a trained model to disk we can retrieve it later when required, which saves a lot of time. Serialization is the process of saving data, whereas deserialization is the process of restoring it. We will learn to save classifier models in two ways:
Method 1: Using Pickle
Pickle is a module in the Python standard library and is the standard way of saving Python objects to storage and retrieving them. It first serializes the model object and then saves it to disk; later, we retrieve it by deserializing. Pickling is the process in which a Python object hierarchy is converted into a byte stream. Unpickling is the inverse of the pickling process, where a byte stream is converted back into an object hierarchy.
- dump() / dumps() – These functions serialize an object hierarchy; dump() writes the byte stream to a file, while dumps() returns it as a bytes object (see the short sketch after this list).
- load() / loads() – These functions deserialize a byte stream back into an object hierarchy; load() reads from a file, while loads() reads from a bytes object.
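As a minimal sketch of the in-memory variants (not part of the classifier example below, and using a plain dict as a stand-in for a model object), dumps() turns an object into bytes and loads() reconstructs it:

import pickle

# Any picklable Python object works here; a dict stands in for a model
obj = {"model": "KNeighborsClassifier", "n_neighbors": 3}

# dumps() serializes the object to an in-memory byte stream
data = pickle.dumps(obj)

# loads() deserializes the bytes back into an equivalent object
restored = pickle.loads(data)
print(restored == obj)  # True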
Syntax:
# Saving model
import pickle
pickle.dump(model, open("model_clf_pickle", 'wb'))

# Loading (retrieving) model
my_model_clf = pickle.load(open("model_clf_pickle", 'rb'))
Example:
We have the iris dataset, on which we train a K-Nearest Neighbors classifier. We then save the model using pickle, later retrieve it using pickle, and calculate the score of the classifier.
Python3
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import pickle

# load the iris dataset as an example
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

# training the model on the training set
model_clf = KNeighborsClassifier(n_neighbors=3)
model_clf.fit(X_train, y_train)

# Saving classifier using pickle
pickle.dump(model_clf, open("model_clf_pickle", 'wb'))

# load classifier using pickle
my_model_clf = pickle.load(open("model_clf_pickle", 'rb'))
result_score = my_model_clf.score(X_test, y_test)
print("Score: ", result_score)
Output:
Score: 0.9833333333333333
Method 2: Using the joblib library
Joblib is a replacement for pickle that is more efficient on objects carrying large NumPy arrays. It is designed specifically for saving models and retrieving them when required. These functions also accept file-like objects instead of filenames.
- joblib.dump is used to serialize an object hierarchy
- joblib.load is used to deserialize a data stream
Syntax:
# Save model
joblib.dump(model, "model_name.pkl")

# Retrieve model
joblib.load("model_name.pkl")
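Since joblib targets objects that carry large NumPy arrays, joblib.dump also supports on-the-fly compression via its compress argument. A minimal sketch (the file name and compression level 3 here are just illustrative choices):

import joblib
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)

# Save with compression; higher levels trade speed for smaller files
joblib.dump(model, "model_clf_compressed.pkl", compress=3)

# Loading works the same way regardless of compression
restored = joblib.load("model_clf_compressed.pkl")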
Example:
We have the iris dataset, on which we train a K-Nearest Neighbors classifier. We then save the model using joblib and later retrieve it using joblib. Finally, we calculate the score of the classifier.
Python3
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import joblib

# load the iris dataset as an example
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

# training the model on the training set
model_clf = KNeighborsClassifier(n_neighbors=3)
model_clf.fit(X_train, y_train)

# Saving classifier using joblib
joblib.dump(model_clf, 'model_clf.pkl')

# load classifier using joblib
my_model_clf = joblib.load("model_clf.pkl")
result_score = my_model_clf.score(X_test, y_test)
print("Score: ", result_score)
Output:
Score: 0.9833333333333333