In this article, let’s learn how to use the make_pipeline() function of scikit-learn in Python.
The make_pipeline() function creates a Pipeline from the provided estimators. It is shorthand for the Pipeline constructor: naming the estimators is neither required nor allowed. Instead, their names are set automatically to the lowercase of their types. When we want to perform operations on data step by step, we can make a pipeline of all the estimators in sequence.
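The automatic naming is easy to see by inspecting the steps attribute of a pipeline (a minimal sketch; any transformer/estimator combination is named the same way):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline names each step after its class, lowercased
pipe = make_pipeline(StandardScaler(), LogisticRegression())
print([name for name, _ in pipe.steps])
# ['standardscaler', 'logisticregression']
```

With the Pipeline constructor we would have had to write Pipeline([('standardscaler', StandardScaler()), ('logisticregression', LogisticRegression())]) ourselves; make_pipeline() generates these names for us.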
Syntax: make_pipeline(*steps, memory=None, verbose=False)
Parameters:
- steps: list of Estimator objects: The scikit-learn estimators to chain, in order.
- memory: str or object with the joblib.Memory interface, default=None: Used to cache the pipeline’s fitted transformers. No caching is done by default. If a string is given, it is the path to the cache directory. When caching is enabled, a copy of the transformers is made before they are fitted, so the transformer instances passed to the pipeline cannot be inspected directly; use the pipeline’s named_steps or steps attribute instead. Caching the transformers is useful when fitting takes a long time.
- verbose: bool, default=False: If True, the time elapsed while fitting each step is printed as the step completes.
Returns:
- p: Pipeline: A Pipeline object is returned.
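The memory and verbose parameters can be sketched as follows (using a temporary directory as the cache location, which is just one choice; any writable path works):

```python
import tempfile

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# cache fitted transformers in a temporary directory
cache_dir = tempfile.mkdtemp()
pipe = make_pipeline(StandardScaler(), LogisticRegression(),
                     memory=cache_dir, verbose=True)

# with caching enabled, the original StandardScaler() instance is
# copied before fitting, so inspect it through named_steps instead
print(pipe.named_steps['standardscaler'])
```

Because verbose=True, each step reports its elapsed time while the pipeline is being fitted.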
Example: Classification using the make_pipeline() method
This example starts by importing the necessary packages and reading the ‘diabetes.csv’ file. The data is split into feature variables X and y, where X is the set of independent features and y is the dependent variable. train_test_split() splits X and y into train and test sets; test_size is 0.3, which means 30% of the data is test data. make_pipeline() creates a pipeline consisting of a standard scaler and a logistic regression model: the standard scaler is executed first, then the logistic regression model. The fit() method fits the pipeline to the training data, the predict() method carries out predictions on the test set, and the accuracy_score() metric gives the accuracy of the logistic regression model.
Python3

# import packages
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# import the csv file
df = pd.read_csv('diabetes.csv')

# feature variables
X = df.drop('Outcome', axis=1)
y = df['Outcome']

# splitting data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# creating a pipe using the make_pipeline method
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# fitting data into the model
pipe.fit(X_train, y_train)

# predicting values
y_pred = pipe.predict(X_test)

# calculating accuracy score
score = accuracy_score(y_test, y_pred)
print('accuracy score :', score)
Output:
accuracy score : 0.7878787878787878
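If the ‘diabetes.csv’ file is not at hand, the same pipeline can be run end to end on a synthetic stand-in dataset generated with make_classification (the data is random, so the accuracy will differ from the output above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for diabetes.csv: 8 features, binary outcome
X, y = make_classification(n_samples=500, n_features=8, random_state=101)

# same split as the example above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# scale first, then fit the classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

score = accuracy_score(y_test, pipe.predict(X_test))
print('accuracy score :', score)
```

Everything except the data source is identical to the example above, which shows that the pipeline itself is independent of where X and y come from.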