How To Create/Customize Your Own Scorer Function In Scikit-Learn?

21 July 2024


Scikit-learn is a well-known Python machine learning library that provides a wide range of tools and algorithms for building machine learning models. It also offers a solid framework for evaluating those models with many built-in metrics and scoring functions. In some situations, however, the built-in metrics do not capture what matters for a particular problem, and users want to define their own scoring function. Scikit-learn supports this, and in this article we walk through how to create and customize your own scorer.

Strictly speaking, scikit-learn distinguishes two kinds of callables. A metric (score) function accepts two arguments, the ground truth (actual values) and the model's predicted values, and returns a single number that summarizes how good the predictions are. A scorer wraps such a function so that it can be called as scorer(estimator, X, y), which is the form expected by model-evaluation tools such as cross_val_score. Scikit-learn ships predefined metrics and scorers for accuracy, precision, recall, F1-score and many others, but users sometimes need a scoring function of their own.
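A minimal sketch of the scorer form, assuming a synthetic regression dataset from make_regression and a LinearRegression model (neither is part of the original example): the hypothetical function neg_mae_scorer below has the (estimator, X, y) signature, so it can be passed directly as scoring without make_scorer. It returns the negative mean absolute error because scikit-learn scorers follow the convention that a larger value is better.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score


def neg_mae_scorer(estimator, X, y):
    """Hypothetical scorer: takes a fitted estimator plus data.

    Returns the negative mean absolute error so that, like every
    scikit-learn scorer, a larger value means a better model.
    """
    y_pred = estimator.predict(X)
    return -mean_absolute_error(y, y_pred)


# Synthetic data, used only for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Any callable with the (estimator, X, y) signature can be passed as `scoring`
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring=neg_mae_scorer)
print(scores)
```

Writing the (estimator, X, y) form by hand is useful when the score needs more than the output of predict(); otherwise, wrapping a (y_true, y_pred) function with make_scorer, as in the steps below, is the simpler route.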

Custom scorer for a regression problem

To create a custom scorer in scikit-learn, follow these steps:

Step 1: Create a custom function that evaluates the predictions

Create a Python function that accepts two arguments, the ground truth (actual values) and the model's predicted values, in that order. The function should return a single number that measures how good the predictions are.

As an example, we define the coefficient of determination (R²).

The coefficient of determination (R²) is a statistical measure of how well a model predicts an outcome. In a regression model, it measures the proportion of the variance in the dependent (target) variable that is explained by the independent input variable(s).

$R^2 = 1- \frac{RSS}{TSS}$

Here,

RSS, the residual sum of squares (also called the sum of squared errors), measures the variation that the regression model does not explain. It is the sum of the squared differences between the predicted values and the actual target values.

$RSS = \sum(pred-actual)^2$

TSS, the total sum of squares, represents the total variation in the dependent variable. It is the sum of the squared differences between the actual values and their mean.

$TSS = \sum (actual-mean)^2$

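R² is at most 1, and higher values indicate a better fit. A value of 1 indicates a perfect fit, while a value of 0 means the model does no better than always predicting the mean of the target. R² can also be negative, which happens when the model's predictions are worse than that constant baseline (for example, on held-out data).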
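To make the formulas concrete, here is the arithmetic for RSS, TSS and R² on a few hypothetical data points. The second case reverses the predictions, which makes them worse than always predicting the mean, so R² drops below 0.

```python
import numpy as np

# Hypothetical values, chosen so the sums are easy to check by hand
actual = np.array([3.0, -0.5, 2.0, 7.0])
pred = np.array([2.5, 0.0, 2.0, 8.0])

rss = np.sum((pred - actual) ** 2)            # 0.25 + 0.25 + 0 + 1 = 1.5
tss = np.sum((actual - actual.mean()) ** 2)   # mean is 2.875, so TSS = 29.1875
r2 = 1 - rss / tss
print(round(r2, 4))  # 0.9486

# Predictions worse than the mean-only baseline give a negative R²
actual_2 = np.array([1.0, 2.0, 3.0])
pred_2 = np.array([3.0, 2.0, 1.0])                  # reversed order
rss_2 = np.sum((pred_2 - actual_2) ** 2)            # 4 + 0 + 4 = 8
tss_2 = np.sum((actual_2 - actual_2.mean()) ** 2)   # 1 + 0 + 1 = 2
r2_neg = 1 - rss_2 / tss_2
print(r2_neg)  # -3.0
```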

Python3

import numpy as np
 
def r_squared(y_true, y_pred):
    # Calculate the mean of the true values
    mean_y_true = np.mean(y_true)
 
    # Calculate the sum of squares of residuals and total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - mean_y_true) ** 2)
 
    # Calculate R²
    r2 = 1 - (ss_res / ss_tot)
 
    return r2
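Before wrapping the function in a scorer, it can be worth checking it against scikit-learn's own sklearn.metrics.r2_score. The sketch below is an assumption-laden sanity check: the random data is made up for illustration, and the np.asarray conversion is an addition (not in the original function) so that plain Python lists also work.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    # Same computation as the function above: 1 - RSS / TSS
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


# Made-up data: noisy predictions around the true values
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.5, size=100)

# The two numbers should agree up to floating-point rounding
print(r_squared(y_true, y_pred), r2_score(y_true, y_pred))
```

One caveat: if every y_true value is identical, TSS is 0 and this hand-written version divides by zero (NumPy then returns nan or -inf with a RuntimeWarning), so a production metric may need a guard for that case.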

Step 2: Create a scorer object

Once the scoring function is defined, wrap it in a scorer object with scikit-learn's make_scorer() function. make_scorer() takes the scoring function as an argument and returns a scorer object that scikit-learn's model-evaluation tools can call.

Python3

from sklearn.metrics import make_scorer
# Create a scorer object using the r_squared function
r2_score = make_scorer(r_squared)
r2_score

Output:

make_scorer(r_squared)

(Recent scikit-learn releases may also show response_method='predict' in this representation.)
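make_scorer() also handles loss functions, where smaller is better. The sketch below uses a hypothetical rmse function, a synthetic dataset and a LinearRegression model, none of which come from the original example. With greater_is_better=False, the scorer returns the negated loss so that the "higher is better" convention still holds. It also shows that the resulting scorer is called with (estimator, X, y) rather than (y_true, y_pred).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer


def rmse(y_true, y_pred):
    # Hypothetical loss: root mean squared error (smaller is better)
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# greater_is_better=False flips the sign, so the scorer returns -rmse
rmse_scorer = make_scorer(rmse, greater_is_better=False)

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)
model = LinearRegression().fit(X, y)

# A scorer takes the fitted estimator and the data, then calls predict() itself
score = rmse_scorer(model, X, y)
print(score, rmse(y, model.predict(X)))
```

This sign flip is why scikit-learn's built-in loss scorers carry names such as neg_mean_squared_error.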

Step 3: Use the scorer object

Once the scorer object exists, we can use it to assess a model's performance with scikit-learn's cross-validation functions, which evaluate the model on several different train/test splits of the dataset, or with other model-selection tools.

Python3

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
 
# Load the California Housing Price dataset
X, y = fetch_california_housing(return_X_y=True)
 
# Create a Random Forest regression model
model = RandomForestRegressor()
 
 
# Evaluate the performance of the model
# using cross-validation with the r_squared scorer
scores = cross_val_score(model,
                         X, y,
                         cv=5, 
                         scoring=r2_score)
 
# Print the mean and standard deviation of the scores
print(f"R2 Squared: {scores.mean():.2f} +/- {scores.std():.2f}")

Output:

R2 Squared: 0.65 +/- 0.08
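As a sanity check, the custom scorer should produce the same fold scores as scikit-learn's built-in "r2" scorer when both see identical splits. The sketch below swaps in a synthetic dataset and LinearRegression (assumptions made so it runs in seconds, not the article's setup), fixes the folds with a shuffled KFold, and then reuses the same scorer in GridSearchCV to pick a Ridge regularization strength from a hypothetical grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score


def r_squared(y_true, y_pred):
    # Same R² definition as in Step 1
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)

# A fixed random_state makes every split() call yield the same folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2_scorer = make_scorer(r_squared)

custom = cross_val_score(LinearRegression(), X, y, cv=cv, scoring=r2_scorer)
builtin = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
print(custom.round(4))
print(builtin.round(4))

# The same scorer object also drives hyperparameter search
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=cv, scoring=r2_scorer)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```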

Custom scorer for a multi-class classification problem

Steps:

Import the necessary libraries
Load the iris dataset
Wrap several metrics (accuracy_score, precision_score, recall_score and f1_score) with make_scorer
Create an XGBClassifier model
Evaluate the model using cross-validation and the custom scorers
Print the mean scores for each metric

Python

from sklearn.metrics import make_scorer, accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
from xgboost import XGBClassifier
 
 
# Load the iris dataset
iris = load_iris()
 
# Define multiple metrics
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score, average='macro'),
           'recall': make_scorer(recall_score, average='macro'),
           'f1-score': make_scorer(f1_score, average='macro')
          }
 
# Create an XGBClassifier
clf = XGBClassifier(n_estimators=2, 
                    max_depth=3, 
                    learning_rate=0.1)
 
# Evaluate the model using cross-validation and the scorers defined above
scores = cross_validate(clf, iris.data, iris.target, cv=5, scoring=scoring)
 
# Print the mean scores for each metric
print("Accuracy mean score:", scores['test_accuracy'].mean())
print("Precision mean score:", scores['test_precision'].mean())
print("Recall mean score:", scores['test_recall'].mean())
print("f1-score:", scores['test_f1-score'].mean())

Output:

Accuracy mean score: 0.9666666666666668
Precision mean score: 0.9707070707070707
Recall mean score: 0.9666666666666668
f1-score: 0.9664818612187034
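The metrics above are all built into scikit-learn, so they can also be requested by their string names (for example "accuracy" or "recall_macro"), and these can be mixed in the same dictionary with scorers built from genuinely custom functions. In the sketch below, worst_class_recall is a hypothetical metric (the recall of the class the model handles worst), and DecisionTreeClassifier stands in for XGBClassifier only so that the example has no extra dependency.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier


def worst_class_recall(y_true, y_pred):
    # Hypothetical custom metric: per-class recall, then take the minimum
    return recall_score(y_true, y_pred, average=None).min()


X, y = load_iris(return_X_y=True)

scoring = {
    # Built-in metrics can be requested by their string names...
    "accuracy": "accuracy",
    "recall_macro": "recall_macro",
    # ...and mixed freely with scorers built from custom functions
    "worst_recall": make_scorer(worst_class_recall),
}

scores = cross_validate(DecisionTreeClassifier(random_state=0), X, y, cv=5, scoring=scoring)

# cross_validate stores each metric under "test_<key>"
for name in scoring:
    print(name, scores[f"test_{name}"].mean().round(3))
```

Since the minimum per-class recall can never exceed the macro (mean) recall, the worst_recall column is a stricter view of the same predictions.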
