
Massively Speed up Processing using Joblib in Python

In this article, we will see how we can massively reduce the execution time of a long-running piece of code by running it in parallel in Python using the Joblib module.

Introduction to the Joblib Module

The Joblib module in Python is used to execute tasks in parallel using pipelines of worker processes rather than sequentially, one after another. It lets the user make full use of their machine by engaging all available CPU cores to finish a process as quickly as possible. Joblib can also cache results from earlier runs and reuse them, which can cut the execution time of repeated computations dramatically. On top of that, we can run multiple jobs at the same time, although the number of jobs that can actually run in parallel is limited by the number of free CPU cores available at that moment.
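
The caching mentioned above is exposed through joblib.Memory. Below is a minimal sketch, assuming a local cache directory whose name is just an example:

Python3

import math
from joblib import Memory

# Cache directory (an arbitrary example path); results are stored here on disk
memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def slow_square_root_factorial(i):
    # The same operation used later in this article
    return math.factorial(int(math.sqrt(i**3)))

slow_square_root_factorial(900)  # computed and written to the cache
slow_square_root_factorial(900)  # served from the cache, almost instantly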

Like the Pickle module, Joblib can also be used to dump and load results, datasets, trained models, etc. anywhere on the device; we simply pass the path along with the file name to load or dump the object. Joblib also provides a way to compress a huge dataset so that it is easier to store and load. Supported compression methods include zlib, gzip, bz2, lzma, xz and LZ4; while dumping the dataset, we either pass the compression method (and level) through the compress parameter of the dump method, or let Joblib infer it from the file extension, for example .z for zlib compression and .lz4 for LZ4 compression.
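
A rough sketch of dumping and loading with compression (the file names are arbitrary examples, and LZ4 additionally requires the lz4 package to be installed):

Python3

import joblib

data = list(range(1_000_000))

# Explicit compression: method and level passed to dump
joblib.dump(data, "data.pkl.z", compress=("zlib", 3))

# Alternatively, the method can be inferred from the extension, e.g. ".lz4"
# joblib.dump(data, "data.pkl.lz4")

restored = joblib.load("data.pkl.z")
print(restored == data)  # True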

Prerequisite

The user should be familiar with Python; knowledge of the concept of multiprocessing is a bonus.

Required Modules

For this tutorial, we will need the joblib module along with the time and math modules. Run the command below to install joblib.

pip install joblib

The time and math modules come bundled with Python, so they do not need to be installed separately.

Stepwise Implementation:

First, we will import the required helpers, Parallel and delayed, from the joblib module along with the time and math modules.

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# Normal (sequential) execution, no parallelism
r = [math.factorial(int(math.sqrt(i**3))) for i in range(100, 1000)]

t2 = time.time()

print(t2 - t1)


Here we import the Parallel and delayed helpers from the joblib module, and first check how long the operation takes when executed normally, without parallelism. In this example we take the square root of the cube of each number from 100 to 999 and then compute the factorial of the result; you can try any other operation, but make it as computationally heavy as possible so that the difference is easier to see.

Output: the measured execution time in seconds (the exact value will vary from machine to machine).

Now we will reduce this time as much as possible using the Parallel and delayed helpers of the joblib module.

Using 2 Cores

Using the Parallel function with n_jobs=2, we will execute this code on 2 cores, wrapping the factorial function in delayed.

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 2 cores: dispatch the factorial calls to 2 worker processes
r1 = Parallel(n_jobs=2)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Here the Parallel function takes an argument n_jobs, to which we pass the number of cores we want to use, i.e. how many worker pipelines will execute the code in parallel. We then wrap math.factorial in delayed so that each call is dispatched to those workers, and pass it the value to operate on. Remember to always wrap the outermost function in delayed: in this example, wrapping math.sqrt in delayed instead would not help much, because the square root is only the cheap part and its result is still needed for the expensive factorial (see the sketch after the output below).

Output: the measured execution time in seconds.
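
To make the point about delayed concrete, here is a small sketch contrasting the two placements; the variable names are only for illustration:

Python3

import math
from joblib import Parallel, delayed

# Recommended: delay the outermost, expensive call (math.factorial),
# so the heavy work itself runs in the worker processes.
good = Parallel(n_jobs=2)(
    delayed(math.factorial)(int(math.sqrt(i**3))) for i in range(100, 1000)
)

# Less useful: delaying only math.sqrt parallelizes the cheap part,
# and the expensive factorials still run one by one in the main process.
roots = Parallel(n_jobs=2)(delayed(math.sqrt)(i**3) for i in range(100, 1000))
slow = [math.factorial(int(r)) for r in roots]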

 

Using 3 Cores

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 3 cores
r1 = Parallel(n_jobs=3)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.

Using 4 Cores

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 4 cores
r1 = Parallel(n_jobs=4)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.

Using 5 Cores

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 5 cores
r1 = Parallel(n_jobs=5)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.

Using 6 Cores

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 6 cores
r1 = Parallel(n_jobs=6)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.

Using 7 Cores

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 7 cores
r1 = Parallel(n_jobs=7)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.

Using 8 Cores

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# 8 cores
r1 = Parallel(n_jobs=8)(delayed(math.factorial)(int(math.sqrt(i**3)))
                        for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.

Using All Cores

Now, if you do not know how many cores your device has but want to use as many as possible, pass -1 as the value of the n_jobs parameter.
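
As a quick aside, if you want to check how many cores joblib can actually see before choosing a value, its cpu_count helper reports it:

Python3

import joblib

# Number of CPUs that joblib treats as available
print(joblib.cpu_count())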

Python3




import time
from joblib import Parallel, delayed
import math

t1 = time.time()

# n_jobs=-1: use all available CPU cores
r1 = Parallel(n_jobs=-1)(delayed(math.factorial)(int(math.sqrt(i**3)))
                         for i in range(100, 1000))

t2 = time.time()

print(t2 - t1)


Output: the measured execution time in seconds.
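
Instead of copying the same block for every core count, a single sketch like the one below times the identical workload for several n_jobs values in a loop; the printed numbers will differ from machine to machine:

Python3

import math
import time
from joblib import Parallel, delayed

def timed_run(n_jobs):
    # Same operation as above: factorial of the integer square root of i**3
    t1 = time.time()
    Parallel(n_jobs=n_jobs)(
        delayed(math.factorial)(int(math.sqrt(i**3))) for i in range(100, 1000)
    )
    return time.time() - t1

for n_jobs in (1, 2, 4, 8, -1):
    print(f"n_jobs={n_jobs}: {timed_run(n_jobs):.2f} s")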

 

  
