Parallel Programming with NumPy and SciPy

Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time.

Required Modules:

pip install scipy
pip install numpy
pip install cupy
pip install "dask[array]"

Parallel Programming with NumPy 

NumPy is a popular numeric computation library for Python known for its efficient array operations and support for vectorized operations. One way to further optimize NumPy code is to use parallel programming techniques, which take advantage of multiple CPU cores to perform calculations faster.

Parallel dot product calculation using NumPy

First, import NumPy with import numpy as np. Then create two random vectors a and b of length 100000 and calculate their dot product with np.dot(a, b). NumPy hands this operation to an optimized BLAS library, which may spread the work over several CPU cores depending on how NumPy was built. Finally, print the result.


import numpy as np
# Create two random vectors of length 100000
a = np.random.rand(100000)
b = np.random.rand(100000)
# Calculate the dot product using NumPy's built-in parallelization
dot_product = np.dot(a, b)
# Print the result
print(dot_product)

Output (varies from run to run because the vectors are random): 25016.0204799
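
As a rough check on what np.dot computes, the sketch below compares it with an explicit Python loop over the same vectors. The helper name loop_dot and the fixed random seed are assumptions made for illustration and are not part of the example above. Whether np.dot actually uses several cores depends on the BLAS library behind your NumPy installation, so the timings are only indicative.

```python
import time
import numpy as np

def loop_dot(a, b):
    """Reference dot product using a plain Python loop (hypothetical helper)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatable runs
a = rng.random(100_000)
b = rng.random(100_000)

start = time.perf_counter()
slow = loop_dot(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = np.dot(a, b)
dot_seconds = time.perf_counter() - start

# Both approaches should agree up to floating-point rounding
print(np.isclose(slow, fast))
print(f"loop: {loop_seconds:.4f} s, np.dot: {dot_seconds:.6f} s")
```

On most machines the vectorized call should be far faster than the loop, which is the main reason to prefer NumPy's built-in operations before reaching for explicit parallelism.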

Parallel matrix multiplication using NumPy and Multiprocessing

First, import NumPy with import numpy as np and Pool with from multiprocessing import Pool. Define the function matrix_multiply(args), which multiplies one pair of matrix blocks. Next, create two random matrices A and B of size 1000×1000. Split A into four blocks of columns and B into four blocks of rows, then create a multiprocessing pool with four workers and map matrix_multiply over the four pairs of blocks. Each partial product already has the full 1000×1000 shape, so the partial products are added together (not concatenated) to obtain the result matrix C.


import numpy as np
from multiprocessing import Pool
# Define the matrix multiplication function
def matrix_multiply(args):
    A, B = args
    return np.dot(A, B)
# The guard is required on platforms that start workers by spawning
if __name__ == "__main__":
    # Create two random matrices of size 1000x1000
    A = np.random.rand(1000, 1000)
    B = np.random.rand(1000, 1000)
    # Split A into 4 column blocks and B into 4 row blocks
    A_parts = np.array_split(A, 4, axis=1)
    B_parts = np.array_split(B, 4)
    # Create a multiprocessing pool with 4 workers
    with Pool(4) as pool:
        # Map the matrix multiplication function to the 4 pairs of blocks
        C_parts = pool.map(matrix_multiply, list(zip(A_parts, B_parts)))
    # Add the partial products to obtain the result matrix
    C = sum(C_parts)
    print(C)


Output (values vary from run to run):
[[ 246.26109895  245.27979434  247.53272716 ...,  246.54602696   246.56427344  247.98649696]
 [ 250.3441429   249.08795621  250.72384067 ...,  250.04416057   250.39319075  251.28326167]
 [ 248.44163838  247.48820248  249.19031327 ...,  248.48692097   249.24465987  250.2703185 ]
 [ 252.35223132  250.92852728  251.9176228  ...,  251.5751485   253.00980032  252.06391074]
 [ 251.8001927   249.67594552  250.62393445 ...,  249.82225854   252.16903134  251.53323254]
 [ 252.24630379  251.09158312  251.64857194 ...,  251.07993262   252.88783961  252.44037699]]
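
How the partial results must be recombined depends on how the matrices were split. The sketch below checks two common splittings on small matrices, without multiprocessing, so the arithmetic is easy to verify. The 8×8 size and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatable runs
A = rng.random((8, 8))
B = rng.random((8, 8))
expected = A @ B

# Option 1: split A by columns and B by rows. Each partial product has the
# full output shape, so the parts are added together.
A_cols = np.array_split(A, 4, axis=1)
B_rows = np.array_split(B, 4, axis=0)
C_sum = sum(A_k @ B_k for A_k, B_k in zip(A_cols, B_rows))

# Option 2: split A by rows and multiply each block by all of B. Each result
# is a horizontal band of the output, so the parts are stacked along axis 0.
A_rows = np.array_split(A, 4, axis=0)
C_stacked = np.concatenate([A_k @ B for A_k in A_rows], axis=0)

print(np.allclose(C_sum, expected), np.allclose(C_stacked, expected))
```

Either splitting can be handed to a worker pool. The second sends the whole of B to every worker but needs no final summation.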

GPU Computing using NumPy

NumPy and SciPy cannot run computations on a GPU by themselves. For GPU computing we need an additional library such as CuPy or PyTorch. First, import CuPy and NumPy with import cupy as cp and import numpy as np. Then create a random array on the GPU and square it element-wise on the GPU. Next, transfer the result back to the CPU as a NumPy array with cp.asnumpy, and perform further computations on the CPU using NumPy.


import cupy as cp
import numpy as np
# Create a random array on the GPU
a_gpu = cp.random.rand(3, 3)
# Perform element-wise squaring on the GPU
a_squared_gpu = cp.square(a_gpu)
# Transfer the result back to the CPU as a NumPy array
a_squared_cpu = cp.asnumpy(a_squared_gpu)
# Perform further computations on the CPU using NumPy
a_sum = np.sum(a_squared_cpu)
print("Original array on GPU:")
print(a_gpu)
print("Squared array on GPU:")
print(a_squared_gpu)
print("Squared array on CPU (as NumPy array):")
print(a_squared_cpu)
print("Sum of squared array on CPU (computed using NumPy):")
print(a_sum)


Output:
Original array on GPU:
[[0.86840887 0.46334445 0.07575684]
 [0.95068822 0.27356767 0.04985629]
 [0.46676109 0.92671615 0.43278567]]
Squared array on GPU:
[[7.53992218e-01 2.14645385e-01 5.71954152e-03]
 [9.04904051e-01 7.48442324e-02 2.48564982e-03]
 [2.17650008e-01 8.58339857e-01 1.87155546e-01]]
Squared array on CPU (as NumPy array):
[[7.53992218e-01 2.14645385e-01 5.71954152e-03]
 [9.04904051e-01 7.48442324e-02 2.48564982e-03]
 [2.17650008e-01 8.58339857e-01 1.87155546e-01]]
Sum of squared array on CPU (computed using NumPy):
Multi-threading using NumPy

We need to import the necessary modules: NumPy, and ThreadPoolExecutor from concurrent.futures. Next, we’ll define a function func(x) that we want to execute in parallel. We’ll create an input array arr of values that we want to apply this function to. We’ll also define the number of threads we want to use for parallel execution.

Now, we’ll create a ThreadPoolExecutor with the specified number of threads. This executor lets us run func on the elements of arr concurrently: we’ll use the executor’s map method, which returns an iterator over the results of applying func to each element of arr. To get the actual results, we’ll call list on the iterator and then convert that list to a NumPy array. Note that in standard CPython builds the global interpreter lock (GIL) prevents threads from executing Python bytecode at the same time, so a small pure-Python function such as func gains little speed from threads; threads help most when each task spends its time in code that releases the GIL, such as I/O or large NumPy operations.


import numpy as np
from concurrent.futures import ThreadPoolExecutor
# Define a function to be executed in parallel
def func(x):
    return x**2
# Create an array of values
arr = np.arange(10)
# Define the number of threads to use
num_threads = 4
# Create a ThreadPoolExecutor with the specified number of threads
with ThreadPoolExecutor(max_workers=num_threads) as executor:
    # Use the executor to map the function to the array in parallel
    results = executor.map(func, arr)
# Convert the results to a NumPy array
results = np.array(list(results))
# Print the input array and the corresponding results
print("Input Array: ", arr)
print("Results: ", results)


Finally, we’ll print both the input array arr and the corresponding results array.

Input Array:  [0 1 2 3 4 5 6 7 8 9]
Results:  [ 0  1  4  9 16 25 36 49 64 81]
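
Threads are more useful when each task is a large vectorized NumPy operation instead of a single number. As a minimal sketch under that assumption, the example below splits an array into four chunks and squares each chunk in its own thread. The helper square_chunk and the array size are hypothetical choices for illustration.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    """Vectorized work on one chunk of the array (hypothetical helper)."""
    return np.square(chunk)

arr = np.arange(1_000_000, dtype=np.int64)
# Split the input into 4 roughly equal chunks, one per thread
chunks = np.array_split(arr, 4)

with ThreadPoolExecutor(max_workers=4) as executor:
    # map preserves the order of the chunks
    squared_chunks = list(executor.map(square_chunk, chunks))

# Reassemble the chunks into one array and compare with the serial result
result = np.concatenate(squared_chunks)
print(np.array_equal(result, arr ** 2))
```

Any real speed-up depends on the operation and the array size, so it is worth timing the threaded and serial versions before adopting this pattern.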

Parallel Programming with SciPy 

SciPy is a popular Python library for scientific and mathematical calculations. It provides many powerful tools for data analysis, signal processing, and optimization. Not every SciPy routine uses multiple cores on its own; in such cases, you can combine SciPy with external libraries and tools to run computations concurrently.

Parallelizing a simple map or reduce operation using Dask

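First, import Dask’s array module with import dask.array as da. Dask is a separate library, not part of SciPy, that splits large arrays into chunks and processes the chunks in parallel. Create a random 10000×10000 array x, divided into chunks of size 1000×1000. Then define the operation y = (x + x.T) - x.mean(axis=0) and reduce it with y.sum(). Dask evaluates lazily, so call compute() to run the calculation, and print the result.


import dask.array as da
# Create a 10000x10000 random array split into 1000x1000 chunks
x = da.random.normal(size=(10000, 10000), chunks=(1000, 1000))
# Build the computation lazily; nothing is evaluated yet
y = (x + x.T) - x.mean(axis=0)
# compute() runs the chunked computation in parallel
result = y.sum().compute()
print(result)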

Output (varies from run to run because the array is random): -10909.686111782875
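
Dask builds a task graph first and only runs it on request. The sketch below makes that visible on a tiny array of ones, where the result can be checked by hand. The 4×4 size and 2×2 chunks are assumptions chosen to keep the arithmetic simple.

```python
import dask.array as da

# A 4x4 array of ones, stored as four 2x2 chunks
x = da.ones((4, 4), chunks=(2, 2))

# Same expression as above: every entry is (1 + 1) - 1 = 1
y = (x + x.T) - x.mean(axis=0)
total = y.sum()  # still a lazy Dask array, nothing has been computed

print(type(total).__name__)
value = total.compute()  # runs the task graph, chunk by chunk
print(value)
```

Since all 16 entries of y equal 1, the computed sum is 16.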

Parallelizing a numerical integration using SciPy’s quad function and the multiprocessing module

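First, import integrate from SciPy with from scipy import integrate, and import multiprocessing. Define a function f(x) that returns the square of its argument. A single call to integrate.quad runs in one process, so to use the pool we split the interval [0, 1] into four sub-intervals, integrate each one in a separate worker, and add the partial results.


from scipy import integrate
import multiprocessing
def f(x):
    return x**2
def integrate_piece(bounds):
    # quad returns (value, error estimate); keep only the value
    return integrate.quad(f, bounds[0], bounds[1])[0]
if __name__ == "__main__":
    pieces = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
    with multiprocessing.Pool(processes=4) as pool:
        result = sum(pool.map(integrate_piece, pieces))
    print(result)

Output: 0.33333333333333337 (the last digits may differ slightly because the partial results are summed)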
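
Splitting an integral works because the integral over [a, b] equals the sum of the integrals over its sub-intervals. The sketch below checks this serially for a configurable number of pieces. The names square, _integrate_piece, integrate_in_pieces, and the map_fn parameter are hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy import integrate

def square(x):
    return x ** 2

def _integrate_piece(task):
    """Integrate f over one sub-interval; task is (f, lower, upper)."""
    f, lo, hi = task
    value, _abs_error = integrate.quad(f, lo, hi)
    return value

def integrate_in_pieces(f, a, b, n_pieces, map_fn=map):
    """Sum the integrals of f over n_pieces equal sub-intervals of [a, b]."""
    edges = np.linspace(a, b, n_pieces + 1)
    tasks = [(f, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(map_fn(_integrate_piece, tasks))

whole, _ = integrate.quad(square, 0.0, 1.0)
pieces = integrate_in_pieces(square, 0.0, 1.0, 4)

# The two results agree up to floating-point rounding
print(abs(whole - pieces) < 1e-12)
```

Here map_fn defaults to the built-in map, which runs serially. Passing the map method of a multiprocessing pool should distribute the pieces across workers, provided f is defined at module level so it can be pickled.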

