Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time.
Required Modules:
pip install scipy
pip install numpy
pip install cupy
pip install dask
Parallel Programming with NumPy
NumPy is a popular numeric computation library for Python known for its efficient array operations and support for vectorized operations. One way to further optimize NumPy code is to use parallel programming techniques, which take advantage of multiple CPU cores to perform calculations faster.
Parallel dot product calculation using NumPy
First, import NumPy with import numpy as np. Then create two random vectors a and b of length 100000 and compute their dot product with np.dot(a, b). NumPy hands this computation to its underlying BLAS library, which may use several threads depending on how NumPy was built. Finally, print the result.
Python3
import numpy as np

# Create two random vectors of length 100000
a = np.random.rand(100000)
b = np.random.rand(100000)

# Calculate the dot product using NumPy's built-in parallelization
dot_product = np.dot(a, b)
print(dot_product)
Output (the value varies between runs because the vectors are random): 25016.0204799
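Because the vectors are random, the printed value changes from run to run. The sketch below shows one way to make the experiment reproducible and to check what np.dot actually computes. The seed value 0 and the names rng, matmul_product and manual are illustrative choices, not part of the original example.

```python
import numpy as np

# Seeded generator so that every run draws the same vectors
# (the seed 0 is an arbitrary, illustrative choice)
rng = np.random.default_rng(0)
a = rng.random(100000)
b = rng.random(100000)

# Three equivalent ways to compute the dot product
dot_product = np.dot(a, b)
matmul_product = a @ b
manual = np.sum(a * b)  # multiply element-wise, then add everything up

print(dot_product)
print(np.isclose(dot_product, matmul_product), np.isclose(dot_product, manual))
```

The three values agree up to floating-point rounding, since the order of the additions may differ between the implementations.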
Parallel matrix multiplication using NumPy and Multiprocessing
First, import NumPy with import numpy as np and Pool from multiprocessing. The worker function matrix_multiply(args) unpacks a pair of blocks and returns their product. We create two random matrices A and B of size 1000×1000, split A into four column blocks and B into four row blocks, and create a multiprocessing pool with four workers. The pool maps matrix_multiply over the four pairs of blocks, so each worker computes one full-size partial product. Summing the four partial products gives the result matrix C. The if __name__ == "__main__": guard is required on platforms that start workers by spawning a fresh interpreter, such as Windows and macOS.
Python3
import numpy as np
from multiprocessing import Pool

# Define the matrix multiplication function
def matrix_multiply(args):
    A, B = args
    return np.dot(A, B)

if __name__ == "__main__":
    # Create two random matrices of size 1000x1000
    A = np.random.rand(1000, 1000)
    B = np.random.rand(1000, 1000)

    # Split A into 4 column blocks and B into 4 row blocks
    A_parts = np.array_split(A, 4, axis=1)
    B_parts = np.array_split(B, 4)

    # Create a multiprocessing pool with 4 workers
    with Pool(4) as pool:
        # Map the matrix multiplication function to the 4 pairs of blocks
        C_parts = pool.map(matrix_multiply, zip(A_parts, B_parts))

    # Each part is a full-size partial product, so add them up
    C = np.sum(C_parts, axis=0)
    print(C)
Output:
[[ 246.26109895 245.27979434 247.53272716 ..., 246.54602696 246.56427344 247.98649696]
 [ 250.3441429 249.08795621 250.72384067 ..., 250.04416057 250.39319075 251.28326167]
 [ 248.44163838 247.48820248 249.19031327 ..., 248.48692097 249.24465987 250.2703185 ]
 ...,
 [ 252.35223132 250.92852728 251.9176228 ..., 251.5751485 253.00980032 252.06391074]
 [ 251.8001927 249.67594552 250.62393445 ..., 249.82225854 252.16903134 251.53323254]
 [ 252.24630379 251.09158312 251.64857194 ..., 251.07993262 252.88783961 252.44037699]]
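How the partial results are combined depends on how the matrices were split. The sketch below checks two common decompositions against the direct product A @ B. It deliberately uses small matrices and no process pool so that it runs anywhere; the 8×6 and 6×5 shapes and the block counts are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 6))
B = rng.random((6, 5))
expected = A @ B

# Decomposition 1: split A into row blocks, multiply each block by the
# whole of B, and stack the resulting row blocks on top of each other.
row_blocks = np.array_split(A, 4, axis=0)
C_rows = np.concatenate([block @ B for block in row_blocks], axis=0)

# Decomposition 2: split A by columns and B by rows (the inner dimension).
# Each pair yields a full-size partial product, and the partials are summed.
A_cols = np.array_split(A, 3, axis=1)
B_rows = np.array_split(B, 3, axis=0)
C_sum = sum(a_part @ b_part for a_part, b_part in zip(A_cols, B_rows))

print(np.allclose(C_rows, expected), np.allclose(C_sum, expected))
```

In both cases the individual block products are independent of each other, which is what makes them suitable for a worker pool. Row blocks return less data per worker, while splitting the inner dimension returns a full-size matrix from every worker.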
GPU Computing using NumPy
NumPy and SciPy cannot run computations on a GPU by themselves. For that we need an additional library such as CuPy or PyTorch. In this example, we import CuPy as cp and NumPy as np, create a random array on the GPU, and square it element-wise on the GPU. We then transfer the result back to the CPU as a NumPy array and perform further computations on the CPU using NumPy.
Python3
import cupy as cp
import numpy as np

# Create a random array on the GPU
a_gpu = cp.random.rand(3, 3)

# Perform element-wise squaring on the GPU
a_squared_gpu = cp.square(a_gpu)

# Transfer the result back to the CPU as a NumPy array
a_squared_cpu = cp.asnumpy(a_squared_gpu)

# Perform further computations on the CPU using NumPy
a_sum = np.sum(a_squared_cpu)

print("Original array on GPU:")
print(a_gpu)
print("Squared array on GPU:")
print(a_squared_gpu)
print("Squared array on CPU (as NumPy array):")
print(a_squared_cpu)
print("Sum of squared array on CPU (computed using NumPy):")
print(a_sum)
Output:
Original array on GPU:
[[0.86840887 0.46334445 0.07575684]
 [0.95068822 0.27356767 0.04985629]
 [0.46676109 0.92671615 0.43278567]]
Squared array on GPU:
[[7.53992218e-01 2.14645385e-01 5.71954152e-03]
 [9.04904051e-01 7.48442324e-02 2.48564982e-03]
 [2.17650008e-01 8.58339857e-01 1.87155546e-01]]
Squared array on CPU (as NumPy array):
[[7.53992218e-01 2.14645385e-01 5.71954152e-03]
 [9.04904051e-01 7.48442324e-02 2.48564982e-03]
 [2.17650008e-01 8.58339857e-01 1.87155546e-01]]
Sum of squared array on CPU (computed using NumPy):
3.628021118080383
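Since CuPy mirrors much of the NumPy API, the same function can often be written once and run on either backend. The following is a minimal sketch under that assumption: it uses the GPU when CuPy and a CUDA device are available and otherwise falls back to NumPy. The alias xp and the helper sum_of_squares are hypothetical names introduced only for this illustration.

```python
import numpy as np

try:
    import cupy as cp
    if not cp.cuda.is_available():
        raise RuntimeError("CuPy is installed but no CUDA device was found")
    xp = cp  # run on the GPU
except Exception:
    cp = None
    xp = np  # fall back to the CPU

def sum_of_squares(values):
    """Square the values element-wise and add them up on the chosen backend."""
    arr = xp.asarray(values, dtype=xp.float64)
    total = xp.sum(xp.square(arr))
    # float() copies the single scalar back to the host when running on the GPU
    return float(total)

print("Backend:", xp.__name__)
print(sum_of_squares([1.0, 2.0, 3.0]))
```

Keeping the data on one device for as long as possible matters in practice, because every transfer between GPU and CPU memory has a cost that can outweigh the speed-up for small arrays such as the 3×3 example above.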
Multi-threading using NumPy
First, import the necessary modules: NumPy, and ThreadPoolExecutor from concurrent.futures. Next, define a function func(x) that we want to execute in parallel, create an input array arr of values to apply this function to, and choose the number of threads to use.
Now create a ThreadPoolExecutor with the specified number of threads and use its map method to apply func to each element of arr. The map method returns an iterator over the results in input order, so we call list on it and convert that list to a NumPy array. Note that threads in CPython share the global interpreter lock (GIL), so a pure-Python function such as func gains little from threading. Threads pay off mainly when the work releases the GIL, as I/O and many NumPy operations on large arrays do.
Python3
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Define a function to be executed in parallel
def func(x):
    return x ** 2

# Create an array of values
arr = np.arange(10)

# Define the number of threads to use
num_threads = 4

# Create a ThreadPoolExecutor with the specified number of threads
with ThreadPoolExecutor(max_workers=num_threads) as executor:
    # Use the executor to map the function to the array in parallel
    results = executor.map(func, arr)
    # Convert the results to a NumPy array
    results = np.array(list(results))

# Print the input array and the corresponding results
print("Input Array: ", arr)
print("Results: ", results)
Finally, we print both the input array arr and the corresponding results array.
Output:
Input Array: [0 1 2 3 4 5 6 7 8 9]
Results: [ 0 1 4 9 16 25 36 49 64 81]
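Submitting one task per element keeps most of the time in Python-level overhead. A common alternative, sketched below, is to give each thread a whole chunk of the array and apply a vectorized operation to it. The helper name square_chunk and the array size of one million elements are illustrative assumptions, and any actual speed-up depends on the operation and the machine.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    # Vectorized: the loop over the elements runs inside NumPy's compiled code
    return chunk ** 2

arr = np.arange(1_000_000, dtype=np.int64)
num_threads = 4

# One chunk per thread instead of one task per element
chunks = np.array_split(arr, num_threads)

with ThreadPoolExecutor(max_workers=num_threads) as executor:
    # map returns the results in the same order as the input chunks
    results = np.concatenate(list(executor.map(square_chunk, chunks)))

print(results[:5], results.shape)
```

Because map preserves the order of its inputs, concatenating the chunk results reproduces the layout of the original array.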
Parallel Programming with SciPy
SciPy is a popular Python library for scientific and mathematical calculations. It provides many powerful tools for data analysis, signal processing, and optimization. SciPy itself offers only limited built-in parallelism, so for larger workloads you can combine it with external libraries and tools such as Dask or multiprocessing to run computations concurrently.
Parallelizing a simple map or reduce operation using the Dask library
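First, import Dask's array module with import dask.array as da. Then create a 10000×10000 array x of normally distributed random numbers, divided into chunks of size 1000×1000. Next, define an operation on x, assign it to y, and sum all elements of y. Dask evaluates lazily, so nothing is computed until compute() is called; at that point the chunks can be processed in parallel. Finally, print the result.
Python3
import dask.array as da

# 10000x10000 random array, split into 1000x1000 chunks
x = da.random.normal(size=(10000, 10000), chunks=(1000, 1000))

# Build the computation lazily
y = (x + x.T) - x.mean(axis=0)
result = y.sum()

# compute() triggers the actual evaluation
print(result.compute())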
Output (the value varies between runs because the input is random): -10909.686111782875
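The chunks argument is what makes the parallelism possible: a reduction such as a sum can be computed per chunk and the partial results combined afterwards. The NumPy-only sketch below imitates that idea by hand on a small array. It illustrates the principle only and is not Dask's actual scheduler; the helper name chunked_sum and the 100×100 size are hypothetical choices.

```python
import numpy as np

def chunked_sum(x, chunk_rows, chunk_cols):
    """Sum a 2-D array by adding up the sums of its rectangular chunks."""
    partial_sums = []
    for i in range(0, x.shape[0], chunk_rows):
        for j in range(0, x.shape[1], chunk_cols):
            # Each chunk is independent, so a scheduler could hand it
            # to a separate worker
            partial_sums.append(x[i:i + chunk_rows, j:j + chunk_cols].sum())
    # Combine step: reduce the partial results to a single value
    return sum(partial_sums)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 100))
print(np.isclose(chunked_sum(x, 10, 10), x.sum()))
```

The chunked and direct sums agree up to floating-point rounding, since the additions happen in a different order.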
Parallelizing a numerical integration using SciPy’s ‘quad‘ function and the ‘multiprocessing’ module
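First, import integrate from SciPy and the multiprocessing module. The function f(x) returns the square of x. A single call to integrate.quad runs in one process, so to make use of the pool we split the interval [0, 1] into four sub-intervals, let each worker integrate f over one of them with the helper integrate_piece, and add up the partial integrals.
Python3
from scipy import integrate
import multiprocessing

def f(x):
    return x ** 2

def integrate_piece(bounds):
    # quad returns (value, error estimate); keep only the value
    a, b = bounds
    return integrate.quad(f, a, b)[0]

if __name__ == "__main__":
    # Split [0, 1] into 4 sub-intervals, one per worker
    pieces = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
    with multiprocessing.Pool(processes=4) as pool:
        result = sum(pool.map(integrate_piece, pieces))
    print(result)
Output: approximately 0.3333333333333333 (that is, 1/3 up to floating-point rounding)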