Before diving into running queries with multiprocessing, let's first understand what multiprocessing is in Python. Multiprocessing lets a program use multiple CPU cores to run tasks/processes in parallel. This parallelization leads to significant speedups for computation-heavy work, so why not use this feature to our advantage when running queries?
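As a quick illustration of the idea (independent of any database), here is a minimal sketch using `multiprocessing.Pool` to square numbers in parallel worker processes:

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work that can run on a separate core
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each number is squared in one of the worker processes
        print(pool.map(square, [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

`pool.map` splits the input list among the workers and returns the results in the original order.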
Psycopg2 is the most popular PostgreSQL adapter for Python. It is a complete implementation of the Python DB API 2.0 specification and is thread-safe: the same connection can be shared by multiple threads.
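The DB API 2.0 interface that psycopg2 implements (connect, cursor, execute, fetchall) is the same one exposed by the standard library's `sqlite3` module, so the shared shape of the API can be sketched without a PostgreSQL server (the table and values below are made up for illustration):

```python
import sqlite3  # sqlite3 is also a DB API 2.0 implementation, like psycopg2

conn = sqlite3.connect(":memory:")   # connect() returns a Connection
cursor = conn.cursor()               # cursor() returns a Cursor
cursor.execute("CREATE TABLE root (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO root VALUES (1, 'demo')")
cursor.execute("SELECT * FROM root")  # execute() runs a query
rows = cursor.fetchall()              # fetchall() retrieves all result rows
print(rows)  # [(1, 'demo')]
conn.close()
```

Swapping `sqlite3.connect(":memory:")` for `psycopg2.connect(...)` leaves the rest of the code essentially unchanged, which is the point of the common specification.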
Running queries from Python using multiprocessing involves two important steps:

- First, establish a database connection using Psycopg2.
- After establishing the connection, use the multiprocessing module to run the queries in parallel and complete the task in less time.
Database used: a PostgreSQL database named root containing a table root.
Establishing Database connection using Python
Python3

```python
import psycopg2

def run():
    try:
        # Establishing the connection
        conn = psycopg2.connect(database="root",
                                user='root',
                                password='root',
                                host='127.0.0.1',
                                port='5432')

        # Creating a cursor object using the cursor() method
        cursor = conn.cursor()

        # Executing a query using the execute() method
        cursor.execute('''SELECT * FROM root''')
        print("Connection established to the database root")

        # Closing the connection
        conn.close()
    except psycopg2.Error:
        print("Connection not established to the database")

# Calling the function
run()
```
Output:
Connection established to the database root
Running Queries using multiprocessing in Python
Python3

```python
import os
from multiprocessing import Pool, freeze_support

import psycopg2

def run():
    try:
        conn = psycopg2.connect(database="root",
                                user='root',
                                password='root',
                                host='127.0.0.1',
                                port='5432')
        cursor = conn.cursor()
        cursor.execute('''SELECT * FROM root''')
        records = cursor.fetchall()
        conn.close()
        return records
    except psycopg2.Error:
        print("Connection not established to the database")
        return -1

if __name__ == "__main__":
    freeze_support()
    print("Enter the number of times to run the above query")
    n = int(input())

    results = []
    # Leave one core free for the main process
    with Pool(processes=os.cpu_count() - 1) as pool:
        # Submit all queries without blocking, then collect the results
        for _ in range(n):
            res = pool.apply_async(run)
            results.append(res)

        res = [result.get() for result in results]
        print(res)

        pool.close()
        pool.join()
```
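The `apply_async`/`get` pattern used above can be tried without a PostgreSQL server by substituting a stand-in for `run()` (`query_stub` below is a hypothetical placeholder that just returns fake records):

```python
from multiprocessing import Pool

def query_stub(i):
    # Hypothetical stand-in for run(): returns fake "records" for query i
    return [(i, "row")]

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # apply_async submits each call immediately and returns a handle;
        # get() later blocks until that call's result is ready
        async_results = [pool.apply_async(query_stub, (i,)) for i in range(4)]
        records = [r.get() for r in async_results]
    print(records)  # [[(0, 'row')], [(1, 'row')], [(2, 'row')], [(3, 'row')]]
```

Because the results are collected in submission order, the output list lines up with the order of the calls even though the workers may finish in any order.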
Output: a list containing the records fetched by each parallel run of the query.
Conclusion
Sometimes you can speed things up by parallelizing them. The practice is simple: break the problem into small units so it can be solved much faster, and distribute those units among different workers (processors). The speedup varies from machine to machine because it depends on the number of cores: most laptops have at least 4 and many have 8, whereas desktop PCs can have as many as 32. Here lies the problem: by default, Python code runs on one core at a time, but with the help of the multiprocessing package we can use as many cores as we want.
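To sketch that last point, here is a small CPU-bound workload (prime counting, chosen purely for illustration) distributed across all but one core, mirroring the `os.cpu_count() - 1` choice used earlier:

```python
import os
from multiprocessing import Pool

def count_primes(limit):
    # CPU-bound work: each call can run on its own core in its own process
    return sum(1 for n in range(2, limit)
               if all(n % d for d in range(2, int(n ** 0.5) + 1)))

if __name__ == "__main__":
    # Leave one core free for the main process (assumes cpu_count() > 1)
    workers = max(1, os.cpu_count() - 1)
    with Pool(processes=workers) as pool:
        # Four independent chunks of work, solved in parallel
        print(pool.map(count_primes, [10_000] * 4))
```

Each worker process has its own Python interpreter, so the chunks genuinely run in parallel on separate cores rather than taking turns on one.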