Let’s start with SQLAlchemy, a Python library that lets Python programs communicate with databases such as MySQL and PostgreSQL. It is commonly used as an Object Relational Mapper (ORM) tool: it translates Python classes to tables in relational databases and automatically converts function calls to SQL statements, giving application developers the full power and flexibility of SQL from within Python. SQLAlchemy also lets developers write database-agnostic code, so the same code can communicate with different types of database engines. As an ORM tool, SQLAlchemy differs from most others in that it is rooted in a complementarity-oriented approach: the entire process is exposed as a series of composable, transparent tools rather than being hidden behind a wall of automation.
Using the library, developers remain in control of how the database is structured and how SQL is constructed, while the redundant tasks are automated.
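To make the class-to-table idea concrete, here is a minimal sketch of such a mapping. The User class and users table are hypothetical names chosen only for illustration, and an in-memory SQLite engine is used so the snippet runs anywhere:
Python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

# A Python class mapped to a relational table ("users" is a hypothetical name)
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite:///:memory:")  # any supported engine works the same way
Base.metadata.create_all(engine)              # emits CREATE TABLE for us

with Session(engine) as session:
    session.add(User(name="example"))         # translated into an INSERT statement
    session.commit()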
Bulk Insert A Pandas DataFrame Using SQLAlchemy in Python
In this article, we will look at how to bulk insert a Pandas DataFrame using SQLAlchemy, and at an optimized approach for doing so, since inserting directly with the default Pandas method is very slow.
Creating a connection and database using SQLAlchemy:
First we will create an engine for connecting to the database; then we can use the to_sql() method to push the data along with its constraints. We have used VS Code for inserting the DataFrame using SQLAlchemy, but you can use any other editor as well after installing the library.
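As a sketch, an engine for a local MySQL database can be created like this. The username, password, and database name are placeholders you must replace, and the pymysql driver is assumed to be installed:
Python
from sqlalchemy import create_engine

# placeholder credentials; replace user, password, and mydatabase with yours
engine = create_engine(
    "mysql+pymysql://user:password@localhost:3306/mydatabase?charset=utf8mb4"
)
conn = engine.connect()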
Creating a DataFrame and inserting it into the database using the to_sql() function:
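A minimal sketch of this step, assuming the connection conn created above (the table name table_gfg matches the full code later in this article):
Python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Lazyroar", "for", "Lazyroar"],
})

# writes the DataFrame to a SQL table, replacing it if it already exists
df.to_sql("table_gfg", conn, if_exists="replace")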
Note: you can run a SELECT query on the table to verify the inserted rows.
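For example, a quick way to check the table from Python itself (a sketch, assuming the engine created above):
Python
import pandas as pd

# read the table back to confirm the insert
print(pd.read_sql("SELECT * FROM table_gfg", engine))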
Bulk inserting a Pandas DataFrame using SQLAlchemy:
We can perform this task by passing method=“multi” to to_sql(), which performs a batch insert by including multiple records in a single INSERT statement. In practice, however, this method is very slow in terms of time taken, sometimes even compared with the default method. A sketch of the call is shown below.
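A sketch of this call; chunksize is optional and is shown only to illustrate how the rows are batched:
Python
# passes multiple rows per INSERT statement; chunksize caps rows per batch
df.to_sql("table_gfg", conn, index=False, if_exists="append",
          method="multi", chunksize=1000)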
To overcome this problem, to_sql() also accepts a callable as its method argument, with the signature (pd_table, conn, keys, data_iter). A well-chosen callable can be much faster than the “multi” method. A sketch of such a callable follows, and the full code implementation comes after it.
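As a sketch, a custom insert function with that signature might look like this. The name insert_with_executemany is hypothetical; the body mirrors what Pandas does by default (an executemany-style insert through the connection) and is a starting point for swapping in a driver-specific fast path:
Python
def insert_with_executemany(pd_table, conn, keys, data_iter):
    # pd_table: pandas SQLTable wrapper, conn: SQLAlchemy connection,
    # keys: column names, data_iter: iterator of row tuples
    data = [dict(zip(keys, row)) for row in data_iter]
    # executing an insert with a list of dicts triggers executemany
    conn.execute(pd_table.table.insert(), data)

df.to_sql("table_gfg", conn, index=False, if_exists="append",
          method=insert_with_executemany)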
Code implementation:
Python
from sqlalchemy import event
from sqlalchemy import create_engine
import pandas as pd
import pymysql

# Choose your configuration below: replace connection name, password
# and database name with yours.
engine = create_engine(
    "mysql+pymysql://" + "connection name" + ":" + "password" + "@"
    + "localhost" + ":" + "3306" + "/" + "database name"
    + "?" + "charset=utf8mb4"
)
conn = engine.connect()

# data to insert into the data frame
data = {
    'id': [1, 2, 3],
    'name': ["Lazyroar", "for", "Lazyroar"]
}

# storing data in the df variable, hence creating the dataframe
df = pd.DataFrame(data)
print(df)

# inserting data into the database
df.to_sql("table_gfg", conn, if_exists="replace")

# Note: fast_executemany is a pyodbc option (e.g. for SQL Server);
# with pymysql this flag has no effect, so the speed-up applies only
# when the engine uses a pyodbc-based driver.
@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params,
                                  context, executemany):
    if executemany:
        cursor.fast_executemany = True

df.to_sql("table_gfg", conn, index=False, if_exists="append")
Output: