Prerequisites: MongoDB Python Basics
This article explains how to convert a PyMongo Cursor to a Pandas DataFrame. The find() method returns a Cursor instance, whereas find_one() returns a single document (a dictionary) or None.
Let’s begin:
- Importing Required Modules: Import the required modules using the following statements:
from pymongo import MongoClient
from pandas import DataFrame
If MongoDB is not already installed on your machine, you can refer to the guide: Guide to Install MongoDB with Python
If pandas is not installed, you can install it using pip. If you are using Python 3, use pip3 instead of pip to install the required modules.
pip install pandas
- Creating a Connection: Now that the modules are imported, it is time to establish a connection to the MongoDB server, which is assumed to be running on localhost (host name) at port 27017 (port number).
client = MongoClient('localhost', 27017)
- Accessing the Database: Now that the connection to the MongoDB server is established, we can create a new database or use an existing one.
mydatabase = client.name_of_the_database
- Accessing the Collection: We now select the collection from the database using the following syntax:
collection_name = mydatabase.name_of_collection
- Getting the documents: Get all the documents from the collection using the find() method. It returns a Cursor instance.
cursor = collection_name.find()
- Converting the Cursor to Dataframe: Convert the Cursor to a Pandas DataFrame.
First, we convert the cursor to a list of dictionaries.
list_cur = list(cursor)
Now, we convert the list to a DataFrame.
df = DataFrame(list_cur)
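The two conversion steps above can be wrapped in a small helper. The following is only a sketch: the function name cursor_to_dataframe and its drop_id flag are hypothetical, and a plain Python iterator over sample dictionaries stands in for the result of find() so that the snippet runs without a MongoDB server.

```python
from pandas import DataFrame


def cursor_to_dataframe(cursor, drop_id=False):
    """Build a DataFrame from any iterable of dict-like documents.

    `cursor` may be a PyMongo Cursor or, as in this sketch, a plain
    iterator of dictionaries. `drop_id` is a hypothetical convenience
    flag that removes MongoDB's `_id` column when it is present.
    """
    records = list(cursor)  # consumes the cursor
    df = DataFrame(records)
    if drop_id and "_id" in df.columns:
        df = df.drop(columns="_id")
    return df


# Stand-in for collection_name.find(): an iterator over sample documents.
sample_docs = [
    {"_id": 1, "name": "Vishwash", "Roll No": 1001, "Branch": "CSE"},
    {"_id": 2, "name": "Vishesh", "Roll No": 1002, "Branch": "IT"},
    {"_id": 3, "name": "Shivam", "Roll No": 1003, "Branch": "ME"},
]
fake_cursor = iter(sample_docs)

df = cursor_to_dataframe(fake_cursor, drop_id=True)
print(df)

# The iterator is exhausted after one pass, so a second pass yields nothing.
leftover = list(fake_cursor)
print(len(leftover))
```

A real Cursor behaves similarly in that it can be iterated only once. To read the same documents again, call find() again or use the cursor's rewind() method.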
Below is the implementation.
Sample Database:
# Python program demonstrating the conversion of a
# PyMongo Cursor to a Pandas DataFrame

# Importing required modules
from pymongo import MongoClient
from pandas import DataFrame

# Connecting to MongoDB server
# client = MongoClient('host_name', port_number)
client = MongoClient('localhost', 27017)

# Connecting to the database named GFG
mydatabase = client.GFG

# Accessing the collection named College
mycollection = mydatabase.College

# Now creating a Cursor instance using find() function
cursor = mycollection.find()
print('Type of cursor:', type(cursor))

# Converting cursor to the list of dictionaries
list_cur = list(cursor)

# Converting to the DataFrame
df = DataFrame(list_cur)
print('Type of df:', type(df))

# Printing the df to console
print()
print(df.head())
Output:
Type of cursor: <class 'pymongo.cursor.Cursor'>
Type of df: <class 'pandas.core.frame.DataFrame'>

   _id      name  Roll No Branch
0    1  Vishwash     1001    CSE
1    2   Vishesh     1002     IT
2    3    Shivam     1003     ME
3    4      Yash     1004    ECE
4    5      Raju     1005    CSE
Output Explanation:
As seen above, when no argument is provided, head() returns only the first 5 records (numbered 0 to 4). If you pass a positive integer to head(), it returns that many records instead.
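The behaviour of head() can be checked with a short sketch. The eight rows below are synthetic placeholder data generated in the snippet, not documents from the sample database.

```python
from pandas import DataFrame

# Synthetic rows, used only to have more than five records to look at.
rows = [{"_id": i, "Roll No": 1000 + i} for i in range(1, 9)]
df = DataFrame(rows)

first_five = df.head()    # no argument: the first 5 rows
first_three = df.head(3)  # positive integer: that many rows

print(len(first_five), len(first_three))
print(first_three)
```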