Introduction
Following the global spread of the internet, we are generating data at an unprecedented pace. Because any kind of analysis requires us to collect or query the necessary data from a database, it is of utmost importance that we choose the right tool to query the data. Much of this data is large in volume and loosely structured, and forcing it into the fixed tables of a relational SQL database can make storing and querying it cumbersome and expensive. This is precisely where MongoDB comes in. MongoDB is a document-oriented NoSQL database that stores data in the form of documents. This means that you do not have to provide a schema to store data in a MongoDB database. This is in contrast to relational databases, where you have to define the schema before you insert any data. Additionally, MongoDB is designed to handle huge volumes of data efficiently, and it is one of the most widely used NoSQL databases, as it offers a rich query language and flexible and fast access to data.
In this article, we will see multiple examples of how to query a MongoDB database using PyMongo. Further, we will understand the basics of aggregation pipelines and learn to use comparison operators, logical operators, and regular expressions.
Learning Objectives
- Learn to set up a database in MongoDB using PyMongo.
- Learn basic querying of a MongoDB database, like filtering with different operators – logical, comparison, regular expressions, etc.
This article is a continuation of the MongoDB in Python Tutorial for Beginners, where we covered the challenges of unstructured databases, installation steps, and basic operations of MongoDB. So if you are a complete beginner in MongoDB, I would recommend you go through that article first.
Table of Contents
- What Is PyMongo?
- How to Install PyMongo?
- How to Create Collections and Load Data With PyMongo?
- How to Query MongoDB Using PyMongo?
- Aggregation Pipelines
- Conclusion
- Frequently Asked Questions
What Is PyMongo?
PyMongo is a Python library that enables us to connect with MongoDB. It is the official MongoDB driver for Python and the recommended way to work with MongoDB from Python.
We have chosen Python to interact with MongoDB because it is one of the most commonly used and powerful languages for data science and machine learning. PyMongo allows us to retrieve the data with dictionary-like syntax. This is because, internally, MongoDB stores all documents in BSON format, which is a binary form of JSON (JavaScript Object Notation). When these documents are read in a given programming language, they are converted into data types that the language supports natively. Therefore, in Python, these BSON documents are retrieved as Python dictionaries.
If you are a beginner in Python, I recommend enrolling in this free course: Introduction to Python.
How to Install PyMongo?
Installing PyMongo is simple and straightforward. Here, I am assuming you already have Python 3 and MongoDB installed. The following command will help you to install PyMongo:
pip3 install pymongo
How to Create Collections and Load Data With PyMongo?
Now let us set things up before querying a MongoDB database using PyMongo. First, we will insert the data into the database. The following steps will help you with this:
Step 1: Import the libraries and connect to the mongo client.
Start the MongoDB server on your machine. I am assuming it is running at localhost:27017.
Let’s start by importing the PyMongo library. Then, we will connect to the MongoDB server through a MongoClient object. The code for it is as follows:
import pymongo
client = pymongo.MongoClient('mongodb://localhost:27017')
MongoClient gives us a client for the running MongoDB instance.
Step 2: Get an instance of the database sample_db.
In case it doesn’t exist, MongoDB will create it for you as soon as you store data in it. The code for it is as follows:
database = client['sample_db']
Note – You can view all the existing database names using MongoClient().list_database_names()
Step 3: Create the Collections from the JSON files.
We will use the data from a meal delivery company that operates in multiple cities. Additionally, they have various fulfillment centers in these cities for dispatching meal orders to their customers. You can download the data and the code here.
1. weekly_demand:
- id: Unique ID for each document
- week: Week Number
- center_id: Unique ID for fulfillment center
- meal_id: Unique ID for Meal
- checkout_price: Final price including discount, taxes & delivery charges
- base_price: Base price of the meal
- emailer_for_promotion: Emailer sent for promotion of meal
- homepage_featured: Meal featured on the homepage
- num_orders: (Target) Orders Count
2. meal_info:
- meal_id: Unique ID for the meal
- category: Type of meal (beverages/snacks/soups/…)
- cuisine: Meal cuisine (Indian/Italian/…)
Next, we will create two collections in the sample_db database, one for each dataset, using the create_collection() function.
If you want to view a list of all the collections in your database, you can use the list_collection_names() function.
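As a minimal sketch of this step, the helper below creates the two collections only when they are missing, because create_collection() raises an error if a collection with that name already exists. The collection names weekly_demand and meal_info are assumed from the data description above, and the helper names are hypothetical; `database` stands for the sample_db handle from Step 2.

```python
def get_or_create_collection(database, name):
    """Return the collection `name`, creating it first if it is missing.

    `database` is expected to be a pymongo Database object, for example
    client['sample_db'] from Step 2.
    """
    if name in database.list_collection_names():
        # create_collection() raises CollectionInvalid for an existing
        # name, so reuse the existing collection instead.
        return database[name]
    return database.create_collection(name)


def create_sample_collections(database):
    """Create (or fetch) the two collections used in this article."""
    # Collection names are assumptions based on the dataset names.
    weekly_demand_collection = get_or_create_collection(database, "weekly_demand")
    meal_info_collection = get_or_create_collection(database, "meal_info")
    return weekly_demand_collection, meal_info_collection
```

Calling create_sample_collections(database) a second time simply returns the existing collections, which makes the setup safe to re-run.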
Step 4: Insert data into Collections.
The data we have is in JSON format. For each collection, we will get the collection instance, read the data file, and insert the documents using the insert_many() function.
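The loading step could look roughly like the sketch below. It assumes each data file holds a JSON array of documents; the helper name and the file names in the usage note are hypothetical, so adjust them to match the files you downloaded.

```python
import json


def load_json_into_collection(collection, json_path):
    """Read a JSON file holding a list of documents and insert them all.

    Returns the number of documents inserted.
    """
    with open(json_path, encoding="utf-8") as json_file:
        documents = json.load(json_file)  # expected: a list of dicts
    # insert_many() sends the whole list to the collection in bulk.
    result = collection.insert_many(documents)
    return len(result.inserted_ids)
```

For example, load_json_into_collection(weekly_demand_collection, "weekly_demand.json") would load the first dataset, and weekly_demand_collection.count_documents({}) can then be used to confirm how many documents the collection holds.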
Finally, we have 456548 documents in the weekly_demand_collection and 51 documents in the meal_info_collection. Now, let’s have a look at one document from each of these collections.
Note: insert_one() is a CRUD function in PyMongo that inserts a single document into a collection.
Let’s have a look at a document from the first MongoDB collection using the find_one() function. It returns a single document from the collection.
weekly_demand_collection
Notice that, by default, MongoDB adds an _id field of type ObjectId to every document.
meal_info_collection
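Both lookups can be sketched with one small helper. The function name is hypothetical, and `collection` stands for either of the two collection instances created earlier.

```python
from pprint import pprint


def show_one_document(collection):
    """Fetch a single document with find_one() and pretty-print it."""
    # find_one() returns a dict, or None if the collection is empty.
    document = collection.find_one()
    pprint(document)
    return document
```

Calling show_one_document(weekly_demand_collection) and then show_one_document(meal_info_collection) prints one document from each collection.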
Now, our data is ready. Let’s move on to querying this database.
How to Query MongoDB Using PyMongo?
We can query a MongoDB collection using PyMongo with the find() function, which returns a cursor over all the documents satisfying the given condition, or with the find_one() function, which returns only the first document satisfying the condition.
The following is the syntax of find() and find_one():
your_collection.find( {<< query >>} , { << fields>>} )
Both find() and find_one() take two optional parameters. The first is the query parameter based on which the documents will be fetched from the collection. The second parameter is for the fields we want to output from these retrieved documents.
You can query the database using the following filtering techniques:
Filter based on fields
Suppose your documents have hundreds of fields and you want to see only a few of them. You can do that by listing the required field names with the value 1 in the second parameter.
On the other hand, if you want to discard only a few fields from the complete document, you can set those field names to 0, and only those fields will be excluded. Please note that you cannot mix 1s and 0s in the same projection: either all should be one, or all should be zero. The only exception is the _id field, which can be set to 0 alongside fields set to 1.
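The two styles of field selection can be sketched as follows. The fields chosen here (week, checkout_price) are only illustrative picks from the weekly_demand description, and the helper name is hypothetical.

```python
# Inclusion projection: return only these fields. _id is still returned
# because it is included unless it is explicitly set to 0.
ONLY_WEEK_AND_PRICE = {"week": 1, "checkout_price": 1}

# Exclusion projection: return every field except these.
WITHOUT_ID_AND_WEEK = {"_id": 0, "week": 0}


def first_document(collection, projection):
    """Return the first document in the collection, shaped by `projection`."""
    # An empty filter {} matches every document.
    return collection.find_one({}, projection)
```

For instance, first_document(weekly_demand_collection, ONLY_WEEK_AND_PRICE) would return a dictionary with just the _id, week, and checkout_price keys.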
Filter with a condition
In this section, we provide a condition in the first pair of braces and the fields to discard in the second. For example, passing the condition center_id equal to 55 and meal_id equal to 1885, together with the fields _id and week set to 0, returns the first matching document without those two fields.
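A minimal sketch of this query is shown below. The function name and its default arguments are illustrative; `collection` is assumed to be the weekly_demand collection instance.

```python
def find_center_meal(collection, center_id=55, meal_id=1885):
    """Return the first document for the given center and meal,
    without the _id and week fields."""
    # Plain key/value pairs in the filter are implicit equality checks,
    # and all of them must hold for a document to match.
    condition = {"center_id": center_id, "meal_id": meal_id}
    fields_to_discard = {"_id": 0, "week": 0}
    return collection.find_one(condition, fields_to_discard)
```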
Filter based on Comparison Operators
The following are the eight comparison operators in MongoDB.
NAME | DESCRIPTION |
---|---|
$eq | It will match the values that are equal to a specified value. |
$gt | It will match the values that are greater than a specified value. |
$gte | It will match all the values that are greater than or equal to a specified value. |
$in | It will match any of the values specified in an array. |
$lt | It will match all the values that are less than a specified value. |
$lte | It will match all the values that are less than or equal to a specified value. |
$ne | It will match all the values that are not equal to a specified value. |
$nin | It will match none of the values specified in an array. |
The following are some examples of using these comparison operators:
1. Equal to and Not Equal to
We will find all the documents where center_id is equal to 55 and homepage_featured is not equal to 0. Since we are going to use the find function, it will return the cursor for that command. Further, use a for loop to iterate through the results of the query.
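The query described above might be written as in this sketch. The function names are hypothetical, and `collection` is assumed to be the weekly_demand collection instance.

```python
def featured_meals_filter(center_id=55):
    """Filter: center_id equals `center_id` and homepage_featured is not 0."""
    return {
        "center_id": {"$eq": center_id},
        "homepage_featured": {"$ne": 0},
    }


def print_featured_meals(collection, center_id=55):
    """Run the filter with find() and iterate over the returned cursor."""
    cursor = collection.find(featured_meals_filter(center_id))
    for document in cursor:
        print(document)
```

Note that find() only returns a cursor; the documents are fetched from the server as the for loop consumes it.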
2. In the List and Not in the List
Suppose you need to match a field against several possible values. In that case, instead of using the $eq operator multiple times, we can use the $in operator. We will try to find out all the documents where center_id is either 24 or 11.
Then we find all the documents where center_id is not present in the specified list. The following query will return all the documents where center_id is not 24 and also not 11.
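Both list-based filters can be sketched together, since they differ only in the operator. The variable and function names are illustrative.

```python
CENTERS = [24, 11]

# Matches documents where center_id is 24 or 11.
in_list_filter = {"center_id": {"$in": CENTERS}}

# Matches documents where center_id is neither 24 nor 11.
not_in_list_filter = {"center_id": {"$nin": CENTERS}}


def find_by_center_list(collection, exclude=False):
    """Return a cursor over documents whose center_id is in CENTERS,
    or not in CENTERS when exclude=True."""
    chosen_filter = not_in_list_filter if exclude else in_list_filter
    return collection.find(chosen_filter)
```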
3. Less than and Greater than
Now, let us find all the documents where center_id is 55 and checkout_price is greater than 100 and less than 200.
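A range condition like this one can be expressed by putting both operators on the same field. The sketch below builds the filter with a hypothetical helper; the resulting dictionary is what would be passed to find().

```python
def price_range_filter(center_id, low, high):
    """Filter: given center_id and low < checkout_price < high."""
    return {
        "center_id": center_id,
        # Two operators on the same field must both hold (logical AND).
        "checkout_price": {"$gt": low, "$lt": high},
    }


range_filter = price_range_filter(center_id=55, low=100, high=200)
```

With a live collection, weekly_demand_collection.find(range_filter) returns a cursor over the matching documents.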
Filter based on Logical Operators
NAME | DESCRIPTION |
---|---|
$and | It will join query clauses with a logical AND and return all documents that match all the conditions. |
$not | It will invert the effect of a query expression and return documents that do not match it. |
$nor | It will join the query clauses with a logical NOR and return all documents that match none of the clauses. |
$or | It will join the query clauses with a logical OR and return all documents that match the conditions of at least one clause. |
The following examples illustrate the use of logical operators:
- AND Operator
The following query will return all the documents where the center_id is equal to 11 and also the meal_id is not equal to 1778. The subqueries for the and operator will come inside a list.
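As a sketch, the AND query described here could be written like this, with the two subqueries placed inside the list given to $and. The names and_filter and find_with_and are illustrative.

```python
and_filter = {
    "$and": [
        {"center_id": {"$eq": 11}},   # center_id is 11 ...
        {"meal_id": {"$ne": 1778}},   # ... and meal_id is not 1778
    ]
}


def find_with_and(collection):
    """Return a cursor over documents that satisfy both subqueries."""
    return collection.find(and_filter)
```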
- OR Operator
The following query will return all the documents where the center_id is equal to 11 or the meal_id is 1207 or 2707. Further, the subqueries for the $or operator will come inside a list.
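The OR query can be sketched in the same way, this time combining $or with $in for the list of meal IDs. The names or_filter and find_with_or are illustrative.

```python
or_filter = {
    "$or": [
        {"center_id": {"$eq": 11}},            # center_id is 11 ...
        {"meal_id": {"$in": [1207, 2707]}},    # ... or meal_id is 1207 or 2707
    ]
}


def find_with_or(collection):
    """Return a cursor over documents that satisfy at least one subquery."""
    return collection.find(or_filter)
```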
Filter with Regular Expressions
Regular Expressions are of great use when you have text fields, and you want to search for documents with a specific pattern. In case you want to learn more about regular expressions, I highly recommend you go through this article: Beginners Tutorial for Regular Expressions in Python.
Regular expressions are used through the $regex operator, whose value is the pattern to match. We will use the meal_info collection for this query, and we will find the documents where the cuisine field starts with the character “C”.
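A minimal sketch of this query follows. The pattern ^C anchors the match to the start of the string; the variable and function names are illustrative.

```python
# "^" anchors the pattern at the start of the field value.
starts_with_c_filter = {"cuisine": {"$regex": "^C"}}


def find_cuisines_starting_with_c(meal_info_collection):
    """Return a cursor over meals whose cuisine starts with "C"."""
    return meal_info_collection.find(starts_with_c_filter)
```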
Let’s take another example of regular expressions. We will find out all the documents in which the category starts with the character “S” and the cuisine ends with “ian”.
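This second pattern query might look like the sketch below, where each field gets its own $regex condition and both must match. The names are illustrative.

```python
pattern_filter = {
    "category": {"$regex": "^S"},    # category starts with "S"
    "cuisine": {"$regex": "ian$"},   # cuisine ends with "ian"
}


def find_by_patterns(meal_info_collection):
    """Return a cursor over meals matching both patterns."""
    return meal_info_collection.find(pattern_filter)
```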
Aggregation Pipelines
MongoDB’s aggregation pipeline provides a framework to perform a series of data transformations on a dataset. The following is its syntax:
your_collection.aggregate( [ { <stage1> }, { <stage2> },.. ] )
The first stage takes the complete set of documents as input, and from there, each subsequent stage takes the previous stage’s output as its input and produces its own output.
MongoDB offers many aggregation stages, out of which we will see $match, $count, and $group in this article. We will discuss each of the stages in detail in the upcoming article on MongoDB.
For example, in the first stage, we will match the documents where center_id is equal to 11, and in the next stage, we will count the matched documents. Note that in the second stage we have assigned the $count operator the value total_rows, which is the name of the field that we want in the output.
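The two-stage pipeline described here can be sketched as follows. The helper name is hypothetical, and `collection` is assumed to be the weekly_demand collection instance.

```python
count_pipeline = [
    # Stage 1: keep only the documents of center 11.
    {"$match": {"center_id": {"$eq": 11}}},
    # Stage 2: count them and store the result in a field named total_rows.
    {"$count": "total_rows"},
]


def count_rows_for_center(collection):
    """Run the pipeline and return total_rows, or 0 when nothing matched."""
    # $count emits no document at all when its input is empty.
    results = list(collection.aggregate(count_pipeline))
    return results[0]["total_rows"] if results else 0
```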
Now, let’s take another example where the first stage is the same as before, i.e., center_id is equal to 11, and in the second stage, we want to calculate the average of the field num_orders for the center_id 11 and the unique meal_ids for the center_id 11.
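A sketch of this second pipeline is given below. The output field names average_num_orders and unique_meal_id are chosen here for illustration, as is the helper name; setting _id to None in $group puts all matched documents into a single group.

```python
summary_pipeline = [
    # Stage 1: keep only the documents of center 11.
    {"$match": {"center_id": {"$eq": 11}}},
    # Stage 2: collapse them into one group and compute the summaries.
    {"$group": {
        "_id": None,
        "average_num_orders": {"$avg": "$num_orders"},
        "unique_meal_id": {"$addToSet": "$meal_id"},
    }},
]


def summarize_center(collection):
    """Return the single summary document, or None when nothing matched."""
    return next(iter(collection.aggregate(summary_pipeline)), None)
```

The "$num_orders" and "$meal_id" strings refer to field values of the incoming documents, which is why they carry the leading dollar sign.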
Conclusion
The enormous amount of data generated today makes it necessary to find better alternatives, like MongoDB, to query data. To summarize, in this article, we learned how to query a MongoDB database using PyMongo. We also saw how to apply various filters as the situation requires.
Key Takeaways
- Using create_collection("<collection_name>"), you can create a collection in MongoDB
- You can use the insert_many() function to insert multiple records in a collection.
- The find_one() and find() functions accept filters built with operators like $and, $or, $in, etc., to select documents in a MongoDB collection
- The Aggregation pipeline provides a framework to perform a series of transformations, like $match and $group, on the documents in a collection.
Frequently Asked Questions
A. You can connect to MongoDB using PyMongo in the following manner:
import pymongo
client = pymongo.MongoClient('mongodb://localhost:27017')
A. You can retrieve documents from MongoDB collection using the find() or find_one() functions. In both of these functions, you can provide the filter query and the specific fields you want to retrieve from these documents.
A. You can insert data in a MongoDB collection using the insert_one() or insert_many() functions.