Querying a MongoDB database with PyMongo

Time:2021-7-26

Author | lakhay Arora
Compile | Flin
Source | analyticsvidhya

Introduction

With the growth of the Internet, we are generating data at an unprecedented rate. Any kind of analysis starts with collecting or querying the necessary data from a database, so choosing the right tool for querying matters. At this volume it is hard to imagine relying on SQL alone, because the cost of each query can become very high.

This is where MongoDB comes into play. MongoDB is a NoSQL database that stores unstructured data in the form of documents. It can process large amounts of data efficiently and is one of the most widely used NoSQL databases, because it provides a rich query language and flexible, fast access to data.

In this article, we will see several examples of how to query a MongoDB database using PyMongo. We’ll also cover the basics of comparison and logical operators, regular expressions, and aggregation pipelines.

This article follows our beginner’s tutorial for MongoDB (https://www.analyticsvidhya.com/blog/2020/02/mongodb-in-python-tutorial-for-beginners-using-pymongo), in which we discussed the challenges of unstructured databases, the installation steps, and the basic operations of MongoDB. If you are new to MongoDB, I suggest you read that article first.

Table of contents

  1. What is PyMongo?

  2. Installation steps

  3. Insert data into database

  4. Query the database

    1. Filter by field
    2. Filter by condition
    3. Filter by comparison operator
    4. Filter by logical operator
    5. Filter with regular expressions
    6. Aggregation pipeline
  5. Endnote

What is PyMongo?

PyMongo is a Python library that enables us to connect to MongoDB. It is the recommended way to work with MongoDB from Python.

We chose Python to interact with MongoDB because it is one of the most commonly used and powerful languages in data science. PyMongo allows us to retrieve data using a dictionary-like syntax.

If you are a beginner in Python, I suggest you take this free course: getting started with Python.

Installation steps

Installing PyMongo is straightforward. Here, I assume that you have Python 3 and MongoDB installed. The following command will help you install PyMongo:

pip3 install pymongo

Insert data into database

Now let’s set things up and then use PyMongo to query the MongoDB database. First, we insert the data into the database. The following steps will help you:

  1. Import the library and connect to the Mongo client

Start the MongoDB server on your computer. I assume it is running at localhost:27017.

Let’s start by importing the libraries we’ll use. By default, the MongoDB server runs on port 27017 on the local machine. We will then connect to the MongoDB client using the PyMongo library.

Then we get an instance of the sample_db database. If it doesn’t exist, MongoDB will create it for you when data is first written to it.

#Import the required libraries
import pymongo
import pprint
import json
import warnings
warnings.filterwarnings('ignore')

#Connect to mongoclient
client = pymongo.MongoClient('mongodb://localhost:27017')

#Get database
database = client['sample_db']

  2. Create a collection from a JSON file

We will use data from a food delivery company that operates in multiple cities. The company has various distribution centers in these cities for dispatching meal orders to its customers. You can download data and code here.

  1. weekly_demand

    • id: unique ID of each document
    • week: week number
    • center_id: unique ID of the distribution center
    • meal_id: unique ID of the meal
    • checkout_price: final price, including discount, taxes and delivery charges
    • base_price: base price of the meal
    • emailer_for_promotion: whether an email was sent to promote the meal
    • homepage_featured: whether the meal was featured on the homepage
    • num_orders: (target) number of orders
  2. meal_info

    • meal_id: unique ID of the meal
    • category: meal type (beverage / snack / soup…)
    • cuisine: cuisine of the meal (Indian / Italian /…)

Then, we will create two collections in the sample_db database:

# Create the weekly_demand collection
database.create_collection("weekly_demand")

# Create the meal_info collection
database.create_collection("meal_info")

  3. Insert data into the collections

The data we have is in JSON format. We will get an instance of each collection, read the data file, and use the insert_many function to insert the data.

# Get the weekly_demand collection
weekly_demand_collection = database.get_collection("weekly_demand")

# Open the weekly_demand JSON file
with open("weekly_demand.json") as f:
    file_data = json.load(f)

# Insert the data into the collection
weekly_demand_collection.insert_many(file_data)

# Get the total number of documents
# (count_documents({}) counts every document; Cursor.count() was removed in PyMongo 4)
weekly_demand_collection.count_documents({})
# >> 456548

# Get the meal_info collection
meal_info_collection = database.get_collection("meal_info")

# Open the meal_info JSON file
with open("meal_info.json") as f:
    file_data = json.load(f)

# Insert the data into the collection
meal_info_collection.insert_many(file_data)

# Get the total number of documents
meal_info_collection.count_documents({})
# >> 51
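
insert_many expects a non-empty list of documents, so it can help to check the shape of the parsed JSON before inserting it. The sketch below is only an illustration: load_documents is a hypothetical helper, not part of PyMongo, and the two sample records use made-up values with the meal_info field names described above. It writes the records to a temporary file so that it runs without the real data files or a MongoDB server.

```python
import json
import os
import tempfile


def load_documents(path):
    """Read a JSON file and return its contents as a list of dicts."""
    with open(path) as f:
        data = json.load(f)
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of documents")
    if not all(isinstance(doc, dict) for doc in data):
        raise ValueError("every array element must be a JSON object")
    return data


# Hypothetical sample records, written to a temporary file for the demo.
sample = [
    {"meal_id": 1885, "category": "Beverages", "cuisine": "Thai"},
    {"meal_id": 1207, "category": "Beverages", "cuisine": "Continental"},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

documents = load_documents(tmp.name)
os.remove(tmp.name)
print(len(documents))  # 2
```

The returned list could then be passed to insert_many in place of file_data.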

Finally, there are 456548 documents in weekly_demand_collection and 51 documents in meal_info_collection. Now, let’s look at one document in each collection.

weekly_demand_collection

weekly_demand_collection.find_one()

meal_info_collection

meal_info_collection.find_one()

Now our data is ready. Let’s continue to query the database.

Query the database

We can use PyMongo’s find function to query the MongoDB database and obtain all results that meet the given conditions. We can also use the find_one function, which returns only one result that meets the condition.

Here is the syntax of find and find_one:

your_collection.find( {<< query >>} , { << fields>>} )

You can use the following filtering techniques to query the database:

  1. Filter by field

Suppose your documents have hundreds of fields and you only want to see a few of them. You can do this by setting each required field name to a value of 1. For example,

weekly_demand_collection.find_one( {}, { "week": 1, "checkout_price" : 1})

On the other hand, if you only want to discard some fields from the document, you can set those field names to 0, and only those fields will be excluded. Note that you cannot combine 1 and 0 in the same projection: either all values are 1 or all are 0. The one exception is the _id field, which can be set to 0 alongside fields set to 1.

weekly_demand_collection.find_one( {}, {"num_orders" : 0, "meal_id" : 0})
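
To make the inclusion and exclusion rules concrete, here is a small pure-Python sketch of how a top-level projection behaves on a single document. apply_projection is a hypothetical helper written for this illustration, not a PyMongo function, and the document values are made up. It assumes only flat documents and ignores nested fields and projection operators.

```python
def apply_projection(document, projection):
    """Mimic a simple top-level inclusion or exclusion projection on a dict."""
    # _id is special: it is returned by default and may be switched off
    # even inside an inclusion projection.
    flags = {k: v for k, v in projection.items() if k != "_id"}
    include_id = projection.get("_id", 1) != 0

    if len({bool(v) for v in flags.values()}) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for _id)")

    if flags and all(flags.values()):
        # Inclusion mode: keep only the listed fields.
        result = {k: v for k, v in document.items() if k in flags}
    else:
        # Exclusion mode (or no flags at all): drop the listed fields.
        result = {k: v for k, v in document.items()
                  if k not in flags and k != "_id"}

    if include_id and "_id" in document:
        result = {"_id": document["_id"], **result}
    return result


# Hypothetical document using the weekly_demand field names.
document = {"_id": 1, "week": 1, "center_id": 55, "meal_id": 1885,
            "checkout_price": 136.83, "num_orders": 177}

included = apply_projection(document, {"week": 1, "checkout_price": 1})
excluded = apply_projection(document, {"_id": 0, "week": 0})
print(included)  # {'_id': 1, 'week': 1, 'checkout_price': 136.83}
print(excluded)
```

Passing a projection such as {"week": 1, "meal_id": 0} to this helper raises a ValueError, which mirrors the rule that 1 and 0 cannot be mixed.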

  2. Filter by condition

Now we provide a condition in the first pair of braces and the fields to drop in the second. The query below returns the first document whose center_id equals 55 and meal_id equals 1885, with the _id and week fields discarded.

weekly_demand_collection.find_one( {"center_id" : 55, "meal_id" : 1885}, {"_id" : 0, "week" : 0} )

  3. Filter by comparison operator

The following are the 8 comparison operators in MongoDB.

Name   Description
$eq    Matches values equal to the specified value.
$gt    Matches values greater than the specified value.
$gte   Matches values greater than or equal to the specified value.
$in    Matches any of the values specified in the array.
$lt    Matches values less than the specified value.
$lte   Matches values less than or equal to the specified value.
$ne    Matches values that are not equal to the specified value.
$nin   Matches none of the values specified in the array.

Here are some examples of using these comparison operators:

    1. Equal and not equal

We’ll find all documents where center_id equals 55 and homepage_featured is not equal to 0. Since we use the find function, it returns a cursor over the results. We then use a for loop to traverse the query results.

result_1 = weekly_demand_collection.find({
    "center_id" : { "$eq" : 55},
    "homepage_featured" : { "$ne" : 0}
})

for i in result_1:
    print(i)

    2. In and not in

Suppose you need to match a field against several possible values. In this case, we can use the $in operator instead of using the $eq operator multiple times. We’ll find all documents whose center_id is 24 or 11.

result_2 = weekly_demand_collection.find({
    "center_id" : { "$in" : [ 24, 11] }
})

for i in result_2:
    print(i)

Then we look for all documents whose center_id is not in the specified list. The following query returns all documents whose center_id is neither 24 nor 11.

result_3 = weekly_demand_collection.find({
    "center_id" : { "$nin" : [ 24, 11] }
})

for i in result_3:
    print(i)

    3. Less than and greater than

Now, let’s find all documents with center_id 55 and a checkout_price greater than 100 and less than 200. To do this, use the following syntax:

result_4 = weekly_demand_collection.find({
    "center_id" : 55,
    "checkout_price" : { "$lt" : 200, "$gt" : 100}
})

for i in result_4:
    print(i)
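
One way to picture the comparison operators used in this section is as ordinary Python comparisons. The sketch below maps each operator name to a Python function and checks a value against a condition dictionary such as { "$lt" : 200, "$gt" : 100 }. It is a simplified illustration only: field_matches and the sample documents are hypothetical, and the real server also handles missing fields, arrays and mixed types, which this sketch ignores.

```python
import operator

# Each MongoDB comparison operator paired with a plain Python equivalent.
COMPARISONS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def field_matches(value, condition):
    """Return True if value satisfies every operator in the condition dict."""
    return all(COMPARISONS[op](value, operand)
               for op, operand in condition.items())


# Hypothetical documents using the weekly_demand field names.
docs = [
    {"center_id": 55, "checkout_price": 136.83},
    {"center_id": 55, "checkout_price": 250.0},
    {"center_id": 24, "checkout_price": 150.0},
]

# Same conditions as the result_4 query: center 55, price between 100 and 200.
in_range = [d for d in docs
            if field_matches(d["center_id"], {"$eq": 55})
            and field_matches(d["checkout_price"], {"$gt": 100, "$lt": 200})]

# Same condition as the result_3 query: center_id neither 24 nor 11.
not_24_or_11 = [d for d in docs
                if field_matches(d["center_id"], {"$nin": [24, 11]})]

print(in_range)      # [{'center_id': 55, 'checkout_price': 136.83}]
print(not_24_or_11)
```

Several operators inside one condition dictionary must all hold, which is why the range query needs no explicit $and.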

  4. Filter by logical operator

Name   Description
$and   Joins query clauses with a logical AND and returns all documents that match the conditions of every clause.
$not   Inverts the effect of a query expression and returns documents that do not match it.
$nor   Joins query clauses with a logical NOR and returns all documents that fail to match every clause.
$or    Joins query clauses with a logical OR and returns all documents that match the conditions of at least one clause.

The following examples illustrate the use of logical operators:

    1. AND operator

The following query returns the documents whose center_id equals 11 and whose meal_id is not equal to 1778. The subqueries of the $and operator are given in a list.

result_5 = weekly_demand_collection.find({
    "$and" : [{
                 "center_id" : { "$eq" : 11}
              },
              {
                   "meal_id" : { "$ne" : 1778}
              }]
})

for i in result_5:
    print(i)

    2. OR operator

The following query returns all documents whose center_id equals 11 or whose meal_id is 1207 or 2707. Again, the subqueries of the $or operator are given in a list.

result_6 = weekly_demand_collection.find({
    "$or" : [{
                 "center_id" : { "$eq" : 11}
              },
              {
                   "meal_id" : { "$in" : [1207, 2707]}
              }]
})

for i in result_6:
    print(i)
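
The way $and, $or and $nor combine their lists of subqueries can be sketched in a few lines of plain Python. This is a simplified model, not how the server is implemented: matches is a hypothetical function, the documents are made up, only a handful of comparison operators are supported, and $not is left out because it applies to a single field expression rather than to a list of clauses.

```python
import operator

# A small subset of comparison operators, enough for the queries above.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(doc, query):
    """Return True if the document satisfies every entry of the query dict."""
    for key, cond in query.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in cond)
        elif key == "$nor":
            ok = not any(matches(doc, sub) for sub in cond)
        elif isinstance(cond, dict):
            ok = all(OPS[op](doc.get(key), arg) for op, arg in cond.items())
        else:
            ok = doc.get(key) == cond
        if not ok:
            return False
    return True


# Hypothetical documents using the weekly_demand field names.
docs = [
    {"center_id": 11, "meal_id": 1778},
    {"center_id": 11, "meal_id": 1207},
    {"center_id": 55, "meal_id": 2707},
    {"center_id": 55, "meal_id": 1885},
]

and_query = {"$and": [{"center_id": {"$eq": 11}},
                      {"meal_id": {"$ne": 1778}}]}
or_query = {"$or": [{"center_id": {"$eq": 11}},
                    {"meal_id": {"$in": [1207, 2707]}}]}

and_result = [d for d in docs if matches(d, and_query)]
or_result = [d for d in docs if matches(d, or_query)]
print(and_result)  # [{'center_id': 11, 'meal_id': 1207}]
print(len(or_result))  # 3
```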

  5. Filter with regular expressions

Regular expressions are useful when you have text fields and want to search for documents with specific patterns. If you want to learn more about regular expressions, I strongly recommend that you read this article: Python regular expression beginner’s tutorial.

We use the $regex operator and give it the pattern to match. This query uses the meal_info collection and finds the documents whose cuisine field starts with C.

result_7 = meal_info_collection.find({
    "cuisine" : { "$regex" : "^C" }
})

for i in result_7:
    print(i)

Let’s look at another example of a regular expression. We will find all documents whose category begins with “S” and whose cuisine ends with “ian”.

result_8 = meal_info_collection.find({
    "$and" : [
        {
            "category" : { "$regex" : "^S" }
        },
        {
            "cuisine" : { "$regex" : "ian$" }
        }
    ]
})

for i in result_8:
    print(i)
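
The two patterns used above can be tried out locally with Python’s re module before sending them to the database. The sketch below assumes a short, made-up list of cuisine names. re.search is used because, like $regex, it looks for the pattern anywhere in the string unless the pattern is anchored with ^ or $. MongoDB uses its own regular expression engine, so more advanced patterns may behave differently.

```python
import re

# Hypothetical cuisine values for illustration.
cuisines = ["Continental", "Indian", "Italian", "Thai"]

# "^C" anchors the match at the start of the string.
starts_with_c = [c for c in cuisines if re.search(r"^C", c)]

# "ian$" anchors the match at the end of the string.
ends_with_ian = [c for c in cuisines if re.search(r"ian$", c)]

# Patterns are case-sensitive unless a flag says otherwise.
lowercase_c = [c for c in cuisines if re.search(r"^c", c)]
ignore_case_c = [c for c in cuisines if re.search(r"^c", c, re.IGNORECASE)]

print(starts_with_c)   # ['Continental']
print(ends_with_ian)   # ['Indian', 'Italian']
print(lowercase_c)     # []
print(ignore_case_c)   # ['Continental']
```

In a MongoDB query, the counterpart of re.IGNORECASE is the "$options" : "i" setting placed next to "$regex".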

  6. Aggregation pipeline

MongoDB’s aggregation pipeline provides a framework for performing a series of data transformations on a dataset. The following is its syntax:

your_collection.aggregate( [ { <stage 1> }, { <stage 2> }, .. ] )

The first stage takes the complete set of documents as its input, and each subsequent stage takes the result set of the previous transformation as its input and passes its own output on to the next stage.

There are more than 10 transformations (stages) available in MongoDB’s aggregation framework. In this article, we will look at $match, $count and $group. We will discuss each transformation in detail in an upcoming MongoDB article.

For example, in the first stage we match the documents whose center_id equals 11, and in the next stage we count the number of documents that matched. Notice that in the second stage we assign the $count operator the value total_rows, which is the name of the field we want to display in the output.

result_9 = weekly_demand_collection.aggregate([
    ## stage 1
    {
        "$match" : 
                 {"center_id" : {"$eq" : 11 } }
    },
    ## stage 2
    {
        "$count" : "total_rows"
    }
])

for i in result_9:
    print(i)
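
The idea that each stage consumes the output of the previous one can be sketched with ordinary Python functions. This is only a mental model of the $match and $count pipeline above, not how MongoDB executes it: run_pipeline, match_center_11 and count_total_rows are hypothetical names, and the three documents are made up.

```python
from functools import reduce


def match_center_11(documents):
    """Stage 1: keep only the documents whose center_id equals 11."""
    return [doc for doc in documents if doc.get("center_id") == 11]


def count_total_rows(documents):
    """Stage 2: replace the input with one document holding the count."""
    # Like $count, produce no output document when there is no input.
    return [{"total_rows": len(documents)}] if documents else []


def run_pipeline(documents, stages):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda current, stage: stage(current), stages, documents)


# Hypothetical input documents.
docs = [{"center_id": 11}, {"center_id": 11}, {"center_id": 55}]

result = run_pipeline(docs, [match_center_11, count_total_rows])
print(result)  # [{'total_rows': 2}]
```

Swapping the order of the two stages would count all three documents first and leave nothing with a center_id to match, which is why stage order matters in a pipeline.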

Now, let’s take another example. The first stage is the same as before, that is, center_id equals 11. In the second stage, we calculate the average of the num_orders field for center_id 11 and the set of unique meal_id values for center_id 11.

result_10 = weekly_demand_collection.aggregate([
    ## stage 1
    {
        "$match" : 
                 {"center_id" : {"$eq" : 11 } }
    },
    ## stage 2
    {
        "$group" : { "_id" : 0 ,
                     "average_num_orders": { "$avg" : "$num_orders"},
                     "unique_meal_id" : {"$addToSet" : "$meal_id"}} 
    }
])

for i in result_10:
    print(i)
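
To see what the $group stage computes, here is a plain-Python sketch of the same calculation on a few made-up documents. Because the pipeline uses a constant _id of 0, every matched document falls into a single group. match_center and group_all are hypothetical helper names, and the sketch assumes every document carries num_orders and meal_id.

```python
def match_center(documents, center_id):
    """Mimic the $match stage: keep documents for one center."""
    return [doc for doc in documents if doc["center_id"] == center_id]


def group_all(documents):
    """Mimic $group with a constant _id: one output document for all input."""
    if not documents:
        return []
    orders = [doc["num_orders"] for doc in documents]
    return [{
        "_id": 0,
        # $avg: arithmetic mean of num_orders over the group.
        "average_num_orders": sum(orders) / len(orders),
        # $addToSet keeps each value once and does not guarantee an order;
        # sorting here only keeps the demo output deterministic.
        "unique_meal_id": sorted({doc["meal_id"] for doc in documents}),
    }]


# Hypothetical documents using the weekly_demand field names.
docs = [
    {"center_id": 11, "meal_id": 1885, "num_orders": 100},
    {"center_id": 11, "meal_id": 1885, "num_orders": 200},
    {"center_id": 11, "meal_id": 1207, "num_orders": 300},
    {"center_id": 55, "meal_id": 2707, "num_orders": 900},
]

grouped = group_all(match_center(docs, 11))
print(grouped)
# [{'_id': 0, 'average_num_orders': 200.0, 'unique_meal_id': [1207, 1885]}]
```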

Endnote

The amount of data generated today is enormous, so it is necessary to find better ways to query it. In summary, in this article we learned how to query a MongoDB database using PyMongo and how to apply various filters as needed.

If you want to know more about querying data, I suggest you take the following course: Structured Query Language (SQL) in data science.

In the next article, we will discuss aggregation pipelines in detail.

Thanks for reading!

Original link: https://www.analyticsvidhya.com/blog/2020/08/query-a-mongodb-database-using-pymongo/
