Mongodb pipeline operator (I)

Time:2021-11-24

Friends familiar with Linux operating system should know that there is a pipeline in Linux, which can be used to process data conveniently. Mongodb2.2 also introduces a new data aggregation framework. A document can pass through a pipeline composed of multiple nodes. Each node has its own special functions, such as document grouping and document filtering. Each node will accept a series of documents, make some type conversion on these documents, and then transfer the converted documents to the next node, The last node will return the result to the client. In this article, let’s first look at some basic pipeline operators.

This is the twelfth article in mongodb series. Understanding the previous articles will help you better understand this article:


1.Installing mongodb on Linux
2.Mongodb basic operation
3.Mongodb data type
4.Mongodb document update operation
5.Mongodb document query operation (I)
6.Mongodb document query operation (II)
7.Mongodb document query operation (III)
8.Mongodb view execution plan
9.Getting to know the index in mongodb
10.Various types of indexes in mongodb
11.Mongodb fixed set


$match

$match can be used to filter documents. After filtering, we can do data aggregation on the subset of documents obtained by filtering. The query operators we introduced earlier can be used in $match, for example, to obtain all documents in the collection whose author is “Du Fu”, as follows:

db.sang_ Collect. Aggregate ({$match: {Author: "Du Fu"}})

When we actually use it, we’d better put $match in front of the pipeline, which can reduce the workload of the pipeline behind. At the same time, we can also use indexes to execute $match before projection and grouping.

$project

Basic Usage

$project can be used toExtract the desired field, as follows:

db.sang_collect.aggregate({$project:{title:1,_id:0}})

1 means you want to this field, 0 means you don’t want this field, or you can modify the returned fieldrenameFor example, change the title to articletitle as follows:

db.sang_collect.aggregate({$project:{"articleTitle":"$title"}})

However, there is a problem to note here. If there is an index on the original field, there will be no index on the renamed field, so it is best to use the index before renaming.

mathematical expression

Mathematical expressions can be used to add, subtract, multiply and divide a group of values to obtain modules. For example, my data structure is as follows:

{
    "_id" : ObjectId("59f841f5b998d8acc7d08863"),
    "orderAddressL" : "ShenZhen",
    "prodMoney" : 45.0,
    "freight" : 13.0,
    "discounts" : 3.0,
    "orderDate" : ISODate("2017-10-31T09:27:17.342Z"),
    "prods" : [ 
        "Coke", 
        "Milk tea"
    ]
}

The total cost of the order is commodity cost plus freight. The query is as follows:

db.sang_collect.aggregate({$project:{totalMoney:{$add:["$prodMoney","$freight"]}}})

The cost of actual payment is the total cost minus discount, as follows:

db.sang_collect.aggregate({$project:{totalPay:{$subtract:[{$add:["$prodMoney","$freight"]},"$discounts"]}}})

Another three nonsense operations, such as calculating the product of prodmoney, freedom and counts:

db.sang_collect.aggregate({$project:{test1:{$multiply:["$prodMoney","$freight","$discounts"]}}})

Another example is to find the quotient of $prodmoney and $freight, as follows:

db.sang_collect.aggregate({$project:{test1:{$divide:["$prodMoney","$freight"]}}})

Another example is to use $freight to model $prodmoney, as follows:

db.sang_collect.aggregate({$project:{test1:{$mod:["$prodMoney","$freight"]}}})

Both addition and multiplication can receive multiple parameters, and the rest can receive two parameters.

date expression

Date expression can extract year, month, day, week, hour, minute, second and other information from a date type, as follows:

db.sang_ Collecting a collection of points is a great way to collect from a collection. There's a way to collect a collection. There's a way to collect a collection from a collection of {$year: a year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's year's date, and month's month's month's month '{$OrderDate's date, where the month's day's month's month's day's month's day's month's month' {{{{{{{{{{$$$$$$$$$OrderDate {$$$$"}," Ms ": {$millisecond: "$OrderDate"}, "custom format time": {$datetostring: {format: "% y year% m month% d% H:% m:% s", date: "$OrderDate"})

The results are as follows:

{
    "_id" : ObjectId("59f841f5b998d8acc7d08861"),
    "Year": 2017,
    "Month": October,
    "The week of the year": 44,
    "Date": 31,
    "Week": 3,
    "Day of the year": 304,
    "Hour": 9,
    "Points": 27,
    "Second": 17,
    "Ms": 342,
    "Custom formatting time": "October 31, 2017 09:27:17"
}

$DayOfWeek returns the week, 1 means Sunday, 7 means Saturday, and $week means the week of the year, starting from 0$ Datetostring is a function in mongodb 3.0 +. There are also several formatted characters:

character meaning Value range
%Y Year (4 digits, zero padded) 0000-9999
%m Month (2 digits, zero padded) 01-12
%d Day of Month (2 digits, zero padded) 01-31
%H Hour (2 digits, zero padded, 24-hour clock) 00-23
%M Minute (2 digits, zero padded) 00-59
%S Second (2 digits, zero padded) 00-60
%L Millisecond (3 digits, zero padded) 000-999
%j Day of year (3 digits, zero padded) 001-366
%w Day of week (1-Sunday, 7-Saturday) 1-7
%U Week of year (2 digits, zero padded) 00-53

String expression

There are operations such as string interception, splicing, capitalization and lowercase conversion in the string expression. For example, I intercept the first two characters of orderaddressl and return, as follows:

db.sang_collect.aggregate({$project:{addr:{$substr:["$orderAddressL",0,2]}}})

For another example, I spliced orderaddressl and OrderDate and returned:

db.sang_ Collect. Aggregate ({$project: {addr: {$concat: ["$orderaddressl", {$datetostring: {format: "--% y year% m month% d", date: "$OrderDate"}}]}})

The results are as follows:

{
    "_id" : ObjectId("59f841f5b998d8acc7d08861"),
    "Addr": "Nanjing -- October 31, 2017"
}

For another example, I convert orderaddressl to lowercase and return:

db.sang_collect.aggregate({$project:{addr:{$toLower:"$orderAddressL"}}})

For another example, I convert orderaddressl to uppercase and return:

db.sang_collect.aggregate({$project:{addr:{$toUpper:"$orderAddressL"}}})

Logical expression

To compare the size of two numbers, use the $CMP operator as follows:

db.sang_collect.aggregate({$project:{test:{$cmp:["$freight","$discounts"]}}})

If the first parameter is greater than the second parameter, it returns a positive number, and if the first parameter is less than the second parameter, it returns a negative number. You can also use $strcasecmp to compare strings (Chinese is invalid):

db.sang_ Collect. Aggregate ({$project: {test: {$strcasecmp: [{$datetostring: {format: "..% y year% m month% d", date: "$OrderDate"}}, "$orderaddressl"]}})

As for the operators such as $EQ / $Ne / $GT / $GTE / $LT / $LTE introduced earlier, they are also applicable here. In addition, $and, $or, $not and other expressions are available. Take $and as an example, as follows:

db.sang_collect.aggregate({$project:{test:{$and:[{"$eq":["$freight","$prodMoney"]},{"$eq":["$freight","$discounts"]}]}}})

Return true when each parameter in $and is true, $or returns true when one of the parameters is true, $not reverses the value of its parameters, as follows:

db.sang_collect.aggregate({$project:{test:{$not:{"$eq":["$freight","$prodMoney"]}}}})

There are also two process control statements, as follows:

db.sang_collect.aggregate({$project:{test:{$cond:[false,"trueExpr","falseExpr"]}}})

If the first parameter of $cond is true, trueexpr will be returned; otherwise, false expr will be returned

db.sang_collect.aggregate({$project:{test:{$ifNull:[null,"replacementExpr"]}}})

$ifnull if the first parameter is null, replacementexpr will be returned; otherwise, the first parameter will be returned.

OK, let’s talk about the pipeline operator in mongodb first. The next article continues. If you have any questions, please leave a message for discussion.

reference material:

1. Authoritative guide to mongodb, 2nd Edition
2.Mongodb aggregation utilizes date grouping

More information on the official account:

Mongodb pipeline operator (I)

Recommended Today

Apache sqoop

Source: dark horse big data 1.png From the standpoint of Apache, data flow can be divided into data import and export: Import: data import. RDBMS—–>Hadoop Export: data export. Hadoop—->RDBMS 1.2 sqoop installation The prerequisite for installing sqoop is that you already have a Java and Hadoop environment. Latest stable version: 1.4.6 Download the sqoop installation […]