Mongodb advanced aggregate query

Time:2021-6-12

 

https://www.cnblogs.com/zhoujie/p/mongo1.html

Mongodb advanced aggregate query

 

In my work, I often encounter some mongodb aggregation operations. Here is a summary. Mongo can store complex types, such as arrays, objects and other document structures that MySQL is not good at handling, and the operation of aggregation is much more complex than mysql.

Note: This paper is based on mongodb V3.6

catalogue

  • Comparison of Mongo and MySQL aggregation
  • About aggregate
  • Aggregate syntax
  • Introduction to the commonly used pipeline stage of aggregate (the core of this article)
  • Node operation Mongo aggregate query (the core of this paper)

Comparison of Mongo and MySQL aggregation

In order to make it easy to understand, we first make the following analogy between the common Mongo aggregation operation and the MySQL query

SQL operation / function Mongodb aggregation operation
where $match
group by $group
having $match
select $project
order by $sort
limit  $limit
sum() $sum
count() $sum
join

$lookup  

(new in v3.2)

 

 

 

 

 

 

 

 

 

 

 

Here are some common Mongo aggregation examples compared with MySQL. Suppose there is a database record (Table Name: orders) as an example:

{
  cust_id: "abc123",
  ord_date: ISODate("2012-11-02T17:04:11.102Z"),
  status: 'A',
  price: 50,
  items: [ { sku: "xxx", qty: 25, price: 1 },
           { sku: "yyy", qty: 25, price: 1 } ]
}

1. Count all records in orders table

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

Similar to MySQL:
SELECT COUNT(*) AS count   FROM orders

2. Sum all prices in the orders table

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL;
SELECT SUM(price) AS total  FROM orders

3. For each unique cust_ ID to calculate the sum of prices

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(price) AS total
FROM orders
GROUP BY cust_id

4. For each unique pair cust_ ID and ord_ Group date and calculate the total price, excluding the time part of the date

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date

5. For cust with multiple records_ ID, return cust_ ID and corresponding quantity

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        count: { $sum: 1 }
     }
   },
   { $match: { count: { $gt: 1 } } }
] )

Similar to MySQL:
SELECT cust_id,
       count(*)
FROM orders
GROUP BY cust_id
HAVING count(*) > 1

6. For each unique cust_ ID and ord_ Group date, calculate the total price, return only the records whose total price is greater than 250, and exclude the time part of the date

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

Similar to MySQL:
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date
HAVING total > 250

7. For each unique cust_ If Id and status = a, the total price is calculated

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id

8. For each unique cust_ If Id and status = a, the total price is calculated and only records whose total price is greater than 250 are returned

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250

9. For each unique cust_ ID, the corresponding order item associated with orders_ The quantity field of lineitem

db.orders.aggregate( [
   { $unwind: "$items" },
   {
     $group: {
        _id: "$cust_id",
        qty: { $sum: "$items.qty" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(li.qty) as qty
FROM orders o,
     order_lineitem li
WHERE li.order_id = o.id
GROUP BY cust_id

10. Statistics of different custs_ ID and ord_ The number of date groups, excluding the time part of the date

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        }
     }
   },
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

Similar to MySQL:
SELECT COUNT(*)
FROM (SELECT cust_id, ord_date
      FROM orders
      GROUP BY cust_id, ord_date)
      as DerivedTable

About aggregate

DB. Collection. Aggregate () is an aggregation pipeline based on data processing. Each document passes through a pipeline composed of multiple stages. It can group and filter the pipelines of each stage, and then output the corresponding results after a series of processing.

From this diagram, you can understand the process of aggregate processing.

1. DB. Collection. Aggregate() can create a pipeline with multiple components to process a series of documents. These components include: match of filtering operation, project of mapping operation, group of grouping operation, sort of sorting operation, limit of limiting operation and skip of skipping operation.
2. DB. Collection. Aggregate () uses mongodb’s built-in native operation, which has very high aggregation efficiency. It supports functions similar to SQL group by operation, and does not need users to write custom JavaScript routines.
3. The pipeline is limited to 100MB of memory per phase. If a node pipeline exceeds this limit, mongodb will generate an error. In order to be able to process large data sets, you can set allowdiskuse to true to write data to temporary files in the aggregation pipeline node. This can solve the 100MB memory limit.
4. DB. Collection. Aggregate() can act on the partitioned set, but the result cannot be input in the partitioned set. MapReduce can act on the partitioned set, and the result can also be input in the partitioned set.
5. The DB. Collection. Aggregate () method can return a cursor and put the data in memory for direct operation. Like Mongo shell, pointer operation.
6. The output result of DB. Collection. Aggregate() can only be saved in one document, and the size of bson document is limited to 16m. You can solve this problem by returning a pointer. In version 2.6, the DB. Collect. Aggregate () method returns a pointer, which can return the size of any result set.

Aggregate syntax

Basic format:

db.collection.aggregate(pipeline, options)

Parameter Description:

parameter type describe
pipeline array

A series of data aggregation operations or phases. SeeAggregate pipeline operators
Changed in version 2.6: the pipeline phase can still be accepted as a separate parameter instead of as an element in an array; However, if you do not specify a pipe as an array, you cannot specify the options parameter

options document  

Optional. Other options that aggregate () passes to the aggregate command.
What’s new in version 2.6: available only when you specify a pipe as an array.

be careful:

Using dB. Collection. Aggregate() to query directly will prompt an error, but passing an empty array, such as dB. Collection. Aggregate ([]), will not report an error, and will return all documents like find.

 

There are many stages in the pipeline, but here I only record a few that I often use, and I will add if I use them later. See theOfficial website

Next, we will introduce these common stages

$count , $group,  $match, $project,  $unwind, $limit, $skip,  $sort, $sortByCount,  $lookup, $out, $addFields

Introduction to the common pipeline stage of aggregate

$count

interpretation:

Returns the count of documents that contain input to the stage, which is understood as the count of documents that match the find() query of the table or view.

The DB. Collection. Count() method does not perform the find() operation, but counts and returns the number of results that match the query.

Grammar:

{ $count:  }

The $count phase is equivalent to the following sequence of $group + $project:

db.collection.aggregate( [
   { $group: { _ ID: null, MYCOUNT: {$sum: 1}}}, where MYCOUNT is customized, which is equivalent to select count (*) as MYCOUNT in MySQL
   { $project: { _ ID: 0}} # return not displayed_ ID field
] )

give an example:

Sample data:

{ "_id" : 1, "subject" : "History", "score" : 88 }
{ "_id" : 2, "subject" : "History", "score" : 92 }
{ "_id" : 3, "subject" : "History", "score" : 97 }
{ "_id" : 4, "subject" : "History", "score" : 71 }
{ "_id" : 5, "subject" : "History", "score" : 79 }
{ "_id" : 6, "subject" : "History", "score" : 83 }

Implementation:

1) The $match phase excludes documents with score less than or equal to 80 and passes documents with score greater than or equal to 80 to the next phase

2) The $count phase returns the count of the remaining documents in the aggregation pipeline and assigns the value to a file named passing_ The field of scores.

Implementation results:

$group

interpretation:

The documents are grouped according to the specified expression, and the documents of each different grouping are output to the next stage. The output document contains a_ ID field, which contains different groups.

The output document can also contain a calculated field, which is saved by the_ The ID field groups the values of some accumulator expressions$ Group does not output specific documents, but just statistics.

Grammar:

{ $group: { _id: , : {  :  }, ... } }
  • _ The ID field is required; However, you can specify_ The ID value is null to calculate the cumulative value for the entire input document.
  • The remaining calculated fields are optional and use theOperator.
  • _ ID andAn expression can accept any validexpression

Accumulator operator

name describe Analogy SQL
$avg Calculate the mean value avg
$first Return the first document of each group. If it is sorted, it will be sorted. If it is not sorted, it will be the first document in the default stored order. limit 0,1
$last Returns the last document of each group. If there is a sort, it will be sorted. If there is no default stored order, it will be the last document.
$max According to the grouping, get the maximum value of all documents in the collection. max
$min According to the grouping, get the minimum value of the corresponding value of all documents in the collection. min
$push Adds the value of the specified expression to an array.
$addToSet  Adds the value of an expression to a collection (no duplicate values, out of order).
$sum Calculate the sum sum
$stdDevPop Returns the population standard deviation of the input value
$stdDevSamp Returns the sample standard deviation of the input value

The memory limit of the $group phase is 100m. By default, if the stage exceeds this limit, $group will generate an error. However, to allow processing of large datasets, set the allowdiskuse option to true to enable the $group operation to write temporary files.

Friendly remarks:

  • “$addtoset”: expr. If the current array does not contain expr, add it to the array.
  • “$push”: expr, whatever expr is, is added to the array. Returns an array of all values.

Changes have been made in version 2.6: for the $group phase, mongodb introduces a 100m memory limit and the allow disk use option to handle large data set operations.

give an example:

Sample data:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

one   The following summary operations use the $group stage to group documents by month, date and year, calculate the total price and average quantity, and calculate the number of documents in each group:

return:

/* 1 */
{
    "_id" : {
        "month" : 4,
        "day" : 4,
        "year" : 2014
    },
    "totalPrice" : 200,
    "averageQuantity" : 15.0,
    "count" : 2.0
}

/* 2 */
{
    "_id" : {
        "month" : 3,
        "day" : 15,
        "year" : 2014
    },
    "totalPrice" : 50,
    "averageQuantity" : 10.0,
    "count" : 1.0
}

/* 3 */
{
    "_id" : {
        "month" : 3,
        "day" : 1,
        "year" : 2014
    },
    "totalPrice" : 40,
    "averageQuantity" : 1.5,
    "count" : 2.0
}

2. group null  , The following aggregation operations specify the group_ If Id is null, calculate the total price, average quantity and count of all documents in the collection

3. Query distinct values

The following rollup uses the $group phase to group documents by item to retrieve different item values:

4. Data conversion

1) Group the data in the collection by price and convert it into an item array

The returned data ID value is the field specified in the group. Items can be customized and is a grouped list

2) Next, aggregate the operating utility variable $$root to group documents by item,The generated document cannot exceed the bson document size limit.

return:

/* 1 */
{
    "_id" : "xyz",
    "books" : [ 
        {
            "_id" : 3,
            "item" : "xyz",
            "price" : 5,
            "quantity" : 10,
            "date" : ISODate("2014-03-15T09:00:00.000Z")
        }, 
        {
            "_id" : 4,
            "item" : "xyz",
            "price" : 5,
            "quantity" : 20,
            "date" : ISODate("2014-04-04T11:21:39.736Z")
        }
    ]
}

/* 2 */
{
    "_id" : "jkl",
    "books" : [ 
        {
            "_id" : 2,
            "item" : "jkl",
            "price" : 20,
            "quantity" : 1,
            "date" : ISODate("2014-03-01T09:00:00.000Z")
        }
    ]
}

/* 3 */
{
    "_id" : "abc",
    "books" : [ 
        {
            "_id" : 1,
            "item" : "abc",
            "price" : 10,
            "quantity" : 2,
            "date" : ISODate("2014-03-01T08:00:00.000Z")
        }, 
        {
            "_id" : 5,
            "item" : "abc",
            "price" : 10,
            "quantity" : 10,
            "date" : ISODate("2014-04-04T21:23:13.331Z")
        }
    ]
}

$match

interpretation:

Filter the documents, and only the documents that meet the specified conditions are passed to the next pipeline stage.
$match accepts a document with specified query criteria. The query syntax is the same as the read operation query syntax.

Grammar:

{ $match: {  } }

Pipeline optimization:

$match is used to filter documents, and then aggregate them on a subset of documents. It can use all conventional query operators except geospatial,In practical application, try to put $match in front of the pipeline. This has two advantages: one is that you can quickly filter out unnecessary documents to save timeReduce pipeline workload; Second, if $match is executed before projection and grouping,Queries can use indexes

Restrictions:

  • You cannot use $as part of an aggregate pipeline in a $match query.
  • To use $text in the $match phase, the $match phase must be the first phase of the pipeline.
  • View does not support text search.

give an example:

Sample data:

{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b257"), "author" : "ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25a"), "author" : "li", "score" : 94, "views" : 999 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25b"), "author" : "ty", "score" : 95, "views" : 1000 }

1. Use $match to do simple matching query

return:

/* 1 */
{
    "_id" : ObjectId("512bc95fe835e68f199c8686"),
    "author" : "dave",
    "score" : 80,
    "views" : 100
}

/* 2 */
{
    "_id" : ObjectId("512bc962e835e68f199c8687"),
    "author" : "dave",
    "score" : 85,
    "views" : 521
}

2. Use the $match pipeline to select the documents to be processed, and then output the results to the $group pipeline to calculate the document count:

return:

/* 1 */
{
    "_id" : null,
    "count" : 5.0
}

$unwind

interpretation:

Deconstructs the array field from the input document to output the document for each element. In short, it is   You can split an array into separate documents.

Grammar:

{ $unwind:  }

To specify the field path, precede the field name with the $sign and enclose it in quotation marks.

V3.2 + supports the following syntax:

{
  $unwind:
    {
      path: ,
      Includearrayindex:, # optional, the name of a new field is used to store the array index of the element. The name cannot start with $.
Preservenullandemptyarrays: # optional, default: false, if true, if the path is empty, missing or empty array, then $unwind outputs the document
} }

If you specify a path for a field that does not exist in the input document, or if the field is an empty array, $unwind ignores the input document by default and does not output the document of the input document.

New feature in version 3.2: to output documents with missing array fields, null or empty arrays, use the option preserve null and empty arrays.

give an example:

1. Sample data 1:

{ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] }

The following aggregation uses $unwind to output a document for each element in the sizes array:

db.getCollection('test').aggregate(
 [ { $unwind : "$sizes" } ]
)

return:

{ “_id” : 1, “item” : “ABC1”, “sizes” : “S” }
{ “_id” : 1, “item” : “ABC1”, “sizes” : “M” }
{ “_id” : 1, “item” : “ABC1”, “sizes” : “L” }

Each document is the same as the input document, except that the values of the sizes field are the values of the original sizes array.

2. Take the following example data:

{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }

1) The following $unwind operation uses the include array index option to output the array index of an array element.

db.getCollection('test').aggregate( [ { $unwind: { path: "$sizes", includeArrayIndex: "arrayIndex" } } ] )

return:

{ “_id” : 1, “item” : “ABC”, “sizes” : “S”, “arrayIndex” : NumberLong(0) }
{ “_id” : 1, “item” : “ABC”, “sizes” : “M”, “arrayIndex” : NumberLong(1) }
{ “_id” : 1, “item” : “ABC”, “sizes” : “L”, “arrayIndex” : NumberLong(2) }
{ “_id” : 3, “item” : “IJK”, “sizes” : “M”, “arrayIndex” : null }

2) The following $unwind operation uses the preserve null and empty arrays option to include documents in the output that are missing the size field, null, or empty array.

db.inventory.aggregate( [
   { $unwind: { path: "$sizes", preserveNullAndEmptyArrays: true } }
] )

return:
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 2, "item" : "EFG" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }

$project

interpretation:

$project can select the desired and unwanted fields from the document(The specified field can be an existing field from an input document or a new calculated field

)It can also perform some complex operations through pipeline expressions, such as mathematical operations, date operations, string operations, and logical operations.

Grammar:

{ $project: {  } }

The $project pipeline character is used to select fields (specify fields, add fields, and do not display fields)_ ID: 0, exclude fields, etc.), rename fields, derive fields.

Specifications come in the following forms:

<1 or true>     Whether to include the field, field:1/0 , indicating select / not select field

_id: <0 or false>         Specify_ ID field

  Add a new field or reset the value of an existing field. Change in version 3.6: mongodb 3.6 adds the variable remove. If the expression evaluates to $$remove, the field is excluded from the output.

:<0 or false>     V3.4 new function, specify exclusion field

  • By default_ The ID field is included in the output document. To include any other fields in the input document in the output document, the include in $project must be explicitly specified. If you specify to include a field that does not exist in the document, $project ignores the field inclusion and does not add the field to the document.
  • By default_ The ID field is included in the output document. To exclude from the output document_ ID field, which must be explicitly specified in $project_ The ID field is 0.
  • What’s new in v3.4 – if you specify to exclude one or more fields, all other fields are returned in the output document. If exclusion is specified_ If you exclude a field, you cannot specify the containing field, reset the value of an existing field, or add a new field. This restriction does not apply to conditional exclusion fields using the remove variable.
  • New feature in V3.6 – starting with mongodb 3.6, you can use the variable remove in aggregate expressions to conditionally disable a field.
  • To add a new field or reset the value of an existing field, specify the field name and set its value to an expression.
  • To set the field value directly to a number or Boolean text, rather than to an expression that resolves to text, use the $literal operator. Otherwise, the $project treats a number or Boolean text as a flag to include or exclude the field.
  • You can effectively rename a field by specifying a new field and setting its value to the field path of an existing field.
  • Starting from mongodb 3.2, the $project phase supports creating new array fields directly using square brackets. If the array specification contains a field that does not exist in the document, the operation replaces the null value with the value of the field.
  • Changed in version 3.4-If $project is an empty document, mongodb 3.4 and later will generate an error.
  • You can use point symbols when projecting or adding / resetting fields embedded in a document. For example:
"contact.address.country": <1 or 0 or expression>
or
contact: { address: { country: <1 or 0 or expression> } }

give an example:

Sample data:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5,
  lastModified: "2016-07-28"
}

one   The output document of the following $project phase contains only_ ID, title and author fields:

db.getCollection('test').aggregate( [ { $project : { title : 1 , author : 1 } } ] )

return:

{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

2. _ The ID field is included by default. To exclude from the output document of the $project phase_ In the project document, enter the ID field_ The ID field is set to 0 to specify the exclusion_ ID field.

db.getCollection('test').aggregate( [ { $project : { _id: 0, title : 1 , author : 1 } } ] )

return:

{ "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

3. The following $project phase excludes the LastModified field from the output:

db.getCollection('test').aggregate( [ { $project : { "lastModified": 0 } } ] )

4.Exclude fields from nested documents, In the $project phase, the author.first and LastModified fields are excluded from the output:

db.test.aggregate( [ { $project : { "author.first" : 0, "lastModified" : 0 } } ] )

Alternatively, you can nest exclusion specifications in a document:

db.test.aggregate( [ { $project: { "author": { "first": 0}, "lastModified" : 0 } } ] )

return:

{
   "_id" : 1,
   "title" : "abc123",
   "isbn" : "0001122223334",
   "author" : {
      "last" : "zzz"
   },
   "copies" : 5,
}

New features in version 3.6.Starting with mongodb 3.6, you can use the variable remove in aggregate expressions to conditionally disable a field.

Sample data:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5,
  lastModified: "2016-07-28"
}
{
  "_id" : 2,
  title: "Baked Goods",
  isbn: "9999999999999",
  author: { last: "xyz", first: "abc", middle: "" },
  copies: 2,
  lastModified: "2017-07-21"
}
{
  "_id" : 3,
  title: "Ice Cream Cakes",
  isbn: "8888888888888",
  author: { last: "xyz", first: "abc", middle: "mmm" },
  copies: 5,
  lastModified: "2017-07-22"
}

5. The following $project phase uses the remove variable to exclude the author. Middle field if it is equal to ”:

db.books.aggregate( [
   {
      $project: {
         title: 1,
         "author.first": 1,
         "author.last" : 1,
         "author.middle": {
            $cond: {
               if: { $eq: [ "", "$author.middle" ] },
               then: "$$REMOVE",
               else: "$author.middle"
            }
         }
      }
   }
] )

return:

{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
{ "_id" : 2, "title" : "Baked Goods", "author" : { "last" : "xyz", "first" : "abc" } }
{ "_id" : 3, "title" : "Ice Cream Cakes", "author" : { "last" : "xyz", "first" : "abc", "middle" : "mmm" } }

Contains the specified fields from the embedded document(the results only return fields that contain nested documents, and of course, fields that contain nested documents_ id)

Sample document:

{ _id: 1, user: "1234", stop: { title: "book1", author: "xyz", page: 32 } }
{ _id: 2, user: "7890", stop: [ { title: "book2", author: "abc", page: 5 }, { title: "book3", author: "ijk", page: 100 } ] }

Only the title field in the stop field is returned:

db.bookmarks.aggregate( [ { $project: { "stop.title": 1 } } ] )

or
db.bookmarks.aggregate( [ { $project: { stop: { title: 1 } } } ] )

return:

{ "_id" : 1, "stop" : { "title" : "book1" } }
{ "_id" : 2, "stop" : [ { "title" : "book2" }, { "title" : "book3" } ] }

Include calculated fields

Sample data:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5
}

ISBN, LastName and copiesold are added to the return field

db.books.aggregate(
   [
      {
         $project: {
            title: 1,
            isbn: {
               prefix: { $substr: [ "$isbn", 0, 3 ] },
               group: { $substr: [ "$isbn", 3, 2 ] },
               publisher: { $substr: [ "$isbn", 5, 4 ] },
               title: { $substr: [ "$isbn", 9, 3 ] },
               checkDigit: { $substr: [ "$isbn", 12, 1] }
            },
            lastName: "$author.last",
            copiesSold: "$copies"
         }
      }
   ]
)

Return result of the above execution:

{
   "_id" : 1,
   "title" : "abc123",
   "isbn" : {
      "prefix" : "000",
      "group" : "11",
      "publisher" : "2222",
      "title" : "333",
      "checkDigit" : "4"
   },
   "lastName" : "zzz",
   "copiesSold" : 5
}

Project a new array field

Sample data:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "x" : 1, "y" : 1 }

The following aggregation operation returns the new array field myArray:

db.collection.aggregate( [ { $project: { myArray: [ "$x", "$y" ] } } ] )

return:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "myArray" : [ 1, 1 ] }

If the returned array contains fields that do not exist, null will be returned:

db.collection.aggregate( [ { $project: { myArray: [ "$x", "$y", "$someField" ] } } ] )

return:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "myArray" : [ 1, 1, null ] }

$limit

Limit the number of documents passed to the next stage in the pipeline

Grammar:

{ $limit:  }

Example:

db.article.aggregate(
    { $limit : 5 }
);

This operation returns only the first five documents passed to it by the pipeline$ Limit has no effect on the content of the document it passes.

be careful:

When $sort appears immediately before the $limit in the pipeline, the $sort operation will only maintain the first n results in the process, where n is the specified limit, while mongodb only needs to store n items in memory. This optimization is still applicable when allowdiskuse is true and n items exceed the aggregate memory limit.

$skip

Skip the specified number of documents entering the stage and pass the remaining documents to the next stage in the pipeline

Grammar:

{ $skip:  }

Example:

db.article.aggregate(
    { $skip : 5 }
);

This action skips the first five documents passed to it by the pipeline$ Skip has no effect on the content of the document passed along the pipeline.

$sort

Sort all input documents and return them to the pipeline in sort order.

Grammar:

{ $sort: { : , :  ... } }

$sort specifies the fields to sort and the documents in the corresponding sort order.Can have one of the following values:

  • 1 specifies the ascending order.
  • -1 specifies descending order.
  • {$meta: “textscore”} sorts the calculated textscore metadata in descending order.

Example:

To sort fields, set the sort order to 1 or – 1 to specify ascending or descending sort, respectively, as shown in the following example:

db.users.aggregate(
   [
     { $sort : { age : -1, posts: 1 } }
   ]
)

When comparing values of different bson types, mongodb uses the following comparison order, from lowest to highest:

 1 MinKey (internal type)
 2 Null
 3 Numbers (ints, longs, doubles, decimals)
 4 Symbol, String
 5 Object
 6 Array
 7 BinData
 8 ObjectId
 9 Boolean
10 Date
11 Timestamp
12 Regular Expression
13 MaxKey (internal type)

$sortByCount

New in v3.4.Groups incoming documents according to the value of the specified expression, and then calculates the number of documents in each different group.Each output document contains two fields: fields with different grouping values_ The ID field and the count field containing the number of documents belonging to the group or category are arranged in descending order.

Grammar:

{ $sortByCount:   }

 

 

reference:

https://www.jianshu.com/p/e60d5cfbeb35

https://www.cnblogs.com/ljhdo/p/5019837.html

 

,

In my work, I often encounter some mongodb aggregation operations. Here is a summary. Mongo can store complex types, such as arrays, objects and other document structures that MySQL is not good at handling, and the operation of aggregation is much more complex than mysql.

Note: This paper is based on mongodb V3.6

catalogue

  • Comparison of Mongo and MySQL aggregation
  • About aggregate
  • Aggregate syntax
  • Introduction to the commonly used pipeline stage of aggregate (the core of this article)
  • Node operation Mongo aggregate query (the core of this paper)

Comparison of Mongo and MySQL aggregation

In order to make it easy to understand, we first make the following analogy between the common Mongo aggregation operation and the MySQL query

SQL operation / function Mongodb aggregation operation
where $match
group by $group
having $match
select $project
order by $sort
limit  $limit
sum() $sum
count() $sum
join

$lookup  

(new in v3.2)

 

 

 

 

 

 

 

 

 

 

 

Here are some common Mongo aggregation examples compared with MySQL. Suppose there is a database record (Table Name: orders) as an example:

{
  cust_id: "abc123",
  ord_date: ISODate("2012-11-02T17:04:11.102Z"),
  status: 'A',
  price: 50,
  items: [ { sku: "xxx", qty: 25, price: 1 },
           { sku: "yyy", qty: 25, price: 1 } ]
}

1. Count all records in orders table

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

Similar to MySQL:
SELECT COUNT(*) AS count   FROM orders

2. Sum all prices in the orders table

db.orders.aggregate( [
   {
     $group: {
        _id: null,
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL;
SELECT SUM(price) AS total  FROM orders

3. For each unique cust_ ID to calculate the sum of prices

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(price) AS total
FROM orders
GROUP BY cust_id

4. For each unique pair cust_ ID and ord_ Group date and calculate the total price, excluding the time part of the date

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date

5. For cust with multiple records_ ID, return cust_ ID and corresponding quantity

db.orders.aggregate( [
   {
     $group: {
        _id: "$cust_id",
        count: { $sum: 1 }
     }
   },
   { $match: { count: { $gt: 1 } } }
] )

Similar to MySQL:
SELECT cust_id,
       count(*)
FROM orders
GROUP BY cust_id
HAVING count(*) > 1

6. For each unique cust_ ID and ord_ Group date, calculate the total price, return only the records whose total price is greater than 250, and exclude the time part of the date

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        },
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

Similar to MySQL:
SELECT cust_id,
       ord_date,
       SUM(price) AS total
FROM orders
GROUP BY cust_id,
         ord_date
HAVING total > 250

7. For each unique cust_ If Id and status = a, the total price is calculated

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id

8. For each unique cust_ If Id and status = a, the total price is calculated and only records whose total price is greater than 250 are returned

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250

9. For each unique cust_ ID, the corresponding order item associated with orders_ The quantity field of lineitem

db.orders.aggregate( [
   { $unwind: "$items" },
   {
     $group: {
        _id: "$cust_id",
        qty: { $sum: "$items.qty" }
     }
   }
] )

Similar to MySQL:
SELECT cust_id,
       SUM(li.qty) as qty
FROM orders o,
     order_lineitem li
WHERE li.order_id = o.id
GROUP BY cust_id

10. Statistics of different custs_ ID and ord_ The number of date groups, excluding the time part of the date

db.orders.aggregate( [
   {
     $group: {
        _id: {
           cust_id: "$cust_id",
           ord_date: {
               month: { $month: "$ord_date" },
               day: { $dayOfMonth: "$ord_date" },
               year: { $year: "$ord_date"}
           }
        }
     }
   },
   {
     $group: {
        _id: null,
        count: { $sum: 1 }
     }
   }
] )

Similar to MySQL:
SELECT COUNT(*)
FROM (SELECT cust_id, ord_date
      FROM orders
      GROUP BY cust_id, ord_date)
      as DerivedTable

About aggregate

DB. Collection. Aggregate () is an aggregation pipeline based on data processing. Each document passes through a pipeline composed of multiple stages. It can group and filter the pipelines of each stage, and then output the corresponding results after a series of processing.

From this diagram, you can understand the process of aggregate processing.

1. DB. Collection. Aggregate() can create a pipeline with multiple components to process a series of documents. These components include: match of filtering operation, project of mapping operation, group of grouping operation, sort of sorting operation, limit of limiting operation and skip of skipping operation.
2. DB. Collection. Aggregate () uses mongodb’s built-in native operation, which has very high aggregation efficiency. It supports functions similar to SQL group by operation, and does not need users to write custom JavaScript routines.
3. The pipeline is limited to 100MB of memory per phase. If a node pipeline exceeds this limit, mongodb will generate an error. In order to be able to process large data sets, you can set allowdiskuse to true to write data to temporary files in the aggregation pipeline node. This can solve the 100MB memory limit.
4. DB. Collection. Aggregate() can act on the partitioned set, but the result cannot be input in the partitioned set. MapReduce can act on the partitioned set, and the result can also be input in the partitioned set.
5. The DB. Collection. Aggregate () method can return a cursor and put the data in memory for direct operation. Like Mongo shell, pointer operation.
6. The output result of DB. Collection. Aggregate() can only be saved in one document, and the size of bson document is limited to 16m. You can solve this problem by returning a pointer. In version 2.6, the DB. Collect. Aggregate () method returns a pointer, which can return the size of any result set.

Aggregate syntax

Basic format:

db.collection.aggregate(pipeline, options)

Parameter Description:

parameter type describe
pipeline array

A series of data aggregation operations or phases. SeeAggregate pipeline operators
Changed in version 2.6: the pipeline phase can still be accepted as a separate parameter instead of as an element in an array; However, if you do not specify a pipe as an array, you cannot specify the options parameter

options document  

Optional. Other options that aggregate () passes to the aggregate command.
What’s new in version 2.6: available only when you specify a pipe as an array.

be careful:

Using dB. Collection. Aggregate() to query directly will prompt an error, but passing an empty array, such as dB. Collection. Aggregate ([]), will not report an error, and will return all documents like find.

 

There are many stages in the pipeline, but here I only record a few that I often use, and I will add if I use them later. See theOfficial website

Next, we will introduce these common stages

$count , $group,  $match, $project,  $unwind, $limit, $skip,  $sort, $sortByCount,  $lookup, $out, $addFields

Introduction to the common pipeline stage of aggregate

$count

interpretation:

Returns the count of documents that contain input to the stage, which is understood as the count of documents that match the find() query of the table or view.

The DB. Collection. Count() method does not perform the find() operation, but counts and returns the number of results that match the query.

Grammar:

{ $count:  }

The $count phase is equivalent to the following sequence of $group + $project:

db.collection.aggregate( [
   { $group: { _ ID: null, MYCOUNT: {$sum: 1}}}, where MYCOUNT is customized, which is equivalent to select count (*) as MYCOUNT in MySQL
   { $project: { _ ID: 0}} # return not displayed_ ID field
] )

give an example:

Sample data:

{ "_id" : 1, "subject" : "History", "score" : 88 }
{ "_id" : 2, "subject" : "History", "score" : 92 }
{ "_id" : 3, "subject" : "History", "score" : 97 }
{ "_id" : 4, "subject" : "History", "score" : 71 }
{ "_id" : 5, "subject" : "History", "score" : 79 }
{ "_id" : 6, "subject" : "History", "score" : 83 }

Implementation:

1) The $match phase excludes documents with score less than or equal to 80 and passes documents with score greater than or equal to 80 to the next phase

2) The $count phase returns the count of the remaining documents in the aggregation pipeline and assigns the value to a file named passing_ The field of scores.

Implementation results:

$group

interpretation:

The documents are grouped according to the specified expression, and the documents of each different grouping are output to the next stage. The output document contains a_ ID field, which contains different groups.

The output document can also contain a calculated field, which is saved by the_ The ID field groups the values of some accumulator expressions$ Group does not output specific documents, but just statistics.

Grammar:

{ $group: { _id: , : {  :  }, ... } }
  • _ The ID field is required; However, you can specify_ The ID value is null to calculate the cumulative value for the entire input document.
  • The remaining calculated fields are optional and use theOperator.
  • _ ID andAn expression can accept any validexpression

Accumulator operator

name describe Analogy SQL
$avg Calculate the mean value avg
$first Return the first document of each group. If it is sorted, it will be sorted. If it is not sorted, it will be the first document in the default stored order. limit 0,1
$last Returns the last document of each group. If there is a sort, it will be sorted. If there is no default stored order, it will be the last document.
$max According to the grouping, get the maximum value of all documents in the collection. max
$min According to the grouping, get the minimum value of the corresponding value of all documents in the collection. min
$push Adds the value of the specified expression to an array.
$addToSet  Adds the value of an expression to a collection (no duplicate values, out of order).
$sum Calculate the sum sum
$stdDevPop Returns the population standard deviation of the input value
$stdDevSamp Returns the sample standard deviation of the input value

The memory limit of the $group phase is 100m. By default, if the stage exceeds this limit, $group will generate an error. However, to allow processing of large datasets, set the allowdiskuse option to true to enable the $group operation to write temporary files.

Friendly remarks:

  • “$addtoset”: expr. If the current array does not contain expr, add it to the array.
  • “$push”: expr, whatever expr is, is added to the array. Returns an array of all values.

Changes have been made in version 2.6: for the $group phase, mongodb introduces a 100m memory limit and the allow disk use option to handle large data set operations.

give an example:

Sample data:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

one   The following summary operations use the $group stage to group documents by month, date and year, calculate the total price and average quantity, and calculate the number of documents in each group:

return:

/* 1 */
{
    "_id" : {
        "month" : 4,
        "day" : 4,
        "year" : 2014
    },
    "totalPrice" : 200,
    "averageQuantity" : 15.0,
    "count" : 2.0
}

/* 2 */
{
    "_id" : {
        "month" : 3,
        "day" : 15,
        "year" : 2014
    },
    "totalPrice" : 50,
    "averageQuantity" : 10.0,
    "count" : 1.0
}

/* 3 */
{
    "_id" : {
        "month" : 3,
        "day" : 1,
        "year" : 2014
    },
    "totalPrice" : 40,
    "averageQuantity" : 1.5,
    "count" : 2.0
}

2. group null  , The following aggregation operations specify the group_ If Id is null, calculate the total price, average quantity and count of all documents in the collection

3. Query distinct values

The following rollup uses the $group phase to group documents by item to retrieve different item values:

4. Data conversion

1) Group the data in the collection by price and convert it into an item array

The returned data ID value is the field specified in the group. Items can be customized and is a grouped list

2) Next, aggregate the operating utility variable $$root to group documents by item,The generated document cannot exceed the bson document size limit.

return:

/* 1 */
{
    "_id" : "xyz",
    "books" : [ 
        {
            "_id" : 3,
            "item" : "xyz",
            "price" : 5,
            "quantity" : 10,
            "date" : ISODate("2014-03-15T09:00:00.000Z")
        }, 
        {
            "_id" : 4,
            "item" : "xyz",
            "price" : 5,
            "quantity" : 20,
            "date" : ISODate("2014-04-04T11:21:39.736Z")
        }
    ]
}

/* 2 */
{
    "_id" : "jkl",
    "books" : [ 
        {
            "_id" : 2,
            "item" : "jkl",
            "price" : 20,
            "quantity" : 1,
            "date" : ISODate("2014-03-01T09:00:00.000Z")
        }
    ]
}

/* 3 */
{
    "_id" : "abc",
    "books" : [ 
        {
            "_id" : 1,
            "item" : "abc",
            "price" : 10,
            "quantity" : 2,
            "date" : ISODate("2014-03-01T08:00:00.000Z")
        }, 
        {
            "_id" : 5,
            "item" : "abc",
            "price" : 10,
            "quantity" : 10,
            "date" : ISODate("2014-04-04T21:23:13.331Z")
        }
    ]
}

$match

interpretation:

Filter the documents, and only the documents that meet the specified conditions are passed to the next pipeline stage.
$match accepts a document with specified query criteria. The query syntax is the same as the read operation query syntax.

Grammar:

{ $match: {  } }

Pipeline optimization:

$match is used to filter documents, and then aggregate them on a subset of documents. It can use all conventional query operators except geospatial,In practical application, try to put $match in front of the pipeline. This has two advantages: one is that you can quickly filter out unnecessary documents to save timeReduce pipeline workload; Second, if $match is executed before projection and grouping,Queries can use indexes

Restrictions:

  • You cannot use $as part of an aggregate pipeline in a $match query.
  • To use $text in the $match phase, the $match phase must be the first phase of the pipeline.
  • View does not support text search.

give an example:

Sample data:

{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b257"), "author" : "ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25a"), "author" : "li", "score" : 94, "views" : 999 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25b"), "author" : "ty", "score" : 95, "views" : 1000 }

1. Use $match to do simple matching query

return:

/* 1 */
{
    "_id" : ObjectId("512bc95fe835e68f199c8686"),
    "author" : "dave",
    "score" : 80,
    "views" : 100
}

/* 2 */
{
    "_id" : ObjectId("512bc962e835e68f199c8687"),
    "author" : "dave",
    "score" : 85,
    "views" : 521
}

2. Use the $match pipeline to select the documents to be processed, and then output the results to the $group pipeline to calculate the document count:

return:

/* 1 */
{
    "_id" : null,
    "count" : 5.0
}

$unwind

interpretation:

Deconstructs the array field from the input document to output the document for each element. In short, it is   You can split an array into separate documents.

Grammar:

{ $unwind:  }

To specify the field path, precede the field name with the $sign and enclose it in quotation marks.

V3.2 + supports the following syntax:

{
  $unwind:
    {
      path: ,
      Includearrayindex:, # optional, the name of a new field is used to store the array index of the element. The name cannot start with $.
Preservenullandemptyarrays: # optional, default: false, if true, if the path is empty, missing or empty array, then $unwind outputs the document
} }

If you specify a path for a field that does not exist in the input document, or if the field is an empty array, $unwind ignores the input document by default and does not output the document of the input document.

New feature in version 3.2: to output documents with missing array fields, null or empty arrays, use the option preserve null and empty arrays.

give an example:

1. Sample data 1:

{ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] }

The following aggregation uses $unwind to output a document for each element in the sizes array:

db.getCollection('test').aggregate(
 [ { $unwind : "$sizes" } ]
)

return:

{ “_id” : 1, “item” : “ABC1”, “sizes” : “S” }
{ “_id” : 1, “item” : “ABC1”, “sizes” : “M” }
{ “_id” : 1, “item” : “ABC1”, “sizes” : “L” }

Each document is the same as the input document, except that the values of the sizes field are the values of the original sizes array.

2. Take the following example data:

{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }

1) The following $unwind operation uses the include array index option to output the array index of an array element.

db.getCollection('test').aggregate( [ { $unwind: { path: "$sizes", includeArrayIndex: "arrayIndex" } } ] )

return:

{ “_id” : 1, “item” : “ABC”, “sizes” : “S”, “arrayIndex” : NumberLong(0) }
{ “_id” : 1, “item” : “ABC”, “sizes” : “M”, “arrayIndex” : NumberLong(1) }
{ “_id” : 1, “item” : “ABC”, “sizes” : “L”, “arrayIndex” : NumberLong(2) }
{ “_id” : 3, “item” : “IJK”, “sizes” : “M”, “arrayIndex” : null }

2) The following $unwind operation uses the preserve null and empty arrays option to include documents in the output that are missing the size field, null, or empty array.

db.inventory.aggregate( [
   { $unwind: { path: "$sizes", preserveNullAndEmptyArrays: true } }
] )

return:
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 2, "item" : "EFG" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }

$project

interpretation:

$project can select the desired and unwanted fields from the document(The specified field can be an existing field from an input document or a new calculated field

)It can also perform some complex operations through pipeline expressions, such as mathematical operations, date operations, string operations, and logical operations.

Grammar:

{ $project: {  } }

The $project pipeline character is used to select fields (specify fields, add fields, and do not display fields)_ ID: 0, exclude fields, etc.), rename fields, derive fields.

Specifications come in the following forms:

<1 or true>     Whether to include the field, field:1/0 , indicating select / not select field

_id: <0 or false>         Specify_ ID field

  Add a new field or reset the value of an existing field. Change in version 3.6: mongodb 3.6 adds the variable remove. If the expression evaluates to $$remove, the field is excluded from the output.

:<0 or false>     V3.4 new function, specify exclusion field

  • By default_ The ID field is included in the output document. To include any other fields in the input document in the output document, the include in $project must be explicitly specified. If you specify to include a field that does not exist in the document, $project ignores the field inclusion and does not add the field to the document.
  • By default_ The ID field is included in the output document. To exclude from the output document_ ID field, which must be explicitly specified in $project_ The ID field is 0.
  • What’s new in v3.4 – if you specify to exclude one or more fields, all other fields are returned in the output document. If exclusion is specified_ If you exclude a field, you cannot specify the containing field, reset the value of an existing field, or add a new field. This restriction does not apply to conditional exclusion fields using the remove variable.
  • New feature in V3.6 – starting with mongodb 3.6, you can use the variable remove in aggregate expressions to conditionally disable a field.
  • To add a new field or reset the value of an existing field, specify the field name and set its value to an expression.
  • To set the field value directly to a number or Boolean text, rather than to an expression that resolves to text, use the $literal operator. Otherwise, the $project treats a number or Boolean text as a flag to include or exclude the field.
  • You can effectively rename a field by specifying a new field and setting its value to the field path of an existing field.
  • Starting from mongodb 3.2, the $project phase supports creating new array fields directly using square brackets. If the array specification contains a field that does not exist in the document, the operation replaces the null value with the value of the field.
  • Changed in version 3.4-If $project is an empty document, mongodb 3.4 and later will generate an error.
  • You can use point symbols when projecting or adding / resetting fields embedded in a document. For example:
"contact.address.country": <1 or 0 or expression>
or
contact: { address: { country: <1 or 0 or expression> } }

give an example:

Sample data:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5,
  lastModified: "2016-07-28"
}

one   The output document of the following $project phase contains only_ ID, title and author fields:

db.getCollection('test').aggregate( [ { $project : { title : 1 , author : 1 } } ] )

return:

{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

2. _ The ID field is included by default. To exclude from the output document of the $project phase_ In the project document, enter the ID field_ The ID field is set to 0 to specify the exclusion_ ID field.

db.getCollection('test').aggregate( [ { $project : { _id: 0, title : 1 , author : 1 } } ] )

return:

{ "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }

3. The following $project phase excludes the LastModified field from the output:

db.getCollection('test').aggregate( [ { $project : { "lastModified": 0 } } ] )

4.Exclude fields from nested documents, In the $project phase, the author.first and LastModified fields are excluded from the output:

db.test.aggregate( [ { $project : { "author.first" : 0, "lastModified" : 0 } } ] )

Alternatively, you can nest exclusion specifications in a document:

db.test.aggregate( [ { $project: { "author": { "first": 0}, "lastModified" : 0 } } ] )

return:

{
   "_id" : 1,
   "title" : "abc123",
   "isbn" : "0001122223334",
   "author" : {
      "last" : "zzz"
   },
   "copies" : 5,
}

New features in version 3.6.Starting with mongodb 3.6, you can use the variable remove in aggregate expressions to conditionally disable a field.

Sample data:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5,
  lastModified: "2016-07-28"
}
{
  "_id" : 2,
  title: "Baked Goods",
  isbn: "9999999999999",
  author: { last: "xyz", first: "abc", middle: "" },
  copies: 2,
  lastModified: "2017-07-21"
}
{
  "_id" : 3,
  title: "Ice Cream Cakes",
  isbn: "8888888888888",
  author: { last: "xyz", first: "abc", middle: "mmm" },
  copies: 5,
  lastModified: "2017-07-22"
}

5. The following $project phase uses the remove variable to exclude the author. Middle field if it is equal to ”:

db.books.aggregate( [
   {
      $project: {
         title: 1,
         "author.first": 1,
         "author.last" : 1,
         "author.middle": {
            $cond: {
               if: { $eq: [ "", "$author.middle" ] },
               then: "$$REMOVE",
               else: "$author.middle"
            }
         }
      }
   }
] )

return:

{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
{ "_id" : 2, "title" : "Baked Goods", "author" : { "last" : "xyz", "first" : "abc" } }
{ "_id" : 3, "title" : "Ice Cream Cakes", "author" : { "last" : "xyz", "first" : "abc", "middle" : "mmm" } }

Contains the specified fields from the embedded document(the results only return fields that contain nested documents, and of course, fields that contain nested documents_ id)

Sample document:

{ _id: 1, user: "1234", stop: { title: "book1", author: "xyz", page: 32 } }
{ _id: 2, user: "7890", stop: [ { title: "book2", author: "abc", page: 5 }, { title: "book3", author: "ijk", page: 100 } ] }

Only the title field in the stop field is returned:

db.bookmarks.aggregate( [ { $project: { "stop.title": 1 } } ] )

or
db.bookmarks.aggregate( [ { $project: { stop: { title: 1 } } } ] )

return:

{ "_id" : 1, "stop" : { "title" : "book1" } }
{ "_id" : 2, "stop" : [ { "title" : "book2" }, { "title" : "book3" } ] }

Include calculated fields

Sample data:

{
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5
}

ISBN, LastName and copiesold are added to the return field

db.books.aggregate(
   [
      {
         $project: {
            title: 1,
            isbn: {
               prefix: { $substr: [ "$isbn", 0, 3 ] },
               group: { $substr: [ "$isbn", 3, 2 ] },
               publisher: { $substr: [ "$isbn", 5, 4 ] },
               title: { $substr: [ "$isbn", 9, 3 ] },
               checkDigit: { $substr: [ "$isbn", 12, 1] }
            },
            lastName: "$author.last",
            copiesSold: "$copies"
         }
      }
   ]
)

Return result of the above execution:

{
   "_id" : 1,
   "title" : "abc123",
   "isbn" : {
      "prefix" : "000",
      "group" : "11",
      "publisher" : "2222",
      "title" : "333",
      "checkDigit" : "4"
   },
   "lastName" : "zzz",
   "copiesSold" : 5
}

Project a new array field

Sample data:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "x" : 1, "y" : 1 }

The following aggregation operation returns the new array field myArray:

db.collection.aggregate( [ { $project: { myArray: [ "$x", "$y" ] } } ] )

return:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "myArray" : [ 1, 1 ] }

If the returned array contains fields that do not exist, null will be returned:

db.collection.aggregate( [ { $project: { myArray: [ "$x", "$y", "$someField" ] } } ] )

return:

{ "_id" : ObjectId("55ad167f320c6be244eb3b95"), "myArray" : [ 1, 1, null ] }

$limit

Limit the number of documents passed to the next stage in the pipeline

Grammar:

{ $limit:  }

Example:

db.article.aggregate(
    { $limit : 5 }
);

This operation returns only the first five documents passed to it by the pipeline$ Limit has no effect on the content of the document it passes.

be careful:

When $sort appears immediately before the $limit in the pipeline, the $sort operation will only maintain the first n results in the process, where n is the specified limit, while mongodb only needs to store n items in memory. This optimization is still applicable when allowdiskuse is true and n items exceed the aggregate memory limit.

$skip

Skip the specified number of documents entering the stage and pass the remaining documents to the next stage in the pipeline

Grammar:

{ $skip:  }

Example:

db.article.aggregate(
    { $skip : 5 }
);

This action skips the first five documents passed to it by the pipeline$ Skip has no effect on the content of the document passed along the pipeline.

$sort

Sort all input documents and return them to the pipeline in sort order.

Grammar:

{ $sort: { : , :  ... } }

$sort specifies the fields to sort and the documents in the corresponding sort order.Can have one of the following values:

  • 1 specifies the ascending order.
  • -1 specifies descending order.
  • {$meta: “textscore”} sorts the calculated textscore metadata in descending order.

Example:

To sort fields, set the sort order to 1 or – 1 to specify ascending or descending sort, respectively, as shown in the following example:

db.users.aggregate(
   [
     { $sort : { age : -1, posts: 1 } }
   ]
)

When comparing values of different bson types, mongodb uses the following comparison order, from lowest to highest:

 1 MinKey (internal type)
 2 Null
 3 Numbers (ints, longs, doubles, decimals)
 4 Symbol, String
 5 Object
 6 Array
 7 BinData
 8 ObjectId
 9 Boolean
10 Date
11 Timestamp
12 Regular Expression
13 MaxKey (internal type)

$sortByCount

New in v3.4.Groups incoming documents according to the value of the specified expression, and then calculates the number of documents in each different group.Each output document contains two fields: fields with different grouping values_ The ID field and the count field containing the number of documents belonging to the group or category are arranged in descending order.

Grammar:

{ $sortByCount:   }

 

 

reference:

https://www.jianshu.com/p/e60d5cfbeb35

https://www.cnblogs.com/ljhdo/p/5019837.html

 

Recommended Today

What is “hybrid cloud”?

In this paper, we define the concept of “hybrid cloud”, explain four different cloud deployment models of hybrid cloud, and deeply analyze the industrial trend of hybrid cloud through a series of data and charts. 01 introduction Hybrid cloud is a computing environment that integrates multiple platforms and data centers. Generally speaking, hybrid cloud is […]