MongoDB aggregate: a personal usage summary


Recently I have been using MongoDB, and sometimes I need to compute statistics. After checking some information online, aggregate turned out to be the most suitable tool for this. Here is my experience using it.

MongoDB aggregation
Aggregation in MongoDB is mainly used to process data (computing averages, sums and so on) and return the computed results. It is a bit like count(*) in SQL statements.
The aggregate() method
MongoDB uses the aggregate() method for aggregation.
The basic syntax of the aggregate() method is as follows:

db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)


The data in the collection is as follows:

{
  _id: ObjectId(7df78ad8902c),
  title: 'MongoDB Overview',
  description: 'MongoDB is no sql database',
  by_user: '',
  url: '//',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 100
},
{
  _id: ObjectId(7df78ad8902d),
  title: 'NoSQL Overview',
  description: 'No sql database is very fast',
  by_user: '',
  url: '//',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 10
},
{
  _id: ObjectId(7df78ad8902e),
  title: 'Neo4j Overview',
  description: 'Neo4j is no sql database',
  by_user: 'Neo4j',
  url: '',
  tags: ['neo4j', 'database', 'NoSQL'],
  likes: 750
}

Now we use the collection above to count the number of articles written by each author. Using aggregate(), the result is as follows:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
  "result" : [
    {
      "_id" : "",
      "num_tutorial" : 2
    },
    {
      "_id" : "Neo4j",
      "num_tutorial" : 1
    }
  ],
  "ok" : 1
}

The above example is similar to the SQL statement: select by_user, count(*) from mycol group by by_user
In the example above, we grouped the documents by the by_user field and counted how many documents share each by_user value.
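
To make the grouping concrete, here is a minimal sketch in plain JavaScript (it runs without a MongoDB connection) that mimics what the $group stage with {$sum : 1} computes for the sample documents. The docs array and the countByUser helper are illustrative names, not part of MongoDB.

```javascript
// In-memory stand-in for the sample collection (only the fields needed here).
const docs = [
  { title: "MongoDB Overview", by_user: "" },
  { title: "NoSQL Overview", by_user: "" },
  { title: "Neo4j Overview", by_user: "Neo4j" },
];

// Hypothetical helper: emulates
// {$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}
function countByUser(documents) {
  const counts = new Map();
  for (const doc of documents) {
    // Add 1 for every document that falls into this group.
    counts.set(doc.by_user, (counts.get(doc.by_user) || 0) + 1);
  }
  return [...counts].map(([_id, num_tutorial]) => ({ _id, num_tutorial }));
}

console.log(countByUser(docs));
```

The two documents with an empty by_user end up in one group with num_tutorial 2, and the "Neo4j" group has num_tutorial 1, matching the shell output above.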
The following table shows some aggregation expressions:

| Expression | Description | Example |
| --- | --- | --- |
| $sum | Calculates the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}]) |
| $avg | Calculates the average. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}]) |
| $min | Gets the minimum value of the field among the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}]) |
| $max | Gets the maximum value of the field among the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}]) |
| $push | Appends the value to an array in the result document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push : "$url"}}}]) |
| $addToSet | Adds the value to an array in the result document, but without creating duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}]) |
| $first | Gets the value from the first document in each group, according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}]) |
| $last | Gets the value from the last document in each group, according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}]) |
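
The accumulators in the table can be compared side by side with a small plain-JavaScript sketch. It only imitates their behaviour on the sample data (by_user, url and likes come from the collection above); the summarize helper and the output field names are made up for illustration.

```javascript
// Sample documents reduced to the fields used by the accumulators.
const docs = [
  { by_user: "", url: "//", likes: 100 },
  { by_user: "", url: "//", likes: 10 },
  { by_user: "Neo4j", url: "", likes: 750 },
];

// Hypothetical helper: groups by by_user, then applies each accumulator.
function summarize(documents) {
  const groups = new Map();
  for (const doc of documents) {
    if (!groups.has(doc.by_user)) groups.set(doc.by_user, []);
    groups.get(doc.by_user).push(doc);
  }
  return [...groups].map(([_id, members]) => {
    const likes = members.map((d) => d.likes);
    const urls = members.map((d) => d.url);
    const total = likes.reduce((a, b) => a + b, 0);
    return {
      _id,
      sum: total,                   // like $sum : "$likes"
      avg: total / likes.length,    // like $avg : "$likes"
      min: Math.min(...likes),      // like $min : "$likes"
      max: Math.max(...likes),      // like $max : "$likes"
      pushed: urls,                 // like $push : keeps duplicates
      unique: [...new Set(urls)],   // like $addToSet : drops duplicates
      first: urls[0],               // like $first : "$url"
      last: urls[urls.length - 1],  // like $last : "$url"
    };
  });
}

console.log(JSON.stringify(summarize(docs), null, 2));
```

For the group with the empty by_user, pushed holds the url twice while unique holds it once, which is the practical difference between $push and $addToSet. Note that MongoDB does not guarantee the order of elements produced by $addToSet.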

The Concept of Pipeline
Pipes are commonly used in Unix and Linux to feed the output of the current command into the next command as its input.
MongoDB's aggregation pipeline works the same way: documents are processed by one stage, and the result is passed on to the next stage. Stages can be repeated within a pipeline.
Expressions: an expression processes an input document and produces an output. Expressions are stateless and can only operate on the current document in the pipeline, not on other documents.
Here we introduce some common operations in the aggregation framework:
$project: Modifies the structure of the input documents. It can be used to rename, add or remove fields, as well as to create computed results and nested documents.
$match: Filters the data and outputs only the documents that satisfy the condition. $match uses MongoDB's standard query operators.
$limit: Limits the number of documents returned by the aggregation pipeline.
$skip: Skips a specified number of documents in the aggregation pipeline and returns the remaining documents.
$unwind: Splits an array field in a document into multiple documents, each containing one value from the array.
$group: Groups the documents in a collection so that statistics can be computed per group.
$sort: Sorts the input documents and outputs them.
$geoNear: Outputs ordered documents close to a geographic location.
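
The idea that each stage receives the previous stage's output can be sketched in plain JavaScript. This is only an in-memory illustration of the pipeline concept, not how MongoDB implements it; match, sortBy, skip, limit and runPipeline are hypothetical helper names.

```javascript
// Each stage is a function from an array of documents to an array of documents.
const match = (predicate) => (docs) => docs.filter(predicate);   // like $match
const sortBy = (field, direction) => (docs) =>                    // like $sort (numeric fields only)
  [...docs].sort((a, b) => direction * (a[field] - b[field]));
const skip = (n) => (docs) => docs.slice(n);                      // like $skip
const limit = (n) => (docs) => docs.slice(0, n);                  // like $limit

// Run the stages in order, feeding each output into the next stage.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const docs = [
  { title: "MongoDB Overview", likes: 100 },
  { title: "NoSQL Overview", likes: 10 },
  { title: "Neo4j Overview", likes: 750 },
];

const result = runPipeline(docs, [
  match((d) => d.likes >= 50), // keeps the documents with 100 and 750 likes
  sortBy("likes", -1),         // descending: 750, 100
  skip(1),                     // drops the first one (750)
  limit(1),                    // keeps at most one document
]);

console.log(result);
```

Changing the order of the stages changes the result, which is why the position of each operator in the pipeline matters.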

Pipeline operator examples

1. $project example

  { $project : {
    title : 1 ,
    author : 1
  }}

In this way, the result contains only three fields: _id, title and author. The _id field is included by default. If you want to exclude _id, you can do this:

  { $project : {
    _id : 0 ,
    title : 1 ,
    author : 1
  }}
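
The effect of the two projections can be imitated in plain JavaScript. The sketch below handles only the simple inclusion form used here (fields marked 1, optional _id : 0); the project helper and the sample article values are hypothetical.

```javascript
// Hypothetical helper: keeps the fields marked 1; _id is kept unless set to 0.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Made-up document for illustration.
const article = { _id: 1, title: "MongoDB Overview", author: "someone", likes: 100 };

const withId = project(article, { title: 1, author: 1 });
const withoutId = project(article, { _id: 0, title: 1, author: 1 });

console.log(withId);    // _id, title and author
console.log(withoutId); // title and author only
```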

2. $match example

db.articles.aggregate( [
    { $match : { score : { $gt : 70, $lte : 90 } } },
    { $group : { _id : null, count : { $sum : 1 } } }
] );

Here $match selects the records whose score is greater than 70 and less than or equal to 90, and then passes the matching records to the next stage, the $group pipeline operator, for processing.
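
A plain-JavaScript sketch of the same two steps, on made-up scores, shows which boundary values pass the filter. The articles array is an assumption for illustration only.

```javascript
// Made-up scores around the boundaries 70 and 90.
const articles = [
  { score: 65 },
  { score: 70 },
  { score: 71 },
  { score: 90 },
  { score: 95 },
];

// Like {$match : {score : {$gt : 70, $lte : 90}}}: 70 is excluded, 90 is included.
const matched = articles.filter((a) => a.score > 70 && a.score <= 90);

// Like {$group : {_id : null, count : {$sum : 1}}}: one group for all documents.
// When nothing matches, $group produces no output document at all.
const grouped = matched.length > 0 ? [{ _id: null, count: matched.length }] : [];

console.log(grouped);
```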

3. $skip example

  { $skip : 5 }

After the $skip pipeline operator has run, the first five documents are "filtered out".

Others have already written about everything described above in more detail, and a search will turn up any number of similar articles. What follows is my own summary.

Basic knowledge

Please look up the details yourself. Here are the key points.

Operator introduction:

$project: Include, exclude, rename and display fields
$match: Query, with the same parameters as find()
$limit: Limit the number of results
$skip: Number of results to skip
$sort: Sort the results by a given field
$group: Combine results according to a given expression
$unwind: Split an embedded array into separate top-level documents

Documentation: the official MongoDB aggregate description.

Related usage:


The argument passed to aggregate() is an array of any one or more of these operators.
On using $group and $match: if you have used SQL Server, $group is easy to understand. It groups by the specified columns and computes statistics per group, such as the number of rows in each group, or the sum or average of a field.
A $match before $group filters the source data (like WHERE in SQL), and a $match after $group filters the grouped data (like HAVING in SQL).

The same principle applies to $sort, $skip and $limit.


The following are examples:
Application 1: Count the number of documents per name, and the total;


Application 2: Count the number of documents per name where status = 1;


Application 3: Count the number of documents per name, keeping only the names whose count is less than 2;


Application 4: Count the number of documents per name where status = 1, keeping only the names whose count is 1;
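
As a hedged sketch of what such a pipeline might look like (this is my reconstruction, not necessarily the original query), the pipeline constant below combines a pre-group and a post-group $match. The field names name and status come from this post; the sample rows and the plain-JavaScript equivalent are made up to show the expected shape of the result.

```javascript
// The pipeline as it could be passed to db.collection.aggregate(pipeline).
const pipeline = [
  { $match: { status: 1 } },                        // pre-group condition, like WHERE
  { $group: { _id: "$name", count: { $sum: 1 } } }, // count documents per name
  { $match: { count: 1 } },                         // post-group condition, like HAVING
];

// Plain-JavaScript equivalent on made-up rows.
const rows = [
  { name: "a", status: 1 },
  { name: "a", status: 1 },
  { name: "b", status: 1 },
  { name: "c", status: 0 },
];

const counts = new Map();
for (const row of rows.filter((r) => r.status === 1)) {
  counts.set(row.name, (counts.get(row.name) || 0) + 1);
}
const result = [...counts]
  .map(([_id, count]) => ({ _id, count }))
  .filter((group) => group.count === 1);

console.log(result);
```

Name "c" is removed by the first filter, name "a" by the second, so only name "b" remains with count 1.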


Multi-column group: group by both name and status
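
One way to group on several fields is to use a document as the _id of the $group stage. The sketch below shows such a pipeline as a plain JavaScript value, followed by an in-memory equivalent on made-up rows; it is an illustration under these assumptions, not the original query.

```javascript
// The pipeline as it could be passed to db.collection.aggregate(pipeline).
const pipeline = [
  { $group: { _id: { name: "$name", status: "$status" }, count: { $sum: 1 } } },
];

// Plain-JavaScript equivalent on made-up rows.
const rows = [
  { name: "a", status: 1 },
  { name: "a", status: 1 },
  { name: "a", status: 0 },
  { name: "b", status: 1 },
];

const groups = new Map();
for (const { name, status } of rows) {
  const key = JSON.stringify([name, status]); // composite key for the Map
  const group = groups.get(key) || { _id: { name, status }, count: 0 };
  group.count += 1;
  groups.set(key, group);
}
const result = [...groups.values()];

console.log(result);
```

Each distinct (name, status) pair becomes its own group, so name "a" appears twice: once for status 1 and once for status 0.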


The $project operator is simple.


As a result, only the three fields _id, name and status are returned, which corresponds to the SQL expression select _id, name, status from collection.
The $unwind operator can split an array in a document into multiple documents, which is useful under special conditions. I haven't done much research on it yet.
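
To show what splitting an array into multiple documents means, here is a minimal plain-JavaScript imitation of {$unwind : "$tags"} on one of the sample documents. The unwind helper is a hypothetical name, and the sketch handles only array fields.

```javascript
// Hypothetical helper: one output document per array element.
// Documents whose field is missing or an empty array produce no output,
// which is also what $unwind does by default.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    Array.isArray(doc[field])
      ? doc[field].map((value) => ({ ...doc, [field]: value }))
      : []
  );
}

const docs = [
  { title: "Neo4j Overview", tags: ["neo4j", "database", "NoSQL"] },
];

const unwound = unwind(docs, "tags");
console.log(unwound);
```

The single input document becomes three documents with the same title, each carrying one tag as a plain string instead of the array.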
The above covers most statistics needs. The key points are the conditions applied before grouping and the conditions applied after grouping.