Mongodb: aggregate and aggregatecursor

Time:2021-2-23

Environmental Science
mongos 3.0.14

aggregate

Using aggregate can realize more complex data aggregation operations, such as count, distinct count, group having, etc.

The result returned by aggregate is an array. It should be noted that the data size cannot exceed 16m.

For example:

$pipeline = [
            ['$match' => $tmpCondition],
            ['$group' => [
                '_id' => ['user_id'=>'$user_id']
            ]],
            ['$group' => [
                '_id' => '_id.user_id',
                'number' => ['$sum'=>1]
            ]]
        ];
$options = [
        'allowDiskUse'=>true,
        'cursor'=>['batchSize'=>1]
];
        $data = MongoSvc::get('user')->user_info->aggregate($pipeline,$options);

aggregateCursor

For the aggregation of a large number of returned results, the aggregate cursor can be used to return cursors to avoid data size overrun.

The return result of aggregatecursor is cursor, which can retrieve data circularly.

For example:

$pipeline = [
            ['$match' => $matchArr],
            ['$project' => ['id'=>1,'_id'=>0]],
            ['$group' => [
                '_id' => '$id',
                'count' => ['$sum' => 1]
            ]],
            ['$match' => [
                'count' => ['$gt' => 1]
            ]]
        ];
        //Here it is changed to aggregate cursor, which is obtained by cursor loop
        $data = MongoSvc::get('user')->user_info->aggregateCursor($pipeline);

Pipeline parameters

  • $match
    Condition matching.
  • $addFields
    Add a new field.
  • $count
    The total number of documents in this stage.
  • $group
    grouping.
  • $limit
    Limit the quantity.
  • $skip
    Jump.
  • $sort
    Sort.
  • $out
    Output the result to the collection.
  • $project
    Filter fields.

https://docs.mongodb.com/manu…

Options parameter

  • explain boolean
    Processing information.
  • allowDiskUse boolean
    True to write temporary data to disk.
  • cursor
    cursor: { batchSize: <int> }
    Sets an initial size for the return collection.
  • hint string or document
    Force the index to be specified.

https://docs.mongodb.com/manu…

Query examples

Count value of a field (such as’ sum ‘) in the summary statistics document:

$pipeline = [
            ['$match' => $tmpCondition],
            ['$group' => [
                '_id' => ['sum'],
                'sum_value' => ['$sum' => '$money']
            ]]
        ];

Data after de duplication of a column:

$pipeline = [
            ['$match' => $tmpCondition],
            ['$group' => [
                '_id' => ['user_id' => '$user_id']
            ]]
        ];

Count a column (e.g.’user ‘_ Count value after de duplication:

$pipeline = [
            ['$match' => $tmpCondition],
            ['$group' => [
                '_id' => ['user_id'=>'$user_id']
            ]],
            ['$group' => [
                '_id' => '_id.user_id',
                'number' => ['$sum'=>1]
            ]]
        ];
        
$pipeline = [
            ['$match' => $tmpCondition],
            ['$group' => [
                '_id'        => ['qid' => '$qid'],
                'max_number' => ['$max' => '$days'] 
            ]],
            ['$group' => [
                '_id' => ['number' => '$max_number'],
                'total' => ['$sum' => 1]
            ]]
        ];

After statistical grouping, the total value of a column in each group is as follows:

$pipeline = [
            ['$match' => $tmpCondition],
            ['$group' => [
                '_id' => ['type' => '$type'],
                'sum_value' => ['$sum' => '$number']
            ]]
        ];