[Mr. Zhao Qiang] Using MapReduce to calculate aggregations in MongoDB

Time:2020-11-19

MapReduce can express very complex aggregation logic and is very flexible. However, it is slow and should not be used for real-time data analysis. A MapReduce job can run in parallel on multiple servers: each server completes only part of the workload and then sends its partial result to the master server, which merges the partial results, computes the final result set, and returns it to the client.
The basic idea of MapReduce is shown in the following figure:

[Figure: a summation split into small tasks in the map stage (blue box), whose outputs are combined in the reduce stage (red box)]

Take a summation as an example. First, the map stage splits a large task into several small tasks, each of which can run on a different node, which is what makes distributed computing possible (the blue box in the figure). The outputs of the small tasks are then aggregated a second time to give the final result, 55. This second stage is called reduce (the red box in the figure).
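The summation idea can be sketched in plain JavaScript, outside MongoDB. The input numbers 1 to 10 and the chunk size of 3 are assumptions, chosen only so that the total matches the 55 mentioned above; the helper names `splitIntoChunks` and `sum` are made up for this illustration.

```javascript
// Assumed input: the integers 1..10 (their sum is 55).
const numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Split the big task into small tasks (the chunk size is arbitrary).
function splitIntoChunks(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Sum an array of numbers.
function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Map stage: each chunk is summed independently.
// In a real cluster these small tasks could run on different nodes.
const partialSums = splitIntoChunks(numbers, 3).map(sum);

// Reduce stage: the partial sums are summed again.
const total = sum(partialSums);

console.log(partialSums); // [ 6, 15, 24, 10 ]
console.log(total);       // 55
```

Because addition gives the same answer however the numbers are grouped, the chunk size does not affect the final total.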

Using MapReduce to calculate an aggregation involves three steps: map, shuffle and reduce. Map and reduce must be defined explicitly; shuffle is performed by MongoDB.

  • Map: apply the map operation to each document to generate key/value pairs
  • Shuffle: group by key, combining the values with the same key into an array
  • Reduce: reduce each key's array of values to a single value
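The three steps can be imitated in memory with a few lines of plain JavaScript. This is only a sketch of the idea, not how MongoDB implements it: `simulateMapReduce` is a hypothetical helper, the sample documents are invented, and `emit` is passed to the map function as a parameter here, whereas in MongoDB it is available as a global.

```javascript
// Hypothetical in-memory stand-in for mapReduce, for illustration only.
function simulateMapReduce(docs, map, reduce) {
  // Map: call map with `this` bound to each document and collect the emitted pairs.
  const pairs = [];
  const emit = (key, value) => pairs.push({ key, value });
  for (const doc of docs) {
    map.call(doc, emit);
  }

  // Shuffle: group the emitted values by key.
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce: collapse each key's array of values to a single value.
  // As in MongoDB, reduce is skipped for a key that has only one value.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Invented sample documents.
const sampleDocs = [
  { item: "a", qty: 2 },
  { item: "b", qty: 5 },
  { item: "a", qty: 3 }
];

const sampleResult = simulateMapReduce(
  sampleDocs,
  function (emit) { emit(this.item, this.qty); },
  function (key, values) { return values.reduce((acc, v) => acc + v, 0); }
);

console.log(JSON.stringify(sampleResult));
// [{"_id":"a","value":5},{"_id":"b","value":5}]
```

The `{ _id: key, value: ... }` shape of each result mirrors the documents that mapReduce writes to its output collection.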

Let’s take the following test data (employee data) as an example to demonstrate.

db.emp.insert(
[
{_id:7369,ename:'SMITH' ,job:'CLERK'    ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD'  ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER'  ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER'  ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER'  ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST'  ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING'  ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK'    ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK'    ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD'  ,job:'ANALYST'  ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK'    ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}
]
);

(case 1) calculate the number of employees in each job in the employee table

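// map: emit the pair (job, 1) for every employee
var map1=function(){emit(this.job,1)}
// reduce: add up the 1s collected for each job
var reduce1=function(job,count){return Array.sum(count)}
// run the job and write the result to the collection mrdemo1
db.emp.mapReduce(map1,reduce1,{out:"mrdemo1"})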

(case 2) calculate the total salary of each department in the employee table

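// map: emit the pair (deptno, sal) for every employee
var map2=function(){emit(this.deptno,this.sal)}
// reduce: add up the salaries collected for each department
var reduce2=function(deptno,sal){return Array.sum(sal)}
// run the job and write the result to the collection mrdemo2
db.emp.mapReduce(map2,reduce2,{out:"mrdemo2"})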
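To check what the two jobs should produce, the same totals can be computed directly in plain JavaScript from the test data. This is a cross-check only and does not use MongoDB; the `rows` array copies just the `job`, `deptno` and `sal` fields from the documents above, and `addTo` is a helper named for this sketch.

```javascript
// [job, deptno, sal] for the 14 employees in the test data.
const rows = [
  ["CLERK", 20, 800], ["SALESMAN", 30, 1600], ["SALESMAN", 30, 1250],
  ["MANAGER", 20, 2975], ["SALESMAN", 30, 1250], ["MANAGER", 30, 2850],
  ["MANAGER", 10, 2450], ["ANALYST", 20, 3000], ["PRESIDENT", 10, 5000],
  ["SALESMAN", 30, 1500], ["CLERK", 20, 1100], ["CLERK", 30, 950],
  ["ANALYST", 20, 3000], ["CLERK", 10, 1300]
];

// Add `amount` to the running total stored under `key`.
function addTo(totals, key, amount) {
  totals[key] = (totals[key] || 0) + amount;
  return totals;
}

// Case 1: number of employees per job.
const countByJob = rows.reduce((t, [job]) => addTo(t, job, 1), {});

// Case 2: total salary per department.
const salByDept = rows.reduce((t, [, deptno, sal]) => addTo(t, deptno, sal), {});

console.log(countByJob);
// { CLERK: 4, SALESMAN: 4, MANAGER: 3, ANALYST: 2, PRESIDENT: 1 }
console.log(salByDept);
// { '10': 8750, '20': 10875, '30': 9400 }
```

The collections mrdemo1 and mrdemo2 should hold the same numbers, one document per key, in the form `{ _id: <key>, value: <total> }`.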

(case 3) troubleshooting the map function

Define your own emit function:
var emit = function(key, value) {
    print("emit");
    print("key: " + key + "  value: " + tojson(value));
}

Test a single document:
var emp7839=db.emp.findOne({_id:7839})
map2.apply(emp7839)
This produces the following output:
emit
key: 10  value: 5000

Test multiple documents:
var myCursor=db.emp.find()
while (myCursor.hasNext()) {
    var doc = myCursor.next();
    print ("document _id= " + tojson(doc._id));
    map2.apply(doc);
    print();
}
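A variation on the same trick is to have the stand-in `emit` record each pair in an array instead of printing it, so the map output can be checked programmatically. The sketch below runs in plain JavaScript without a database; the `emitted` array and the two trimmed-down documents are assumptions made for the example.

```javascript
// Collected (key, value) pairs.
const emitted = [];

// Stand-in for MongoDB's emit: records the pair instead of printing it.
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same map function as in case 2.
const map2 = function () { emit(this.deptno, this.sal); };

// Two documents from the test data, reduced to the fields map2 reads.
const docs = [
  { _id: 7839, deptno: 10, sal: 5000 },
  { _id: 7369, deptno: 20, sal: 800 }
];

// Call map2 with `this` bound to each document, as MongoDB does.
docs.forEach(function (doc) { map2.apply(doc); });

console.log(JSON.stringify(emitted));
// [{"key":10,"value":5000},{"key":20,"value":800}]
```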

(case 4) troubleshooting the reduce function

A simple test case
var myTestValues = [ 5, 5, 10 ];
var reduce1=function(key,values){return Array.sum(values)}
reduce1("mykey",myTestValues)

Test: the values passed to reduce are documents with several fields
Test data with salary (sal) and bonus (comm):
var myTestObjects = [
                      { sal: 1000, comm: 5 },
                      { sal: 2000, comm: 10 },
                      { sal: 3000, comm: 15 }
                    ];
Write the reduce function:
var reduce2=function(key,values) {
   // declare the accumulator locally so it does not leak into the global scope
   var reducedValue = { sal: 0, comm: 0 };
   for(var i=0;i<values.length;i++) {
     reducedValue.sal += values[i].sal;
     reducedValue.comm += values[i].comm;
   }
   return reducedValue;
}

Test:
reduce2("aa",myTestObjects)
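One more property is worth testing. MongoDB may call reduce more than once for the same key and pass the result of an earlier call back in as one of the values, so the returned object needs the same shape as the emitted values, and re-reducing must not change the answer. A minimal check in plain JavaScript, reusing the test data above (the variable names `allAtOnce`, `partial` and `reReduced` are made up for this sketch):

```javascript
// Same reduce function as above.
const reduce2 = function (key, values) {
  const reducedValue = { sal: 0, comm: 0 };
  for (let i = 0; i < values.length; i++) {
    reducedValue.sal += values[i].sal;
    reducedValue.comm += values[i].comm;
  }
  return reducedValue;
};

const myTestObjects = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 }
];

// Reduce everything in one call.
const allAtOnce = reduce2("aa", myTestObjects);

// Reduce the first two values, then feed that result back in with the third.
const partial = reduce2("aa", myTestObjects.slice(0, 2));
const reReduced = reduce2("aa", [partial, myTestObjects[2]]);

console.log(JSON.stringify(allAtOnce)); // {"sal":6000,"comm":30}
console.log(JSON.stringify(reReduced)); // {"sal":6000,"comm":30}
```

If the two results differ, the reduce function is not safe to use with mapReduce.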
