Analysis of map reduce 1.0


A brief analysis of how first-generation MapReduce works.

Without further ado, here is the diagram.

[Figure: schematic diagram of map reduce 1.0]

The figure above shows the first-generation MapReduce. The map stage is on the left and the reduce stage is on the right.

Each map task and each reduce task runs as its own process.

First, the map stage on the left

The map input lives in HDFS. It passes through the InputFormat and is read into a memory buffer, whose contents are continuously spilled to disk. Once all the data has been written to disk, the spill files are merged, with sorting performed during the merge, so the merged data is ordered.
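
To make this flow concrete, here is a minimal Python sketch of the map side under simplifying assumptions: buffer capacity is counted in records instead of megabytes, the spill runs inline instead of in a background thread, and spill "files" are in-memory lists. The function name map_side_spill_and_merge and its constants are hypothetical. Each run is sorted by key before it is spilled, and a k-way merge of the sorted runs (heapq.merge) yields ordered output.

```python
import heapq
from operator import itemgetter

# Toy sizes, counted in records instead of megabytes (assumption for illustration).
BUFFER_CAPACITY = 10   # stands in for the 100 MB default buffer
SPILL_PERCENT = 0.8    # spill threshold: 80% of the buffer


def map_side_spill_and_merge(records):
    """Buffer (key, value) records, spill sorted runs, then merge the runs."""
    threshold = int(BUFFER_CAPACITY * SPILL_PERCENT)
    buffer, spill_files = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= threshold:
            # Sort the run by key, then "write" it out as one spill file.
            spill_files.append(sorted(buffer, key=itemgetter(0)))
            buffer = []  # the spilled memory is released
    if buffer:
        # Records still in memory when the map finishes form the last spill.
        spill_files.append(sorted(buffer, key=itemgetter(0)))
    # Merging runs that are each sorted by key keeps the combined output ordered.
    merged = list(heapq.merge(*spill_files, key=itemgetter(0)))
    return spill_files, merged
```

Fed 20 word-count records, this produces three spills (8, 8, and 4 records, since the threshold is 80% of 10) and one key-ordered merged list.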

The map stage in detail:

  • File: the original data files are stored in HDFS. Each file is cut into blocks of a fixed size (64 MB by default), and each block is replicated (three copies by default) on different nodes.
  • InputFormat: one of the basic components of the MR framework. It has two main functions: dividing the data into splits, and reading records (the record reader).

    • Data splits: a split describes the range of input that one map task processes. Unlike an HDFS block, which is physical storage, a split is a logical range of a file (path, start offset, length). Splits solve the cross-block problem: a record that straddles two blocks belongs to the split of the earlier block, so a split may effectively include data from the beginning of the next block (this happens whenever a file larger than 64 MB is divided into several blocks).
      It mainly does two things: ① divides the input into splits; ② records where each split starts and ends, so that a split's last record may extend into the beginning of the next block.
    • Record reader (RR): turns the bytes of a split into records (for plain text input, one record per line, keyed by its byte offset) and calls the map function once for each record. The RR only reads records; for a record that crosses a block boundary, it reads on into the next block to finish it.
  • Map: the user's map function, not to be confused with the map process (task). If the record read by the RR is “I love China”, the map function produces {“I”, 1}, {“love”, 1}, {“China”, 1}. In other words, the input is split into words and each word is emitted as {“word”, 1}. The results are then written to the in-memory buffer.
  • Buffer in memory: the map output buffer. Its default size is 100 MB (io.sort.mb). When it reaches 80% full (the spill threshold, io.sort.spill.percent), its contents are spilled to disk, while the remaining 20% stays available and the map function keeps writing into it.
  • Partition: determines which reduce task will process each record. For example, with hash partitioning the key's hash is taken modulo the number of reduce tasks, producing {partition, key, value}, so it is known which reduce will handle each key.
  • Sort: by default, records are sorted by key (within each partition).
  • Spill: when the memory buffer reaches the threshold, the spill thread locks that 80% of the buffer, writes its data to disk, and then releases the memory. Each spill produces a new data file.
  • Combiner: local aggregation. It combines the values of identical keys to reduce the amount of output transferred. It is essentially a reduce function. The combiner generally runs during spilling and merging (before network transfer).
  • Merge: merges the spill files on local disk (each containing data for multiple partitions) into a single output file, possibly over several rounds. A merge sort is used by default.
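
The interplay between splits and the record reader can be modeled in a few lines. This sketch is in the spirit of Hadoop's line-oriented record reader but is not its code: split_file and read_records are hypothetical names, splits are simply block-sized byte ranges of an in-memory bytes object, and each record is (byte offset, line). A reader whose split does not start at offset 0 backs up one byte and skips through the next newline, because that line belongs to the previous split; every reader finishes the last line it starts, even if that line runs past the end of its split.

```python
def split_file(data: bytes, block_size: int):
    """Return (start, end) byte ranges, one block-sized split each."""
    return [(start, min(start + block_size, len(data)))
            for start in range(0, len(data), block_size)]


def read_records(data: bytes, start: int, end: int):
    """Return the (offset, line) records owned by the split [start, end)."""
    pos = start
    if start != 0:
        # Back up one byte and skip through the next newline: the line in
        # progress started in the previous split, which reads it in full.
        newline = data.find(b"\n", start - 1)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # Read every line that *starts* inside this split, even if it ends
    # beyond `end` (i.e. in the next block).
    while pos < end:
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append((pos, data[pos:line_end].decode("utf-8")))
        pos = line_end + 1
    return records
```

Whatever block size is chosen, each line is read exactly once, including lines that straddle a split boundary. With a 16-byte block, for example, “hello world” starts in the first split and ends in the second, and only the first split's reader emits it.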

That covers the map side. What we often call the shuffle phase includes partition, sort, spill, merge, combiner, copy, memory, disk… It is usually the shuffle phase that needs optimizing.
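
The combiner is one of the main levers for shrinking shuffle traffic. Here is a minimal word-count sketch of a map function followed by a combiner applied to one map's output; the names map_fn and combine are hypothetical. Summing can safely be applied to partial data, which is why word count can reuse its reduce logic as a combiner. Hadoop does not guarantee how many times a combiner runs, so a combiner must never change the final result.

```python
from collections import Counter


def map_fn(line):
    """Word-count map function: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the values of identical keys before they are shuffled."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    # Keep the output sorted by key, as map output is.
    return sorted(totals.items())
```

For the line “I love China I love Hadoop”, the map emits six records, and the combiner reduces them to four before anything crosses the network.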

Next, the reduce stage on the right

  • Merge: the data being merged comes from multiple map tasks; data that is not on the local machine is fetched over the network. As on the map side, a merge sort is used. The data for the same partition from all map tasks is processed by the same reduce task, and it may be merged over several rounds.
  • Reduce: the final aggregation. Each reduce task produces one output file.
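
A minimal sketch of the reduce side for one partition, assuming the network copy has already happened: each map's output for this partition is given as a list already sorted by key. The name reduce_partition is hypothetical. A k-way merge produces one key-ordered stream, consecutive records with the same key are grouped, and the reduce function is called once per key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_partition(map_outputs, reduce_fn):
    """Merge one partition's sorted runs from several maps, then reduce per key."""
    # Each run is sorted by key, so a k-way merge yields one ordered stream.
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    output = []
    # Consecutive records with the same key form one reduce call.
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group]
        output.append((key, reduce_fn(key, values)))
    # One reduce task writes one output file; here it is a list.
    return output
```

For example, merging [("China", 1), ("I", 2), ("love", 2)] from one map with [("Hadoop", 1), ("I", 1)] from another and summing gives [("China", 1), ("Hadoop", 1), ("I", 3), ("love", 2)].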

The reduce phase is relatively simple.

Here are a few small points:

  • MapReduce is synchronous between its two stages: the reduce function runs only after all map tasks have completed (reduce tasks may begin copying finished map output earlier, but reducing waits for every map).
  • The number of partitions equals the configured number of reduce tasks; each partition is processed by exactly one reduce task.
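
A sketch of hash partitioning that shows why partition numbers always fall in [0, number of reduce tasks). It follows the formula of Hadoop's HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, but uses zlib.crc32 as a stand-in for the Java hashCode, so the actual partition numbers will differ from Hadoop's. The names hash_partition and partition_records are hypothetical.

```python
import zlib


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Pick the partition (reduce task) for a key, HashPartitioner-style."""
    # Stand-in for key.hashCode(): a deterministic 32-bit hash (assumption).
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Clear the sign bit, then take the remainder, so the result
    # always lies in [0, num_reduce_tasks).
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks


def partition_records(records, num_reduce_tasks):
    """Group (key, value) records into one bucket per reduce task."""
    buckets = [[] for _ in range(num_reduce_tasks)]
    for key, value in records:
        buckets[hash_partition(key, num_reduce_tasks)].append((key, value))
    return buckets
```

Because the partition depends only on the key, every record with a given key lands in the same bucket, and therefore reaches the same reduce task.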

In map reduce 1.0, there are two important processes:

  • JobTracker: the master process.

    • It accepts jobs from clients, schedules tasks, monitors node status, manages task progress, etc.
    • A MapReduce cluster has a single JobTracker.
    • Each TaskTracker sends a periodic heartbeat (3 s by default) to report its health to the JobTracker. The heartbeat includes the number of free map and reduce slots, the number of occupied slots, and details of running tasks.
    • The JobTracker uses a thread pool to handle both heartbeats and client requests.
  • TaskTracker: the task (worker) process.

    • Runs tasks assigned by the JobTracker and sends periodic heartbeats (reporting its health status and asking for tasks).
    • Each worker node runs exactly one TaskTracker.
    • The JobTracker continuously waits for the JobClient to submit jobs.
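
The heartbeat exchange can be sketched as a pull model: a TaskTracker reports its free slots, and the JobTracker answers with tasks to fill them. This is a toy under stated assumptions: Heartbeat, JobTrackerSketch, and their fields are hypothetical names, and real concerns (data locality, task failures, the pluggable scheduler, the thread pool) are ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """What a TaskTracker reports (simplified; field names are hypothetical)."""
    tracker: str
    free_map_slots: int
    free_reduce_slots: int
    running_tasks: list = field(default_factory=list)


class JobTrackerSketch:
    """Toy JobTracker: hands out pending tasks in reply to heartbeats."""

    def __init__(self, map_tasks, reduce_tasks):
        self.pending_maps = deque(map_tasks)
        self.pending_reduces = deque(reduce_tasks)
        self.latest_status = {}  # tracker name -> most recent heartbeat

    def on_heartbeat(self, hb: Heartbeat):
        # Remember the tracker's latest reported status (node monitoring).
        self.latest_status[hb.tracker] = hb
        # Fill each free slot with a pending task of the matching type.
        maps = [self.pending_maps.popleft()
                for _ in range(min(hb.free_map_slots, len(self.pending_maps)))]
        reduces = [self.pending_reduces.popleft()
                   for _ in range(min(hb.free_reduce_slots, len(self.pending_reduces)))]
        return maps + reduces
```

A tracker reporting two free map slots receives at most two map tasks; once the queues are empty, heartbeats only update the JobTracker's view of each node.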

MapReduce job scheduling uses FIFO by default:

  • Priority: each job has one of five priority levels: very high, high, normal, low, very low.
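
A minimal sketch of priority-aware FIFO ordering, assuming the queue is sorted by priority first and by submission time second. The names Job, fifo_order, and PRIORITY_RANK are hypothetical; the five levels match the list above.

```python
from dataclasses import dataclass

# Lower rank = scheduled earlier.
PRIORITY_RANK = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}


@dataclass
class Job:
    job_id: str
    priority: str
    submit_time: int


def fifo_order(jobs):
    """Order jobs by priority first, then by submission time (earliest first)."""
    return sorted(jobs, key=lambda j: (PRIORITY_RANK[j.priority], j.submit_time))
```

Within one priority level this is plain first-in, first-out; a higher-priority job submitted later still moves ahead of lower-priority jobs.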