A full analysis of BlockManager in Spark’s distributed storage system

Time:2021-1-13

Abstract: BlockManager is a very important component of Spark, and it is involved throughout the execution of a Spark job. Understanding the principles and mechanisms of BlockManager is a prerequisite for understanding Spark in depth.

What is BlockManager?

  • What is the role of BlockManager? As I understand it, BlockManager is responsible for storing RDD data and keeping it available for subsequent tasks.
    The internal module diagram is as follows:

[Figure: BlockManager internal module diagram]

  • The figure shows a MemoryStore and a DiskStore, which means there are two ways to store a block: in memory and on disk. Once stored, blocks are managed through the corresponding store.
  • The unit of storage is the block, so there is a structure (an array in the figure) that maps each block to its information.
  • There is a reference to the driver’s BlockManagerMaster, which is the interface used to communicate with it.
  • There is also a ShuffleClient, which handles “backup” and “download”: executors transfer blocks to one another through the ShuffleClient.

The relationship between BlockManager, driver and executor

The relationship is as follows:

[Figure: relationship between BlockManager, driver and executor]

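The figure shows that:

  1. The BlockManagerMaster (BMM) is created on the driver side.
  2. A BlockManager is created in each executor and is responsible for registering itself with the BMM.
  3. Registration messages in Spark are sent through the actor system (Akka in the Spark version described here; newer Spark releases replaced Akka actors with Spark's own RPC layer).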
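
The registration flow above can be illustrated with a small toy model. This is only a sketch: the class names `ToyBlockManagerMaster` and `ToyBlockManager`, the fields of `BlockManagerId`, and the tuple-shaped message are all assumptions made for illustration, and the message is delivered by a plain method call rather than through an actor system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BlockManagerId:
    # Hypothetical identity of one executor-side block manager.
    executor_id: str
    host: str
    port: int


class ToyBlockManagerMaster:
    """Driver-side registry of block managers (simplified model)."""

    def __init__(self) -> None:
        self.registered: Dict[str, BlockManagerId] = {}

    def receive(self, message: tuple) -> bool:
        # Messages are plain tuples here; a real system would send them
        # through its messaging layer instead of a direct call.
        kind, payload = message
        if kind == "register":
            self.registered[payload.executor_id] = payload
            return True
        return False


class ToyBlockManager:
    """Executor-side block manager that registers itself on start-up."""

    def __init__(self, bm_id: BlockManagerId, master: ToyBlockManagerMaster) -> None:
        self.bm_id = bm_id
        self.master = master

    def initialize(self) -> bool:
        # Step 2 in the list above: the executor side registers with the master.
        return self.master.receive(("register", self.bm_id))


master = ToyBlockManagerMaster()
for i in range(2):
    ToyBlockManager(BlockManagerId(f"exec-{i}", "localhost", 7000 + i), master).initialize()
print(sorted(master.registered))
```

The point of the sketch is the direction of the call: the master never looks for executors, each block manager announces itself.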

Process of storing a block into BlockManager

[Figure: process of storing a block into BlockManager]

Two points deserve attention:

  1. On a put, BlockManager first checks whether a BlockInfo is already cached for the block ID. If so, it is reused directly; otherwise a new BlockInfo is created.
  2. When storing, BlockManager first checks whether there is enough memory. If there is, the block is written to the MemoryStore. If not, memory is released first and the put is then retried.
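
The two points above can be sketched as follows. This is a minimal toy model, not Spark's code: the class name `ToyBlockManager`, the dictionary-shaped block info, and the policy of evicting the oldest block first are assumptions chosen for illustration. It also ignores storage levels, so a block that cannot fit is simply rejected instead of being written to disk.

```python
from collections import OrderedDict
from typing import Dict


class ToyBlockManager:
    """Toy put path: reuse block info, evict from memory when space is short."""

    def __init__(self, max_memory_bytes: int) -> None:
        self.max_memory_bytes = max_memory_bytes
        self.used_bytes = 0
        self.block_info: Dict[str, dict] = {}
        # Insertion order doubles as the eviction order (oldest first).
        self.memory_store: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, block_id: str, data: bytes) -> bool:
        # Point 1: reuse the existing block info if there is one, else create it.
        info = self.block_info.get(block_id)
        if info is None:
            info = {"size": len(data), "stored": False}
            self.block_info[block_id] = info
        if info["stored"]:
            return True  # already stored, nothing to do
        # A block larger than the whole budget can never fit in memory.
        if len(data) > self.max_memory_bytes:
            del self.block_info[block_id]
            return False
        # Point 2: if memory is short, release the oldest blocks, then store.
        while self.used_bytes + len(data) > self.max_memory_bytes:
            old_id, old_data = self.memory_store.popitem(last=False)
            self.used_bytes -= len(old_data)
            del self.block_info[old_id]
        self.memory_store[block_id] = data
        self.used_bytes += len(data)
        info["stored"] = True
        return True


bm = ToyBlockManager(max_memory_bytes=10)
bm.put("a", b"12345")
bm.put("b", b"12345")
bm.put("c", b"123456")  # does not fit until both "a" and "b" are evicted
print(list(bm.memory_store), bm.used_bytes)
```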

Deleting blocks from BlockManager

[Figure: deleting a block from BlockManager]

There is nothing special about deletion. BlockManager mainly determines the storage level of the block and removes the block from the corresponding store.
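
As a rough illustration of deletion driven by storage level, the sketch below records for each block which stores hold it and removes it from exactly those. The class name `ToyBlockManager` and the representation of a storage level as a set of the strings `"memory"` and `"disk"` are assumptions for this example only.

```python
from typing import Dict, Set


class ToyBlockManager:
    """Toy delete path: the recorded storage level decides which store to clean."""

    def __init__(self) -> None:
        self.memory_store: Dict[str, bytes] = {}
        self.disk_store: Dict[str, bytes] = {}
        # Block ID -> set of places the block lives in ("memory", "disk").
        self.block_levels: Dict[str, Set[str]] = {}

    def put(self, block_id: str, data: bytes, level: Set[str]) -> None:
        if "memory" in level:
            self.memory_store[block_id] = data
        if "disk" in level:
            self.disk_store[block_id] = data
        self.block_levels[block_id] = set(level)

    def remove_block(self, block_id: str) -> bool:
        level = self.block_levels.pop(block_id, None)
        if level is None:
            return False  # unknown block, nothing to delete
        if "memory" in level:
            self.memory_store.pop(block_id, None)
        if "disk" in level:
            self.disk_store.pop(block_id, None)
        return True


bm = ToyBlockManager()
bm.put("rdd_0_0", b"abc", {"memory"})
bm.put("rdd_0_1", b"def", {"memory", "disk"})
bm.remove_block("rdd_0_1")
print(sorted(bm.memory_store), sorted(bm.disk_store))
```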

ShuffleClient download block operation

[Figure: ShuffleClient block download]

BMMAC is the abbreviation of BlockManagerMasterActor, introduced at the beginning.

  • Note: when the blocks to be fetched come from several BlockManagers, the fetch order should be shuffled, so that several BlockManagers do not all download data from the same BlockManager at the same time.
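
The note above can be illustrated with a few lines of Python. This is a sketch under assumptions: the function name `plan_fetch_order`, the dictionary layout of the input, and the optional `seed` parameter (added only to make runs reproducible) are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def plan_fetch_order(
    blocks_by_manager: Dict[str, List[str]],
    seed: Optional[int] = None,
) -> List[Tuple[str, List[str]]]:
    """Return one fetch request per remote block manager, in random order."""
    requests = list(blocks_by_manager.items())
    # Shuffling means different fetchers are unlikely to start with the same source.
    random.Random(seed).shuffle(requests)
    return requests


remote_blocks = {
    "bm-1": ["shuffle_0_0_0", "shuffle_0_1_0"],
    "bm-2": ["shuffle_0_2_0"],
    "bm-3": ["shuffle_0_3_0"],
}
for manager, blocks in plan_fetch_order(remote_blocks):
    print(manager, blocks)
```

Without the shuffle, every fetcher would walk the sources in the same order and the first BlockManager in that order would receive all requests at once.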

The backup operation of ShuffleClient

[Figure: ShuffleClient block backup]

  • Why does a BlockManager back up its blocks? The author doesn’t explain this in the book. My understanding is that the backup guards against a node crashing or being lost, which would otherwise leave intermediate tasks unable to continue.
  • Because each of the other BlockManagers may be limited in the number of blocks it can receive, a backup may need to be spread across several of them. Each time, one target is taken at random from the BlockManagerMaster, to avoid backing up to the same BlockManager.
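
Random selection of backup targets might look like the sketch below. The function name `choose_backup_peers`, the `replicas` parameter, and the `seed` parameter are assumptions for illustration. It also draws all targets in one call instead of asking the master once per target, which gives the same guarantee that no peer is chosen twice.

```python
import random
from typing import List, Optional


def choose_backup_peers(
    peers: List[str],
    replicas: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick distinct backup targets at random from the known peers."""
    rng = random.Random(seed)
    # Never ask for more targets than there are peers.
    count = min(replicas, len(peers))
    # sample() draws without replacement, so no peer is picked twice.
    return rng.sample(peers, count)


peers = ["bm-1", "bm-2", "bm-3", "bm-4"]
print(choose_backup_peers(peers, replicas=2))
```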

