The Weibo machine learning platform uses Flink to join multiple streams and generate the samples needed for online machine learning. Data within the time window is cached in state, so state-access latency usually determines job performance. Open-source Flink's state storage options are mainly RocksDB and Heap. At last year's Flink Forward conference we learned that Alibaba Cloud's VVP (Ververica Platform) product ships a higher-performance state-storage plug-in, Gemini, which we have since tested and trialed.
In this article, we test RocksDB, Heap, and Gemini in the same scenario and compare their resource consumption. The Flink version under test is 1.10.0.
We use a real sample-splicing job as the test scenario. We union the data from multiple streams and then key it by the specified key; in the aggregation function we take the corresponding fields from each stream and recombine them into a new object that is stored in a value state. A timer is registered for each new object, and the timer replaces a TimeWindow: when it fires at the end of the window, the data is emitted to the downstream operator. We use timers mainly because they are more flexible, easier for users to customize, and better for the practicality and scalability of the platform.
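The timer-based splicing described above can be sketched with a KeyedProcessFunction. This is a minimal illustration, not the platform's actual code: the names SampleEvent, JoinedSample, and mergeFieldsFrom are hypothetical, and the 10-minute window length is an assumption.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical event/output types; stand-ins for the platform's sample classes.
public class SampleJoinFunction
        extends KeyedProcessFunction<String, SampleEvent, JoinedSample> {

    private transient ValueState<JoinedSample> joinedState;

    @Override
    public void open(Configuration parameters) {
        joinedState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("joined-sample", JoinedSample.class));
    }

    @Override
    public void processElement(SampleEvent event, Context ctx,
                               Collector<JoinedSample> out) throws Exception {
        JoinedSample sample = joinedState.value();
        if (sample == null) {
            sample = new JoinedSample(ctx.getCurrentKey());
            // Register a timer to flush the joined sample when the "window" ends,
            // instead of using a TimeWindow operator.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + 10 * 60 * 1000L);
        }
        sample.mergeFieldsFrom(event); // take the needed fields from this stream
        joinedState.update(sample);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
                        Collector<JoinedSample> out) throws Exception {
        JoinedSample sample = joinedState.value();
        if (sample != null) {
            out.collect(sample); // emit the spliced sample downstream
            joinedState.clear();
        }
    }
}
```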
MemoryStateBackend vs. RocksDBStateBackend
First, note that MemoryStateBackend is not recommended for production use. We test it here mainly to quantify the resource consumption of storing state on the heap.
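For reference, the two backends can be switched in the job as sketched below against the Flink 1.10 API; the checkpoint URI is a placeholder, not our actual path.

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Heap variant: state lives as Java objects on the TaskManager heap.
        env.setStateBackend(new MemoryStateBackend());

        // RocksDB variant: state lives on local disk; checkpoints go to a
        // durable store (placeholder path), with incremental checkpoints on.
        env.setStateBackend(
                new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
    }
}
```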
In the test, we configured checkpointing as follows:
CheckpointInterval: 10 minutes
CheckpointingMode: EXACTLY_ONCE
CheckpointTimeout: 3 minutes
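These checkpoint settings map onto the Flink 1.10 API roughly as follows (a sketch, not our full job setup):

```java
import java.util.concurrent.TimeUnit;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 minutes with exactly-once semantics.
        env.enableCheckpointing(
                TimeUnit.MINUTES.toMillis(10), CheckpointingMode.EXACTLY_ONCE);

        // Abort any checkpoint that takes longer than 3 minutes.
        env.getCheckpointConfig()
                .setCheckpointTimeout(TimeUnit.MINUTES.toMillis(3));
    }
}
```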
We also applied the following RocksDB settings:
setCompressionType: LZ4_COMPRESSION
setTargetFileSizeBase: 128 * 1024 * 1024
setMinWriteBufferNumberToMerge: 3
setMaxWriteBufferNumber: 4
setWriteBufferSize: 1G
setBlockCacheSize: 10G
setBlockSize: 4 * 1024
setFilter: BloomFilter(10, false)
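The settings above can be applied through Flink's RocksDB `OptionsFactory` (a sketch against the Flink 1.10 / bundled RocksDB Java API; exact setter names can vary between RocksDB versions):

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

public class TunedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // DB-level options left at Flink's defaults in this sketch.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                .setCompressionType(CompressionType.LZ4_COMPRESSION)
                .setTargetFileSizeBase(128 * 1024 * 1024L)       // 128 MB SST files
                .setMinWriteBufferNumberToMerge(3)
                .setMaxWriteBufferNumber(4)
                .setWriteBufferSize(1024L * 1024 * 1024)          // 1 GB memtable
                .setTableFormatConfig(new BlockBasedTableConfig()
                        .setBlockCacheSize(10L * 1024 * 1024 * 1024) // 10 GB cache
                        .setBlockSize(4 * 1024)                      // 4 KB blocks
                        .setFilter(new BloomFilter(10, false)));
    }
}
```

The factory is then installed on the backend with `rocksDBStateBackend.setOptions(new TunedRocksDBOptions())`.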
The test shows that when the same job processes the same amount of data, the throughput of MemoryStateBackend is similar to RocksDB (input QPS 300,000; output QPS 20,000 after aggregation), but the required heap memory (taskmanager.heap.mb) is 8 times that of RocksDB, and the corresponding machine resources are twice those of RocksDB.
From this we draw the following conclusions:
- MemoryStateBackend needs a much larger heap to hold the state data (samples) in the window. Compared with putting the data on disk, its processing performance is very good, but the drawbacks are obvious: Java objects are stored inefficiently in memory, so a gigabyte of heap holds only on the order of a hundred megabytes of real data. This means heavy memory overhead and longer JVM pauses, which hurt the overall stability of the job; hot events also bring a risk of OOM.
- RocksDB requires far less heap and instead enlarges the native memory area used for its read cache. Combined with RocksDB's efficient disk read/write strategy, it still performs well.
GeminiStateBackend vs. RocksDBStateBackend
The Gemini state backend can be selected directly in the Ververica Platform (VVP) product.
We configure Gemini as follows:
// Local directories where Gemini stores state
kubernetes.taskmanager.replace-with-subdirs.conf-keys=state.backend.gemini.local.dir
state.backend.gemini.local.dir=/mnt/disk3/state,/mnt/disk5/state
// Page compression format (a page is Gemini's smallest physical storage unit)
state.backend.gemini.compression.in.page=Lz4
// Fraction of memory Gemini is allowed to use
state.backend.gemini.heap.rate=0.7
// Size of a single Gemini storage file (128 MB)
state.backend.gemini.log.structure.file.size=134217728
// Number of Gemini worker threads
state.backend.gemini.region.thread.num=8
Parameters for the resources used by the job
Memory-related parameters
Note: the full sample-splicing load cannot be served by 16 machines, so we stress-tested with data sampled at different ratios. We consider a job to have reached its performance bottleneck once backpressure appears.
The comparison shows that, given the same data, job-processing logic, and hardware configuration, the amount of data Gemini processes successfully is 2.4 times that of RocksDB (17,280 vs. 7,200 records/s). The hardware-resource comparison also shows that RocksDB hits the disk I/O bottleneck sooner, while Gemini achieves higher memory and CPU utilization.
About the authors:
Cao Fuqiang and Chen Xin are senior system engineers at the Weibo Machine Learning R&D Center. They are responsible for the data computing and data storage modules of the Weibo machine learning platform, covering real-time computing with Flink, Storm, and Spark Streaming; data storage with Kafka and Redis; and offline computing with Hive and Spark. They currently focus on applying Flink, Kafka, and Redis in Weibo's machine learning scenarios, providing framework, technology, and application support for machine learning.