Application and Practice of Apache Flink at Autohome

Time:2021-12-25

Introduction: how Autohome built the AutoStream platform on Flink and kept polishing it.
This article is based on the talk "Application and Practice of Apache Flink at Autohome" shared by Di Xingxing, head of the real-time computing platform at Autohome, at Flink Forward Asia 2020. The main contents include:
  • Background and current status;
  • The AutoStream platform;
  • Real-time ecosystem construction based on Flink;
  • Follow-up planning.

1、 Background and current situation

1. Phase I

Before 2019, most of Autohome's real-time workloads ran on Storm. As an early mainstream real-time computing engine, Storm attracted a large number of users with its simple spout/bolt programming model and the stability of the cluster itself. We built our Storm platform in 2016.


As demand for real-time computing grew and data volumes increased, Storm's shortcomings in development and maintenance cost became apparent. Here are some of the pain points:

High development cost

We have always used the Lambda architecture: the T+1 offline data is used to correct the real-time data, i.e. the offline data is authoritative, so the real-time computation caliber must be fully consistent with the offline one. The requirement document for real-time data development was the offline SQL, and the core work of real-time developers was to translate that offline SQL into Storm code. Although we encapsulated some general-purpose bolts to simplify development, accurately translating hundreds of lines of offline SQL into code was still very challenging, and every run required a cumbersome cycle of packaging, uploading and restarting, which made debugging costly.

Computational inefficiency

Storm does not support state well; it usually needs a KV store such as Redis or HBase to maintain intermediate state, and we used to rely heavily on Redis. For example, in the common scenario of computing UV, the simplest way is to use Redis's SADD command to check whether a uid already exists, but this approach causes heavy network I/O. Meanwhile, if a big promotion or a traffic-doubling campaign is not reported in advance, it is easy to fill up Redis memory and catch the operations team off guard. The throughput of Redis also limits the throughput of the whole job.

Difficult to maintain and manage

Because jobs were developed by writing Storm code, it was hard to analyze metadata and lineage; the code was poorly readable, the computation caliber was opaque, and business handover was costly.

Unfriendly to the data warehouse team

The data warehouse team directly interfaces with business requirements. They are more familiar with the Hive-based SQL development mode and are usually not good at developing Storm jobs, so some real-time requirements could only be met with T+1 data.

At this stage we supported only the most basic real-time computing requirements. Because the development threshold was relatively high, much of the real-time business was implemented by our platform team; doing both platform work and data development seriously diluted our focus.

2. Phase II


We began to investigate the Flink engine in 2018. Its relatively complete SQL support and native support for state attracted us. After study and research, we started designing and developing a Flink SQL platform in early 2019 and launched the AutoStream 1.0 platform in mid-2019. The platform has been used by the data warehouse team, the monitoring team and the operations team since its launch, and users were able to pick it up quickly, mainly for the following reasons:

Low development and maintenance cost: most real-time tasks at Autohome can be implemented with Flink SQL + UDF. The platform provides commonly used sources and sinks as well as UDFs frequently needed in business development, and users can also write their own UDFs, so development based on "SQL + configuration" can meet most requirements. For custom tasks, we provide an easy-to-use SDK to help users quickly develop custom Flink jobs. The platform's users are not only professional data developers; after some basic learning, ordinary development, test and operations staff can complete daily real-time data development on the platform, which is how the platform empowers them. Data assets become manageable: since SQL itself is structured, by parsing a job's SQL and combining it with the DDL of its sources and sinks, we can easily determine the upstream and downstream of the job, and lineage is retained naturally.

High performance: Flink can do computations entirely based on state (memory and disk). Compared with the previous approach that relied on external storage, performance is greatly improved. During the stress test for the 818 promotion, the rewritten programs easily supported real-time computation on dozens of times the original traffic, and horizontal scaling performed very well.

Comprehensive monitoring and alerting: users host their tasks on the platform, the platform is responsible for keeping the tasks alive, and users can focus on the task's business logic. SQL tasks are highly readable and easy to maintain; custom tasks based on our SDK let users focus more on their business logic. For both SQL and SDK tasks, we embed a large number of metrics and link them to the alerting platform, so that users can quickly discover, analyze, locate and fix problems, improving stability.

Empowering the business: the platform supports the layered data warehouse model and provides good SQL support, so data warehouse engineers can apply their offline warehouse building experience to the real-time warehouse using SQL. Since the platform went online, the data warehouse team has gradually started to take on real-time computing requirements.

Pain points:

  • Ease of use needed improvement. For example, users could not manage UDFs themselves; they could only use the platform's built-in UDFs or send jar packages to the platform administrator to upload manually.
  • With the rapid growth in the number of jobs on the platform, the on-call cost became very high. First, we often faced basic questions from new users:

  • How to use the platform;
  • Problems encountered during development, such as why packaging errors occur;
  • How to use the Flink UI;
  • The meaning of the monitoring charts and how to configure alerts.

There are also some questions that are not easy to answer quickly:

  • Jar package conflicts;
  • Why Kafka consumption is delayed;
  • Why a task reported an error.

Delay problems are particularly tricky. For common causes such as data skew, GC and backpressure we can directly guide users to check the Flink UI and the monitoring charts, but sometimes we still have to log in to the server to look at jmap, jstack and other information, and sometimes we have to generate a flame graph to help users locate performance problems.

In the initial stage we did not cooperate with an operations team; our developers handled these problems directly. Although a large number of documents were written along the way, the overall on-call cost was still very high.

If Kafka or YARN failed, there was no quick recovery plan, which was somewhat inadequate for key services that must be strongly guaranteed. As we all know, no environment or component is always stable and never fails; when a major failure occurs, there must be a response plan to restore the business quickly.

Resources were not reasonably controlled and there was serious waste. As more and more users developed jobs on the platform, the number of jobs kept growing. Some users could not manage cluster resources well and often requested far more than they needed, resulting in inefficient or even idle jobs and wasted resources.

In the AutoStream 1.0 stage, SQL-based development greatly lowered the threshold for real-time development. Business teams could implement real-time business themselves, and after a little learning the data warehouse engineers began to take on real-time requirements, freeing our platform team from a large number of business needs so that we could concentrate on platform work.

3. Current stage


To address the above issues, we made the following upgrades:

Introduced a jar service: users can upload UDF jar packages themselves and reference them in SQL snippets, realizing self-service UDF management. Jars for custom jobs can also be configured in the jar service; when multiple jobs share the same jar, a job only needs to reference the jar path in the jar service, avoiding the tedious step of uploading the jar again for every release;

Self-service diagnosis: we developed features such as dynamically adjusting the log level and self-service viewing of flame graphs, making it easier for users to locate problems and reducing our daily on-call cost;

Job health check: each Flink job is analyzed and scored across multiple dimensions, with corresponding suggestions given for each low-scoring item;

Flink job-level fast disaster recovery: we built two YARN environments, each with its own separate HDFS. Checkpoint data is replicated in both directions between the two HDFS clusters via snapshots, and a cluster-switch function was added to the platform. When one YARN cluster is unavailable, users can switch on the platform themselves and select a checkpoint from the standby cluster;

Kafka multi-cluster architecture support: our self-developed Kafka SDK supports fast switching between Kafka clusters;

Integration with the budget system: the resources occupied by each job are charged to the corresponding team's budget, which to a certain extent ensures that resources are not occupied by other teams. The budget administrator of each team can also view budget usage details to understand which businesses their budget supports.

By now users have become familiar with the platform. With the launch of the self-service health check and self-service diagnosis functions, the daily on-call frequency of the platform is gradually decreasing, and platform construction is entering a virtuous cycle.

4. Application scenarios


The data Autohome uses for real-time computing falls into three main categories:

  • Client logs, which we internally call clickstream logs, include startup logs, duration logs, PV logs, click logs and various event logs reported by the client. These are mainly user behavior logs and form the basis of the traffic wide table, the UAS system and real-time user profiles in our real-time data warehouse; on top of this they also support online services such as intelligent search and intelligent recommendation. The basic traffic data is also used for traffic analysis and real-time effect statistics for each business line, supporting daily operational decisions.
  • Server-side logs, including Nginx logs, logs generated by various back-end applications, and logs of various middleware. These logs are mainly used for health monitoring and performance monitoring of back-end services.
  • Real-time change records of the business databases, of three kinds: MySQL binlog, SQL Server CDC and TiDB TiCDC data. Based on these real-time change records, we abstract and standardize various content data and build basic services such as the content console and the resource pool; there are also some real-time business statistics scenarios with simple logic, whose results are used for real-time dashboards, Compass and other data displays.

All three types of data are written to Kafka clusters in real time, computed for different scenarios in the Flink cluster, and the resulting data is written to Redis, MySQL, Elasticsearch, HBase, Kafka, Kylin and other engines to support upper-layer applications.

A range of application scenarios are built on top of these data sources, from the real-time data warehouse and traffic analysis to service monitoring and real-time statistics.

5. Cluster size

At present the Flink cluster has 400+ servers, deployed on YARN (80%) and Kubernetes. It runs 800+ jobs, the daily computing volume is on the order of one trillion records, and the peak is 20 million records per second.


2、 AutoStream platform

1. Platform architecture


The overall architecture of the AutoStream platform mainly includes the following parts:

AutoStream core System

This is the core service of the platform. It integrates the metadata service, the Flink client service, the jar management service and the interactive result query service, and exposes the platform functions to users through the front-end pages.

It mainly includes modules for SQL and jar job management, database/table information management, UDF management, operation records and historical version management, health check, self-service diagnosis, and alert management. It also provides the ability to connect with external systems, allowing other systems to manage database/table information, SQL job information and job start/stop operations through its interfaces. Task life cycle management and scheduling based on Akka provides efficient, simple and low-latency operation guarantees and improves efficiency and ease of use for users.

Metadata service (a catalog-like unified Metastore)

It is mainly the back-end implementation corresponding to the Flink catalog. Besides basic database/table information management, it also supports table-granularity permission control and, in line with our own characteristics, user-group-level authorization.

At the bottom layer we provide a plugin catalog mechanism, which can integrate with Flink's existing catalog implementations and also makes it easy to embed custom catalogs. Through the plugin mechanism we can easily reuse HiveCatalog, JdbcCatalog and so on, ensuring the consistency of the library/table life cycle.

The metadata service is also responsible for parsing the DML statements submitted by users and identifying the tables a job depends on, which is used in the job analysis and submission process and to record lineage.

Jar Service

The various SDKs provided by the platform are managed uniformly in the jar service. Users can also submit custom jars, UDF jars and so on to the jar service for unified management on the platform and then reference them in jobs via configuration or DDL.

Customized Flink job client

It is responsible for converting platform jobs into Flink jobs and submitting them to YARN or Kubernetes. At this layer we abstract over YARN and Kubernetes, unify the behavior of the two scheduling frameworks, and expose unified interfaces and standardized parameters, weakening the differences between them and laying the foundation for seamlessly switching Flink jobs between the two frameworks.

The dependencies of each job are different. In addition to managing basic dependencies, we also need to support personalized dependencies, such as different versions of the SQL SDK and the jars and UDFs uploaded by users themselves, so the submission stages of different jobs need to be isolated.

We adopt jar service + process isolation: by integrating with the jar service, we select the appropriate jars according to the job type and configuration and submit the job in a separate process, achieving physical isolation.

Result cache service

This is a simple caching service used for online debugging during the SQL job development stage. When we parse the user's SQL statements, the result sets of SELECT statements are stored in the cache service; users can then view the result data corresponding to each SQL statement on the platform in real time by choosing its sequence number (each complete SELECT statement corresponds to a sequence number), which makes development and problem analysis easier.

Built in connectors (source & sink)

The rightmost part is the implementation of various sources and sinks. Some reuse the connectors provided by Flink, and some are connectors we developed ourselves.

For each connector we add the necessary metrics and configure a separate monitoring chart, making it easy for users to understand how the job is running and providing a data basis for locating problems.

2. SQL based development process

With the above platform functions, users can quickly develop SQL jobs:

  • Create a SQL task;
  • Write the DDL statements for the source and sink;
  • Write the DML that implements the main business logic;
  • Check the results online; if the data meets expectations, add an INSERT INTO statement to write the results to the specified sink.

By default, the platform saves every SQL change, and users can view historical versions online. We also record all operations on jobs, which helps users trace change history and locate problems during the maintenance phase.

The following is a demo for counting the PV and UV of the current day:

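The sketch below assumes a Kafka clickstream source and a MySQL result sink declared with the open-source connector options; the table names, fields and addresses are illustrative rather than the platform's actual ones:

-- Hypothetical source: client PV logs read from Kafka as JSON.
CREATE TABLE pv_log (
    uid STRING,
    ts  TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'client_pv_log',
    'properties.bootstrap.servers' = '...',
    'properties.group.id' = 'pv_uv_demo',
    'scan.startup.mode' = 'latest-offset',
    'format' = 'json'
);

-- Hypothetical sink: daily PV/UV upserted into MySQL by primary key.
CREATE TABLE pv_uv_result (
    `day` STRING,
    pv    BIGINT,
    uv    BIGINT,
    PRIMARY KEY (`day`) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://.../rt_dw',
    'table-name' = 'pv_uv_result'
);

-- Count PV and distinct UV per day; the result row for each day is continuously updated.
INSERT INTO pv_uv_result
SELECT DATE_FORMAT(ts, 'yyyy-MM-dd') AS `day`,
       COUNT(1)                      AS pv,
       COUNT(DISTINCT uid)           AS uv
FROM pv_log
GROUP BY DATE_FORMAT(ts, 'yyyy-MM-dd');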

3. Metadata management based on catalog


Main contents of metadata management:

  • Permission control: in addition to basic database and table information management, table-granularity permission control is supported, and, combined with our own characteristics, user-group-level authorization;
  • Plugin catalog mechanism: multiple catalog implementations can be combined and existing catalogs reused (a usage sketch follows this list);
  • Unified library/table life cycle: users can choose to unify the life cycle of a table on the platform with that of the underlying storage, avoiding maintenance on both sides and duplicate table creation;
  • Full compatibility between old and new versions: we did not introduce a separate Metastore service in AutoStream 1.0, and the DDL SQL parsing module in 1.0 was a self-developed component, so when building the Metastore service we had to consider compatibility with historical jobs and historical database/table information.
  • For library/table information, the new Metastore converts both old and new library/table information into a unified storage format at the bottom layer to ensure compatibility.
  • For jobs, we use abstract interfaces and provide two implementation paths, V1Service and V2Service, to ensure compatibility of old and new jobs at the user level.
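As an illustration, tables registered in the Metastore can be referenced in Flink SQL through the catalog, without repeating CREATE TABLE DDL in every job. The catalog, database and table names below are examples only, not the platform's actual ones:

-- Hypothetical session against the unified Metastore (names are examples only).
USE CATALOG autostream_catalog;
USE ods;
SHOW TABLES;
-- A registered table can then be queried directly by name:
SELECT uid, ts FROM client_pv_log;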


4. UDXF management

We introduced the jar service to manage various jars, including user-defined jobs, the platform's internal SDK components, and UDXF jars. Based on the jar service, self-service management of UDXFs is easy to implement. In the on-Kubernetes scenario, we provide a unified image; after the pod starts, the corresponding jars are downloaded from the jar service into the container to support job startup.

If the SQL submitted by the user contains function DDL, we parse the DDL in the job client service and download the corresponding jar locally.

To avoid dependency conflicts with other jobs, we start a child process each time to complete the job submission. The UDXF jar is added to the classpath, and we made some modifications to Flink so that this jar is uploaded to HDFS when the job is submitted; at the same time, the AutoSQL SDK registers the UDF for the current job according to the function name and class name.
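For illustration, such a function DDL might look like the following; the function name and class are made-up examples, and the jar containing the class is referenced through the job's jar service configuration:

-- Hypothetical UDF declaration in a user's SQL job (function and class names are examples only).
CREATE FUNCTION parse_user_agent AS 'com.autohome.udf.ParseUserAgent';

-- Once registered, the UDF is used like a built-in function, e.g.
-- SELECT parse_user_agent(ua) AS device FROM nginx_log;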


5. Monitoring, alerting and log collection

Thanks to Flink's well-designed metric mechanism, we can easily add new metrics. For connectors we embed rich metrics and configure a default monitoring dashboard, from which users can view monitoring charts for CPU, memory, JVM, network transfer, checkpoints and the various connectors. The platform also integrates with the company's cloud monitoring system, automatically generating a default alert policy that monitors key indicators such as job liveness and consumption delay. Users can modify the default alert policy in the cloud monitoring system and add new alert items to achieve personalized monitoring and alerting.

Logs are written to the Elasticsearch cluster through the cloud Filebeat component, and Kibana is open to users for querying.


6. Health check mechanism

With the rapid growth in the number of jobs, there was a lot of unreasonable resource usage, such as the resource waste mentioned above. Most of the time users are busy with new requirements and new services, and rarely go back to evaluate whether a job's resource allocation is reasonable and optimize its resource usage. The platform therefore built a cost evaluation model, i.e. the health check mechanism: the platform scores each job daily across multiple dimensions, and users can at any time view the score of a single job and its score curve over the last 30 days.

Jobs with low scores are flagged when users log in to the platform, and periodic emails remind users to optimize and rectify them. After optimizing a job, users can actively trigger re-scoring to see the effect of the optimization.


We introduced a multi-dimensional, weight-based scoring strategy that analyzes and evaluates indicators across several dimensions, such as CPU and memory utilization, whether there are idle slots, GC, Kafka consumption delay, and the amount of data processed per core per second, combines them with the job's computation topology, and finally produces a comprehensive score.

Each low-scoring item shows the reason for the low score and the reference range, together with guidance suggestions to help users optimize.

We added a new metric that reflects the CPU utilization of a TaskManager as a number from 0% to 100%, so that users can intuitively evaluate whether CPU is being wasted.


The general process of job scoring is as follows: first we collect and organize the basic information and metrics of running jobs; then we apply the rules we have set to obtain a basic score and basic suggestions; finally the scores and suggestions are aggregated into a comprehensive score and a final report. Users can view the report on the platform, and for jobs with low scores we send an alert to the job's owner.


7. Self-service diagnosis

As mentioned earlier, when users ran into online problems they could only turn to the platform team, which caused a large on-call workload and a poor user experience. For this we launched the following functions:

Dynamically modify the log level: borrowing from Storm's approach to changing log levels, we implemented a similar function in Flink. By extending the REST API and RPC interfaces, we support changing a specified logger to a certain log level and setting an expiration time; when it expires, the logger's level is restored to INFO;

Self-service viewing of thread stacks and heap memory information: the Flink UI already supports viewing thread stacks (jstack) online, and we reuse that interface directly; in addition, we added an interface for viewing heap memory information (jmap) so that users can check it online;

Online generation and viewing of flame graphs: flame graphs are a powerful tool for locating performance problems. We used Alibaba's Arthas component to give Flink the ability to view flame graphs online, so that when users encounter performance problems they can quickly assess the bottleneck.

8. Rapid disaster recovery based on checkpoint replication


When real-time computing is applied to important business scenarios, a failure of a single YARN cluster that cannot be recovered in a short time may have a great impact on the business.

Against this background, we built a multi-YARN-cluster architecture. The two independent YARN clusters each have their own independent HDFS environment, and checkpoint data is replicated between the two HDFS clusters periodically. At present the checkpoint replication delay is stable within 20 minutes.

At the platform level, the cluster-switch function is exposed directly to users. Users can view the replication status of checkpoints online, select an appropriate checkpoint (or choose not to recover from a checkpoint), switch clusters, and then restart the job, achieving a relatively smooth migration of jobs between clusters.

3、 Real-time ecosystem construction based on Flink

The core scenario of the AutoStream platform is to support real-time computing developers and make real-time development simple, efficient, monitorable and easy to maintain. As the platform matured, we began to explore how to reuse AutoStream and how to apply Flink in more scenarios. Reusing AutoStream has the following advantages:

  • Flink itself is an excellent distributed computing framework with high computing performance, good fault tolerance and a mature state management mechanism. The community is thriving, and its functionality and stability are assured;
  • AutoStream has a complete monitoring and alerting mechanism, so jobs running on the platform do not need to connect to a monitoring system separately. Flink is also very friendly to metrics, and new metrics can be added easily;
  • There is a large amount of accumulated technical experience. Through more than two years of platform building, we have achieved fairly complete management of the whole life cycle of Flink jobs on AutoStream and built basic components such as the jar service. Through simple wrapping of the upper-layer interfaces, other systems can be connected and gain real-time computing capability;
  • Both YARN and Kubernetes deployments are supported.

Based on these points, when building other systems we give priority to reusing the AutoStream platform, connecting via interface calls, and fully hosting the whole life cycle of Flink jobs on AutoStream, so that each system can focus on its own business logic.

The AutoDTS (access and distribution tasks) and AutoKafka (Kafka cluster replication) systems in our team are currently built on AutoStream. Taking AutoDTS as an example, the integration works as follows:

Make the task a Flink job: access and distribution tasks on AutoDTS exist in the form of Flink jobs;

Connect to the AutoStream platform and call its interfaces to create, modify, start and stop Flink jobs. Here a Flink job can be either a jar job or a SQL job;

The AutoDTS platform builds personalized front-end pages and form data according to its business scenarios. After a form is submitted, the form data is stored in MySQL; at the same time, the job information and jar package address are assembled into the format defined by the AutoStream interface, a Flink task is automatically created on the AutoStream platform through an interface call, and the ID of that Flink task is saved;

When an AutoDTS access task is started, the AutoStream interface is called directly to start the job.

1. AutoDTS data access and distribution platform

The AutoDTS system mainly includes two functions:

Data access: writing database change logs to Kafka in real time;
Data distribution: writing the data ingested into Kafka to other storage engines in real time.

1.1 autodts data access


We maintain a Flink-based data access SDK and define a unified JSON data format, so that after change data from MySQL binlog, SQL Server and TiDB is ingested into Kafka, the data format is consistent. Downstream services develop against the unified format without caring about the type of the original business database.

When data is ingested into a Kafka topic, the topic is automatically registered as a flow table on the AutoStream platform, which is convenient for users.

An additional advantage of building data access on Flink is that, based on Flink's exactly-once semantics, exactly-once data access can be achieved at low cost, which is a necessary condition for supporting businesses with high data accuracy requirements.

At present we are working on ingesting the full data of business tables into Kafka topics. Based on Kafka's compacted topics, a topic can contain both stock data and incremental data at the same time, which is very friendly for data distribution scenarios. Currently, to synchronize data to another storage engine in real time, the full data must first be ingested via the scheduling system before starting a real-time distribution task for the change data; with a compacted topic, the full ingestion step can be omitted. Flink 1.12 already supports compacted topics and introduces the upsert-kafka connector [1].

[1] https://cwiki.apache.org/conf…
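For illustration, a compacted topic can be declared with the open-source upsert-kafka connector roughly as follows; the topic, fields and addresses are examples, and the platform's actual DDL may differ:

-- Hypothetical upsert-kafka table: the compacted topic holds stock data plus incremental
-- changes keyed by the primary key, so a job can read the snapshot and changes together.
CREATE TABLE dts_user (
    id   BIGINT,
    name STRING,
    city STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'dts.demo.user',
    'properties.bootstrap.servers' = '...',
    'key.format' = 'json',
    'value.format' = 'json'
);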


By default, the flow table registered on the platform is schemaless; users use JSON-related UDFs to extract the field data they need.


The following is an example of using a flow table:

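Here the raw change record is assumed to be exposed as a JSON string column named value, and a hypothetical UDF json_field stands in for the platform's own JSON UDFs; the flow table name is also an example:

-- Hypothetical query over a schemaless flow table registered by AutoDTS.
SELECT
    json_field(`value`, '$.table')      AS table_name,
    json_field(`value`, '$.after.id')   AS id,
    json_field(`value`, '$.after.city') AS city
FROM dts_user_log;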

1.2 autodts data distribution


As described above, the data ingested into Kafka can be used as a flow table, and a data distribution task essentially writes the data of this flow table to another storage engine. Since the AutoStream platform already supports a variety of table sinks (connectors), we only need to fill in the downstream storage type, address and other information provided by the user and assemble the SQL to implement data distribution.

By directly reusing the connectors, duplicated development work is avoided as much as possible.

The following is an example of SQL corresponding to a distribution task:

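A minimal sketch of such generated SQL, assuming a MySQL (JDBC) sink and reusing the hypothetical flow table and json_field UDF from the sketch above:

-- Hypothetical sink assembled from the user's form input (address and table name are examples).
CREATE TABLE user_mirror (
    id   BIGINT,
    name STRING,
    city STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://.../demo_db',
    'table-name' = 'user_mirror'
);

-- The distribution task itself is just an INSERT INTO from the flow table to the sink.
INSERT INTO user_mirror
SELECT
    CAST(json_field(`value`, '$.after.id') AS BIGINT) AS id,
    json_field(`value`, '$.after.name')               AS name,
    json_field(`value`, '$.after.city')               AS city
FROM dts_user_log;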

2. Kafka multi-cluster architecture

In practical applications of Kafka, some scenarios need the support of a multi-cluster architecture. Common scenarios include:

Data redundancy and disaster recovery: data is replicated in real time to a standby cluster. When one Kafka cluster is unavailable, applications can switch to the standby cluster to restore service quickly;

Cluster migration: when a data center contract expires or workloads move to the cloud, the cluster needs to be migrated; the data of the whole cluster is copied to a cluster in the new data center so that the business can migrate relatively smoothly;

Read/write separation: most Kafka usage is read-heavy and write-light. To ensure the stability of data writing, a read/write-separated Kafka cluster can be built.

We have now built a Kafka multi-cluster architecture. Two parts of it are related to Flink:

The data replication program between Kafka clusters runs in the Flink cluster;

The Flink Kafka connector has been modified to support fast switching of Kafka clusters.

2.1 overall structure


Let's first look at data replication between Kafka clusters, which is the basis of the multi-cluster architecture. We use MirrorMaker2 for data replication and have converted MirrorMaker2 into an ordinary Flink job running in the Flink cluster.

We introduced a route service and a Kafka SDK so that clients can quickly switch the Kafka cluster they access.

Clients need to depend on our own Kafka SDK. The bootstrap.servers parameter is no longer specified in the configuration; instead, a cluster.code parameter declares the cluster to access. Based on cluster.code, the SDK requests the route service for the real address of the cluster, then creates a producer/consumer and starts producing/consuming data.

The SDK listens for changes to the routing rules. When a cluster switch is needed, the routing rule only has to be switched in the back end of the route service; when the SDK detects that the routed cluster has changed, it restarts the producer/consumer instances and connects to the new cluster.

If a consumer switches clusters, then because the offsets of the same topic differ between cluster1 and cluster2, it needs to obtain the offsets of the current consumer group in cluster2 through the offset mapping service and then consume from those offsets, achieving a relatively smooth cluster switch.

2.2 data replication between Kafka clusters

We use MirrorMaker2 to replicate data between clusters. MirrorMaker2 was introduced in Kafka 2.4, and its features include:

Automatically identify new topics and partitions;

Automatically synchronize topic configuration: topic configuration will be automatically synchronized to the target cluster;

Automatically synchronize ACL;

An offset conversion tool: given the source cluster, target cluster and group information, the offsets of the group in the target cluster can be obtained;

Extensible black/white-list strategies: these can be flexibly customized and take effect dynamically.

clusters = primary, backup
primary.bootstrap.servers = vip1:9091
backup.bootstrap.servers = vip2:9092
primary->backup.enabled = true
backup->primary.enabled = true

This configuration sets up two-way data replication between the primary and backup clusters. Data in topic1 of the primary cluster is replicated to the topic primary.topic1 in the backup cluster, i.e. the topic naming rule in the target cluster is sourceCluster.sourceTopicName; a custom naming policy can be implemented via the ReplicationPolicy interface.


2.3 Topics related to MirrorMaker2

Topics in the source cluster

heartbeat: stores heartbeat data;

mm2-offset-syncs.targetCluster.internal: stores the mapping between offsets in the source cluster (upstream offsets) and offsets in the target cluster (downstream offsets).

Topics in the target cluster

mm2-configs.sourceCluster.internal: built into the Connect framework, stores the connector configuration;

mm2-offsets.sourceCluster.internal: built into the Connect framework, stores the offset currently processed by the WorkerSourceTask; in this scenario it records up to which offset of a topic partition in the source cluster the data has been synchronized, similar to Flink's checkpoint concept;

mm2-status.sourceCluster.internal: built into the Connect framework, stores the connector status.

The above three topics are used through the KafkaBasedLog utility class in the connect-runtime module, which can read and write data in compacted topics; here MirrorMaker2 uses the topics as KV storage.

sourceCluster.checkpoints.internal: records the offsets of the source cluster's consumer groups mapped into the current cluster. MM2 periodically reads the offsets committed by a topic's consumer groups in the source Kafka cluster and writes them to the sourceCluster.checkpoints.internal topic of the target cluster.


2.4 deployment of mirrormaker2

The process of running a MirrorMaker2 job is as follows: creating a data replication job on the AutoKafka platform calls the AutoStream platform interface to create a corresponding MM2 job; starting the job calls the AutoStream interface again to submit the MM2 job to the Flink cluster.


2.5 routing service

The route service is responsible for handling clients' routing requests, matching the appropriate routing rule according to the client's information, and returning the final routing result, i.e. the cluster information, to the client.

Routing rules can be flexibly configured based on cluster name, topic, group, client ID and client-defined parameters.

For example, a rule can route the consumer whose Flink job ID is 1234 to the cluster_A1 cluster.


2.6 Kafka SDK

Native Kafka clients cannot communicate with the route service. Clients need to depend on our Kafka SDK (developed in-house at Autohome) to communicate with the route service and achieve dynamic routing.

The Kafka SDK implements the producer and consumer interfaces and is essentially a proxy for kafka-clients, so it can be introduced with few changes to the business code.

Once the business depends on the Kafka SDK, the SDK is responsible for communicating with the route service and listening for route changes. When it finds that the routed cluster has changed, it closes the current producer/consumer and creates a new one connected to the new cluster.

In addition, the Kafka SDK reports producer and consumer metrics to the Prometheus of the cloud monitoring system in a unified way. Through the dashboards pre-configured on the platform, the production and consumption status of the business can be seen clearly.

The SDK also collects information such as application name, IP and port, and process ID, which can be queried on the AutoKafka platform to help us and users locate problems together.


2.7 Offset Mapping Service

When a consumer's route changes and it switches clusters, things are a bit more complicated. Currently MirrorMaker2 consumes data from the source cluster and writes it to the target cluster; the same record is written to the same partition of the target topic, but its offset is generally different from the one in the source cluster.

For this case, MirrorMaker2 consumes the __consumer_offsets data of the source cluster, attaches the corresponding offsets in the target cluster, and writes them to the sourceCluster.checkpoints.internal topic of the target cluster.

At the same time, the mm2-offset-syncs.targetCluster.internal topic of the source cluster records the offset mapping between the source and target clusters. Combining these two topics, we built an offset mapping service to perform offset conversion for the target cluster.

Therefore, when a consumer needs to switch clusters, it calls the offset mapping service interface to obtain the offsets in the target cluster and then actively seeks to those positions to start consuming, achieving a relatively smooth cluster switch.


2.8 integration of Flink and Kafka multi cluster architecture

Since the Kafka SDK is compatible with the usage of kafka-clients, users only need to change the dependency and set parameters such as cluster.code and flink.id.

After a producer/consumer switches clusters, a new producer/consumer instance is created, but Kafka's metrics were not re-registered, so metric data could not be reported normally. We added an unregister method to the AbstractMetricGroup class, and when a producer/consumer switch event is detected, the Kafka metrics are re-registered.

With this, Flink's support for the Kafka multi-cluster architecture is complete.


4、 Follow-up planning


At present, most of the data statistics scenarios we support are based on traffic or user behavior data, and these scenarios do not have strict requirements on exactly-once semantics. As the community's support for change logs gradually improves, and given that our data access system already supports exactly-once semantics and is adding full ingestion of business tables into Kafka, accurate data statistics can be achieved later to support the statistical needs of transactions, leads and finance.

Some companies have put forward the concept of the integrated lakehouse. Data lake technology can indeed solve some pain points of the original data warehouse architecture, for example that data does not support update operations and near-real-time queries are not possible. We are currently trying to integrate Flink with Iceberg and Hudi, and will look for scenarios within the company and put them into practice later.
