On how well a voice social and live video streaming platform fits with Apache DolphinScheduler


At the Apache DolphinScheduler & Apache ShenYu (Incubating) meetup, Yuan Bingze, a software engineer at YY Live, shared how YY Live has adapted and explored Apache DolphinScheduler.

The talk has four parts:

  • Background: why YY Live introduced Apache DolphinScheduler
  • Introduction to Apache DolphinScheduler
  • Adapting Apache DolphinScheduler to our applications
  • Future plans at YY Live

About the speaker


Yuan Bingze, a software engineer at YY Live, has more than 10 years of work experience, mainly develops the risk control big data platform, is deeply interested in common big data components, and has extensive research and development experience.


YY Live is a leading voice social and live video streaming company in China. At present, our team's main responsibility is to ensure the company's business security.

01 Technical status

We currently use a layered technical architecture. From bottom to top, the layers are the data source layer, collection layer, storage layer, management layer, computing layer, and application layer.

At the data source layer, we pull relational database data from various business teams, receive data sent to us through APIs, and receive some data through Kafka streams.

The collection layer uses a data collection system that we developed ourselves.

At the storage layer, we mainly keep data in relational databases such as ClickHouse, with a small portion in non-relational stores such as Redis and a graph database. Most of the data, of course, is stored in the big data system.

At the management layer, we mainly have a big data management system, combined with a compute scheduling and task management system and a service governance platform that we developed ourselves.

02 Problems before introducing Apache DolphinScheduler

1. The scheduling setup is complex: in addition to task scheduling based on XXL-JOB, some older projects use crontab, the Spring Boot scheduler, Quartz, and other tools to manage and start tasks.

2. Strong demand for task dependencies: the schedulers we use can only configure the execution of a single task and cannot chain tasks into a workflow through dependencies. In practice, dependencies are approximated by setting start times based on personal experience, even though many tasks genuinely depend on one another.

3. Tasks are complex and diverse: they include Spark and Flink jobs on the big data system, as well as various Java services, shell scripts, Java applications, Python programs, and so on in the service governance platform.
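To illustrate the second problem: once dependencies are declared explicitly, a valid run order can be derived from them instead of being guessed through hand-tuned start times. The following is a minimal Python sketch of that general idea. The task names are hypothetical, and it uses the standard library's `graphlib` (Python 3.9+), not anything from Apache DolphinScheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each key depends on the tasks listed in its value set.
dependencies = {
    "load_events": set(),
    "build_profiles": {"load_events"},
    "risk_report": {"build_profiles", "load_events"},
}


def execution_order(deps):
    """Return a task order in which every task comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

With explicit dependencies, a task that finishes late simply delays its downstream tasks, whereas fixed start times would let a downstream task run on incomplete data.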

Introduction process

During requirements research, we found that we needed a scheduling platform that meets the following conditions:

1. Unified management of tasks and dependencies

With growing demand for business computation, especially a variety of profile computations and tasks, these tasks are scattered across systems and are very difficult to manage. Some tasks depend on one another, but their timing is configured based on personal experience. We urgently needed a product that can configure and manage dependencies in one place.

2. Compatible with the company's internal platform systems

We need the scheduling platform to manage our tasks. At the same time, to put it into use quickly, it has to be compatible with our company's other platform systems, such as the internal DataX and crontab services.

3. High availability, high performance, high concurrency, easy to use

Finally, to keep the business stable, we also need the scheduling platform to offer high availability, high performance, and high concurrency, and to be easy to use.

Through our investigation, we found that Apache DolphinScheduler seemed almost designed for us: it met our needs without much modification during adaptation.

Application adaptation

Apache DolphinScheduler is a distributed, decentralized, and easily extensible visual DAG workflow task scheduling system. It is dedicated to untangling the complex dependencies in data processing so that the scheduling system works out of the box, which matches our needs well.

First, let's look at the architecture of Apache DolphinScheduler, which makes the adaptation cases below easier to follow.


Apache DolphinScheduler mainly consists of five modules: API, Master, Worker, Log, and Alert.

The API layer mainly handles requests from the front-end UI. It exposes a unified RESTful API to external callers. The interfaces cover creating, defining, querying, modifying, publishing, and taking workflows offline, as well as manually starting, stopping, pausing, and resuming them, and starting execution from a given node.

MasterServer follows a distributed, decentralized design. It is mainly responsible for splitting DAGs into tasks, submitting and monitoring tasks, and monitoring the health of other MasterServers and WorkerServers. When a MasterServer starts, it registers an ephemeral node with ZooKeeper and achieves fault tolerance by listening for changes to ZooKeeper ephemeral nodes.

WorkerServer also follows a distributed, decentralized design. It is mainly responsible for executing tasks and providing the logger service. When a WorkerServer starts, it registers an ephemeral node with ZooKeeper and maintains a heartbeat.
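The registration and fault-tolerance mechanism described for MasterServer and WorkerServer can be pictured with a small simulation. The sketch below is an in-memory stand-in written in Python. It is not a ZooKeeper client and not DolphinScheduler code, and the class name, paths, and session IDs are all hypothetical. It only shows the pattern: a server owns an ephemeral node, the node disappears when the server's session ends, and peers that watch for removals are notified.

```python
class FakeRegistry:
    """In-memory stand-in for a ZooKeeper-like registry (not a real client)."""

    def __init__(self):
        self.nodes = {}      # path -> session id that owns the ephemeral node
        self.watchers = []   # callbacks invoked with the path of a removed node

    def register_ephemeral(self, path, session_id):
        self.nodes[path] = session_id

    def watch_removals(self, callback):
        self.watchers.append(callback)

    def expire_session(self, session_id):
        """Simulate a crashed server: its ephemeral nodes vanish, watchers fire."""
        gone = [p for p, s in self.nodes.items() if s == session_id]
        for path in gone:
            del self.nodes[path]
            for callback in self.watchers:
                callback(path)


registry = FakeRegistry()
registry.register_ephemeral("/masters/master-1", session_id="s1")
registry.register_ephemeral("/masters/master-2", session_id="s2")

failed = []
# A surviving master watches for removals so it can take over a failed peer's work.
registry.watch_removals(failed.append)

registry.expire_session("s1")
print(failed)  # ['/masters/master-1']
```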

Alert provides alert-related interfaces, which mainly cover the storage, query, and notification of two types of alert data. Notification has two channels: email and SNMP (not yet implemented).

We currently run version 2.0 on four physical machines, which host two Master instances, two API instances, three Worker and Logger instances, and one Alert instance.

Next, I will share three specific adaptation cases.

The first is the adaptation to our service governance platform, mainly for task monitoring. Although Apache DolphinScheduler has its own task monitoring module, our colleagues have long been used to managing and monitoring everything through the service governance platform. We therefore need to report the running status of Apache DolphinScheduler tasks to the service governance platform in a timely manner.

01 Service governance adaptation – MasterServer service description

Before looking at the adaptation, it helps to know more about the MasterServer service. MasterServer provides:

  • The distributed Quartz scheduling component, which is mainly responsible for starting and stopping scheduled tasks. When Quartz triggers a task, a thread pool inside the Master handles the task's subsequent operations.
  • MasterSchedulerThread, a scanning thread that periodically scans the command table in the database and performs different business operations depending on the command type.
  • MasterExecThread (WorkflowExecuteThread.java), which is mainly responsible for splitting DAGs into tasks, submitting and monitoring tasks, and handling the logic of the various command types.
  • MasterTaskExecThread, which is mainly responsible for task persistence.
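The scanning behaviour described for MasterSchedulerThread, reading a command table and branching on the command type, can be sketched as follows. This is an illustrative Python example using an in-memory SQLite database. The table layout, command types, and workflow names are assumptions made for the sketch and do not reflect DolphinScheduler's actual schema.

```python
import sqlite3

# Hypothetical command table and command types, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE command (id INTEGER PRIMARY KEY, command_type TEXT, workflow TEXT)"
)
conn.executemany(
    "INSERT INTO command (command_type, workflow) VALUES (?, ?)",
    [("START", "daily_report"), ("PAUSE", "profile_job"), ("STOP", "old_sync")],
)

handled = []

# One handler per command type; a real scheduler would do actual work here.
HANDLERS = {
    "START": lambda wf: handled.append(f"start {wf}"),
    "PAUSE": lambda wf: handled.append(f"pause {wf}"),
    "STOP": lambda wf: handled.append(f"stop {wf}"),
}


def scan_once(connection):
    """One scan pass: read pending commands, dispatch by type, then delete them."""
    rows = connection.execute(
        "SELECT id, command_type, workflow FROM command ORDER BY id"
    ).fetchall()
    for command_id, command_type, workflow in rows:
        HANDLERS[command_type](workflow)
        connection.execute("DELETE FROM command WHERE id = ?", (command_id,))
    return len(rows)


processed = scan_once(conn)  # a real scanner would repeat this on a timer
remaining = conn.execute("SELECT COUNT(*) FROM command").fetchone()[0]
print(processed, remaining, handled)
```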

02 Service governance adaptation code

Our requirement is to monitor tasks. Through code analysis, we found that task submission and monitoring are mainly implemented in methods of the WorkflowExecuteThread class, which starts multiple threads that are responsible for task execution and task monitoring respectively. The flow chart is as follows:

[Figure: Task submission and monitoring flow chart]

After analyzing the code, we found that WorkflowExecuteThread has two main methods, startProcess and handleEvents, which implement task execution and task monitoring respectively. We mainly inject the data collection code of our service governance platform into the handleEvents method, so that task status can be reported to the service governance platform in time.
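The idea of injecting a reporting call into the event-handling path can be sketched in a few lines. The example below is a toy Python model, not the team's actual change and not DolphinScheduler's Java code. `WorkflowRunner`, `TaskEvent`, and the `reporter` callback are hypothetical names. Swallowing reporter errors is also an assumption of this sketch, made so that a monitoring outage cannot break scheduling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskEvent:
    task_name: str
    state: str  # e.g. "RUNNING", "SUCCESS", "FAILURE"


class WorkflowRunner:
    """Toy stand-in for a workflow thread; not DolphinScheduler's class."""

    def __init__(self, reporter: Callable[[TaskEvent], None]):
        self.reporter = reporter
        self.states = {}

    def handle_events(self, events: List[TaskEvent]) -> None:
        for event in events:
            self.states[event.task_name] = event.state  # existing state handling
            try:
                self.reporter(event)  # injected reporting hook
            except Exception:
                # Assumption: reporting problems must not break workflow processing.
                pass


reported = []
runner = WorkflowRunner(reporter=lambda e: reported.append((e.task_name, e.state)))
runner.handle_events([TaskEvent("load", "RUNNING"), TaskEvent("load", "SUCCESS")])
print(reported)
```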

The revised part is as follows:

[Figure: revised code in the handleEvents method]

The result as displayed in the service governance platform is as follows:

[Figure: task status display in the service governance platform]

In addition to monitoring the status of individual tasks, we also monitor by project. All monitoring operations are ultimately done through the service governance platform. For example, for important tasks we configure phone alerts: once a task fails or does not finish on time, we are notified by phone.

03 DataX service adaptation process

The second case is the adaptation of DataX services. When we evaluated Apache DolphinScheduler, we found that it already integrates DataX task types, which suits us well: we have a considerable number of tasks implemented with DataX, and we have also developed some DataX plug-ins for reading and writing our internal systems and data stores.

The DataX adaptation has two parts. One part is implemented through custom templates. It can be done by copying our earlier DataX job definitions and modifying them slightly, and it mainly involves data exchange with non-relational databases.

Data exchange between purely relational databases still needs to be set up through configuration.

The first thing we ran into was a small bug when configuring ClickHouse read/write tasks.

04 DataX service adaptation: ClickHouse compatibility (#8092)

When we used DataX to read data from a ClickHouse data source, we found that any SQL that referenced parameters, whether time parameters or other parameters, failed on submission. We suspected a bug. The error log showed that when Apache DolphinScheduler submitted the SQL, the parameters were sent to ClickHouse for execution without being replaced. Because ClickHouse does not recognize Apache DolphinScheduler parameters, it threw an exception directly. We traced how Apache DolphinScheduler reads from ClickHouse when executing DataX tasks. The process of converting the Apache DolphinScheduler configuration into a DataX configuration is as follows:

[Figure: converting the Apache DolphinScheduler configuration into a DataX configuration]

The system first has to parse the full SQL syntax and then extract column information from the parse result, which requires calling the SQL parser. If Apache DolphinScheduler has not replaced the parameters by this point, an error occurs at this step and the whole task fails.
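The ordering issue described above, where placeholders must be resolved before the SQL reaches a parser or the database, can be illustrated with a short Python sketch. The `${name}` placeholder form, the function names, and the sample query are assumptions for illustration. This is not the fix that was applied in DolphinScheduler.

```python
import re

# Assumed placeholder form: ${name}
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute_params(sql: str, params: dict) -> str:
    """Replace every ${name} with its value; fail early if a value is missing."""
    def replace(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for parameter {name!r}")
        return str(params[name])
    return PLACEHOLDER.sub(replace, sql)


def has_unresolved(sql: str) -> bool:
    """True if the SQL still contains a ${...} placeholder."""
    return PLACEHOLDER.search(sql) is not None


raw_sql = "SELECT uid, amount FROM orders WHERE dt = '${biz_date}' AND amount > ${min_amount}"
ready_sql = substitute_params(raw_sql, {"biz_date": "2022-01-01", "min_amount": 100})
print(ready_sql)
```

A check such as `has_unresolved` before parsing would turn a confusing database-side syntax error into a clear, early failure.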

Therefore, because a ClickHouse parser may not be available, the best solution was to add a parser directly. First, build a JSON file, then format all the parsed chains, and finally parse the syntax, calling layer by layer until the target parser is reached.

05 Time parameter adaptation: current state in Apache DolphinScheduler

The last case is about time parameter adaptation.

Although Apache DolphinScheduler provides time parameters, most of our data needs Unix time accurate to the millisecond. After reading the Apache DolphinScheduler documentation, we were sorry to find that it does not provide this kind of time parameter. While browsing the source code, however, we found that Apache DolphinScheduler provides a timestamp function, which can in fact return a Unix time value.

When using timestamp, we found two small problems. First, using the name timestamp to express Unix time is somewhat ambiguous. Second, timestamp only supports second-level precision, while most of our data needs millisecond-level precision. For ease of use, we made some modifications to adapt this part.


Adaptation process

The first thing we did was eliminate the ambiguity. In Apache DolphinScheduler, timestamp is a way of expressing time. According to the Wikipedia explanations of timestamp and Unix time, a timestamp is usually expressed as a date plus a time, whereas Unix time counts from 00:00:00 Greenwich Mean Time on January 1, 1970, does not consider microseconds, and is expressed as an integer.

With the requirements defined, the next step was to work out how to implement them. By analyzing the code, we found that the time parameter functions are implemented through layered calls from the API, and that the main logic is ultimately in the calculateTime method of the TimePlaceholderUtils class. This method also uses constants in the TaskConstants class that hold the names of the time functions, so we modified some constants in TaskConstants. Because we need millisecond precision, we added a milli_unixtime function. Finally, to meet the needs of device users, we added some higher-precision functions, such as microsecond and nanosecond functions.
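The relationship between the second, millisecond, microsecond, and nanosecond variants can be shown with a small Python sketch. Only the name milli_unixtime comes from the talk. The other function names and the lookup table are hypothetical, and this is not the Java code in TimePlaceholderUtils.

```python
from datetime import datetime, timezone

# Hypothetical function names; only milli_unixtime is mentioned in the talk.
SCALES = {
    "unixtime": 1,
    "milli_unixtime": 1_000,
    "micro_unixtime": 1_000_000,
    "nano_unixtime": 1_000_000_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_time(name: str, moment: datetime) -> int:
    """Return integer Unix time for `moment` at the precision named by `name`."""
    delta = moment - EPOCH
    # Integer arithmetic avoids floating-point rounding at high precision.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * SCALES[name] // 1_000_000


moment = datetime(2022, 1, 1, 0, 0, 0, 123000, tzinfo=timezone.utc)
print(unix_time("unixtime", moment))        # 1640995200
print(unix_time("milli_unixtime", moment))  # 1640995200123
```

Note that Python's `datetime` only carries microsecond resolution, so the nanosecond variant in this sketch always ends in three zeros.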


As for the complement (backfill) function: with Apache DolphinScheduler, we only need to choose the complement option when running a task manually and fill in the dates we want to schedule, and the data is backfilled directly. We can also set the degree of parallelism. This function is very practical for us. Since Apache DolphinScheduler 2.0, the poor performance in time configuration and execution has also been resolved, which makes it much more convenient to use.
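To make the combination of a date range and a parallelism setting concrete, here is a minimal Python sketch of one possible way to expand a range into schedule dates and split them into groups that could run side by side. The function names and the splitting strategy are assumptions for illustration. The talk does not describe how Apache DolphinScheduler distributes backfill dates internally.

```python
from datetime import date, timedelta


def backfill_dates(start: date, end: date):
    """List every schedule date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def split_for_parallelism(dates, parallelism: int):
    """Split dates into at most `parallelism` contiguous groups of near-equal size."""
    parallelism = max(1, min(parallelism, len(dates)))
    size, extra = divmod(len(dates), parallelism)
    groups, index = [], 0
    for i in range(parallelism):
        step = size + (1 if i < extra else 0)  # spread the remainder over the first groups
        groups.append(dates[index:index + step])
        index += step
    return groups


dates = backfill_dates(date(2022, 1, 1), date(2022, 1, 5))
groups = split_for_parallelism(dates, 2)
print([len(g) for g in groups])  # [3, 2]
```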

Future planning

While using it, we found that tasks configured through Apache DolphinScheduler do not support a high-availability scheme for the data sources they use. We have a strong need for this, so we are currently working on a high-availability adaptation.

Second, we currently use version 2.0 of Apache DolphinScheduler. Because the community is active, new versions are released quickly, and even a minor version upgrade can bring significant functional and design changes. For example, in the new version the alert function has been made pluggable, and some problems with complement date conversion have been solved. This also motivates our team to upgrade to the new version and try the new features. At present, Apache DolphinScheduler is used only within our own small team, but we are also thinking about a feasible way to roll it out across the whole company.

Apache DolphinScheduler solves most of our problems very well and has greatly improved our work efficiency. Even so, in various complex situations we still run into some small bugs, and we will submit our fixes upstream in the future. We have also tried out some small features of our own during use, and we will submit them to the project for discussion as well.

Participate and contribute

With the rapid rise of open source in China, the Apache DolphinScheduler community is flourishing. To build a better and easier-to-use scheduler, we sincerely welcome everyone who loves open source to join the community, contribute to the rise of open source in China, and help local open source projects go global.

We suggest keeping your first PR (documentation or code) simple. The purpose of the first PR is to get familiar with the submission process and community collaboration, and to experience how friendly the community is.

The community has compiled the following list of issues for newcomers: https://github.com/apache/dol…

List of issues for non-newcomers: https://github.com/apache/dol…

How to contribute: https://dolphinscheduler.apac…

The DolphinScheduler open source community needs your participation. Even a small contribution adds up to something powerful.

If you want to take part in open source, we have a seed incubation group for contributors. You can add the community assistant, Leonard DS, who will guide you hands-on. Contributors of any level are welcome and all questions will be answered. What matters is a willingness to contribute.

The open source community is looking forward to your participation.

Apache DolphinScheduler is a cloud-native big data workflow scheduling platform with a powerful visual interface. It has been running stably in the production environments of 1000+ companies.
