Kangaroo Cloud officially open-sources its big data task scheduling platform: Taier!

Time: 2022-5-8

On February 22, 2022, a special day, Taier (Tai’a) is finally open source, after years of continuous iteration and the concurrent scheduling of tens of millions of periodic task instances!

GitHub repository:

https://github.com/DTStack/Taier

Official documentation:

https://dtstack.github.io/Taier/

This is an important milestone for Kangaroo Cloud’s open source projects and reflects the commitment of the Kangaroo Cloud R&D team to open source. We hope that sharing this technology helps more people explore the business scenarios of big data platforms. We also sincerely welcome more developers to join the community. Committer positions are open and waiting for you!

︱ Origin: Tai’a leaves the scabbard

Taier’s name comes from Tai’a, one of China’s ten famous swords.

Taier Logo

Tai’a was the most treasured sword of the state of Chu during the Spring and Autumn and Warring States periods. It was forged jointly by the famous swordsmiths Ou Yezi and Gan Jiang. Legend has it that, at a moment of life and death, the state of Chu defeated the Jin army with the sword spirit of Tai’a. It is known as the sword of majesty among princes, and it symbolizes an indomitable, powerful spirit. Taier’s strong and stable task scheduling is much the same: it handles more than 150,000 tasks every day, which greatly reduces the cost of enterprise ETL development and keeps the big data platform running smoothly. Its powerful functionality mirrors the legendary might of Tai’a.

︱ Drawing the sword: Taier’s birth

The birth of Taier is closely related to the development of the times.

Today, digital transformation has become a global trend, and the big data platform has become essential infrastructure. As digital transformation deepens, enterprises building a data middle platform take on a great deal of work in data collection, processing, computation and related areas. As requirements pile up, a single system struggles to serve complex business needs, and there is an urgent need for a task scheduling system that is compatible with multiple subsystems and lets them work together. The Taier distributed DAG task scheduling system was created against this background.

Taier is an out-of-the-box, distributed, visual DAG task scheduling system. Developers can write business logic directly in Taier without worrying about complex task dependencies or the architecture of the underlying big data platform, and can focus on the business itself.
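
To make "task dependencies" concrete, here is a minimal sketch of how a DAG scheduler can derive a valid run order. It is not Taier's implementation: the task names and the `run_in_waves` helper are hypothetical, and the ordering comes from Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "sync_orders": set(),
    "sync_users": set(),
    "join_orders_users": {"sync_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}

def run_in_waves(dag):
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves

waves = run_in_waves(workflow)
```

Tasks in the same wave do not depend on each other, so a scheduler is free to run them in parallel on different nodes.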

Architecture diagram of the Taier scheduling system

In designing the Taier logo, we combined building blocks, swords, honeycombs and other elements around the system’s open, inclusive and easy-to-use character. The main logo is formed by four overlapping building blocks. It is shaped like a sword, with parts that join and separate, conveying the openness and inclusiveness of an open source project. It also shows that Taier adopts a distributed model with strong decoupling and extensibility.

Creative interpretation of Taier logo

The base of the logo uses a hexagonal honeycomb structure. The hexagonal honeycomb is the most labor-saving, material-saving and stable arrangement in nature, and it keeps its shape when rotated about its six axes of symmetry. The hexagon was chosen as the logo’s border to convey how Taier reduces development costs and improves the stability of the data platform.

︱ Highlights: Taier’s functional advantages

As a distributed, visual DAG task scheduling system, Taier grew out of Stack DTinsight, Kangaroo Cloud’s one-stop big data development platform. Its technical implementation comes from DAGScheduleX, the distributed scheduling engine of Stack. DAGScheduleX is a key piece of infrastructure for the Stack products and is responsible for scheduling and running all task instances on the big data platform. Taier is the central hub of DAGScheduleX and is responsible for scheduling the huge daily task volume. Years of continuous iteration and refinement have given Taier six core advantages:

1. Very high stability

  • No single point of failure: decentralized distributed mode

  • High availability: ZooKeeper

  • Overload handling: distributed nodes + two-level storage strategy + queue mechanism. Every node can handle task scheduling and submission. When there are many tasks, they are first cached in an in-memory queue; once the configurable maximum queue size is exceeded, they are persisted to the database. Tasks are consumed from the queue, and the queue asynchronously fetches executable instances from the database

  • Battle tested: proven in the production environments of hundreds of enterprise customers
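
The overload-handling bullet above can be illustrated with a small sketch. This is not Taier's code: the `TwoLevelQueue` class and its method names are hypothetical, the "database" is simulated with an in-process deque, and the refill runs inline rather than asynchronously.

```python
from collections import deque

class TwoLevelQueue:
    """Bounded in-memory queue whose overflow spills to a backing store."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = deque()    # level 1: fast in-memory queue
        self.database = deque()  # level 2: stand-in for a database table

    def submit(self, instance):
        # Once anything has spilled, keep spilling so FIFO order is preserved.
        if len(self.memory) < self.max_in_memory and not self.database:
            self.memory.append(instance)
        else:
            self.database.append(instance)

    def refill(self):
        # A real scheduler would do this asynchronously; here it runs inline.
        while self.database and len(self.memory) < self.max_in_memory:
            self.memory.append(self.database.popleft())

    def take(self):
        """Return the next instance to run, or None when nothing is queued."""
        if not self.memory:
            self.refill()
        if not self.memory:
            return None
        instance = self.memory.popleft()
        self.refill()  # top the memory queue back up from the backing store
        return instance

q = TwoLevelQueue(max_in_memory=2)
for i in range(5):
    q.submit(f"job-{i}")
in_memory_after_submit = list(q.memory)
spilled = list(q.database)
drained = [q.take() for _ in range(5)]
```

With a cap of two, the first two instances stay in memory and the other three spill to the backing store; draining the queue still returns all five in submission order.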

2. Very easy to use, with one-stop task scheduling

  • Supports scheduling big data jobs on Spark, Flink, Hive and MR

  • Supports many task types. Spark SQL and FlinkX are supported now; to be open-sourced later: SparkMR, PySpark, FlinkMR, Python, Shell, Jupyter, TensorFlow, PyTorch, HadoopMR, Kylin, ODPS, and SQL tasks (MySQL, PostgreSQL, Hive, Impala, Oracle, SQL Server, TiDB, Greenplum, Inceptor, Kingbase, Presto)

  • Visual workflow configuration: supports both encapsulated workflows and running a single task without wrapping it in a workflow; DAGs can be drawn by drag and drop

  • DAG monitoring interface: the operations center shows cluster resources, including what remains available in the current cluster, and gives an at-a-glance view of key information in the scheduling queue, such as task status, task type, retry count, the machine running the task and visual variables; it also supports stopping tasks in batches

  • Scheduling time configuration: visual configuration

  • Multi-cluster connection: one scheduling system can connect to multiple Hadoop clusters

3. Strong compatibility, with support for multiple engine versions

  • Multiple versions of Spark, Flink, Hive, MR and other engines can coexist. For example, Flink 1.10 and Flink 1.12 can be supported at the same time (to be open-sourced later)

4. Secure and reliable, with Kerberos support

  • Spark, Flink, Hive

5. Rich system parameters

  • Supports three kinds of time baselines and flexible output formats

6. Excellent scalability, with several ways to expand capacity

  • Distribution was considered in the design. Horizontal scaling of Taier as a whole is supported today; to be open-sourced later: separate deployment of scheduler and worker

  • Scheduling capacity grows linearly with the cluster

︱ Interface: Taier user interface

[Screenshots of the Taier user interface]

︱ Outlook: future iteration plan

The Taier scheduling platform is one component of the data platform framework and meets enterprises’ day-to-day needs for data analysis, processing and presentation. As more businesses are onboarded and data volumes grow, Taier will keep improving the user experience. Planned improvements:

  • Task types: support SparkMR, PySpark, FlinkMR, Python, Shell, Jupyter, TensorFlow, PyTorch, HadoopMR, Kylin, ODPS, and SQL tasks (MySQL, PostgreSQL, Hive, Impala, Oracle, SQL Server, TiDB, Greenplum, Inceptor, Kingbase, Presto)

  • Scheduling mode: support both YARN and K8s

  • Computing engines: support Spark 2.1.x/2.4.x and Flink 1.10/1.12 (plus later Flink versions) at the same time

  • Deployment: support both integrated and separate deployment of scheduler and worker

  • Features: support trading calendars and event-driven scheduling

  • External system integration: connect Taier to external scheduling systems (Azkaban, Control-M, DS scheduling)

︱ Conclusion

Taier uses many Apache open source projects, such as Flink and Spark, as computing components for data synchronization and batch computing. Taier owes what it is today to the open source community. Because Taier draws from the community, we hope to give back by open-sourcing this technology and to help carry forward the Apache culture of “Community Over Code”. We will continue to release new versions of Taier in a spirit of inclusiveness, openness and diversity. We welcome more companies and individuals to join the developer team, make the Taier community stronger and healthier, and let more people enjoy the technological change brought by open source!