[Open source] Dropping Flink for CSharpFlink, built on .NET 5.0: brief design, deployment and secondary development guide

Time: 2022-1-8

GitHub address: https://github.com/wxzz/CSharpFlink
Gitee address: https://gitee.com/wxzz/CSharpFlink


1. Overview and background

We run a national public cloud platform for industry. Field data is transmitted to the platform in real time over dedicated lines or 4G links; the platform processes about 100 million data records per day and provides real-time online services and offline data analysis services to on-site users. It has been in production and running stably for nearly 3 years. We also build private clouds for industrial enterprises.

We planned to use Flink as the back-end real-time computing engine of the cloud platform, mainly for aggregating data points and evaluating expression rules, and later for machine learning or custom complex algorithms.

After nearly a year of development, we had largely implemented the aggregation and logic services, but we found Flink relatively heavyweight, with high demands on application development and on operations and maintenance.

For these reasons, we developed our own real-time computing component, CSharpFlink, on .NET 5.0. It supports the basic requirements of custom data sources, computation and storage.

2. Application scenarios

CSharpFlink is mainly intended for real-time aggregation and expression evaluation of data points when building Internet of Things, Industrial Internet, private cloud or public cloud platforms. Application scenarios include:

(1) Aggregation of a data point's values within a real-time time window, such as maximum, minimum, average, sum, mode, variance and median; further operators can be added through secondary development.

(2) Recalculation when data within a historical (delayed) window of a data point is backfilled or updated.

(3) Expression evaluation on data points, using user-defined C# scripts, for real-time alarms or further data processing.

(4) Distributed computation: the master node distributes computing tasks, and the worker nodes execute them and store the results.
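To make scenario (1) concrete, here is a minimal sketch of aggregating one data point's samples into fixed, non-overlapping (tumbling) time windows. It is written in Python as a language-neutral illustration, not CSharpFlink's C# API; the function name `aggregate_windows`, the `(timestamp, value)` input shape and the operator table are assumptions made for the example.

```python
import statistics
from collections import defaultdict

# Hypothetical operator table; CSharpFlink's own operators (e.g. avg.cs) are C# classes.
OPERATORS = {
    "max": max,
    "min": min,
    "avg": statistics.mean,
    "sum": sum,
    "mode": statistics.mode,
    "variance": statistics.pvariance,  # population variance
    "median": statistics.median,
}


def aggregate_windows(samples, window_seconds, operator):
    """Group (timestamp, value) samples into tumbling windows and aggregate each.

    Returns {window_start: result}; window_start is aligned to a
    multiple of window_seconds.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        window_start = ts - ts % window_seconds
        buckets[window_start].append(value)
    fn = OPERATORS[operator]
    return {start: fn(values) for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    data = [(0, 1.0), (1, 3.0), (4, 2.0), (5, 10.0), (7, 6.0)]
    print(aggregate_windows(data, 5, "avg"))  # {0: 2.0, 5: 8.0}
```

Keeping operators in a lookup table mirrors the idea that new operators are added by secondary development without touching the windowing logic.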

3. Framework features

The features are distilled from our years of experience in Internet of Things and industrial projects and target the application scenarios above. They include:

(1) Developed on .NET 5.0 and fully cross-platform.

(2) Recalculation when data outside the real-time window is backfilled or updated. For example, with a 5-second real-time window, data older than 5 seconds can still be supplemented or updated; the affected window is recalculated and its result is updated in the data storage unit.

(3) Real-time expression evaluation can be triggered on a timer or by a change in a data value, covering both real-time and periodic calculations.

(4) Secondary development in C#: connect to multiple data sources, write custom operators, and store results in multiple ways.

(5) Single-node or distributed deployment.
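Feature (2) amounts to keeping enough state to recompute a closed window when late data arrives. The Python sketch below illustrates the idea under stated assumptions: all raw samples are retained per window in memory, results go to a plain dict standing in for the data storage unit, and `LateDataWindow` and its methods are hypothetical names. CSharpFlink's actual retention policy and storage are not described here.

```python
from collections import defaultdict


class LateDataWindow:
    """Hypothetical sketch: tumbling-window averages that are recomputed
    when late (backfilled) or updated samples arrive for a closed window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.raw = defaultdict(dict)  # window_start -> {timestamp: value}
        self.results = {}             # stands in for the data storage unit

    def _window_start(self, ts):
        return ts - ts % self.window_seconds

    def put(self, ts, value, now):
        """Insert or update one sample; recompute its window if already closed."""
        start = self._window_start(ts)
        self.raw[start][ts] = value   # same timestamp again means an update
        if start < self._window_start(now):
            # Late data: the window has closed, so recompute and overwrite its result.
            self._flush(start)

    def close_until(self, now):
        """Compute results for every window that ended before `now`."""
        current = self._window_start(now)
        for start in sorted(self.raw):
            if start < current and start not in self.results:
                self._flush(start)

    def _flush(self, start):
        values = list(self.raw[start].values())
        self.results[start] = sum(values) / len(values)


if __name__ == "__main__":
    w = LateDataWindow(5)
    w.put(1, 2.0, now=1)
    w.put(3, 4.0, now=3)
    w.close_until(6)       # window [0, 5) closes with average 3.0
    w.put(2, 6.0, now=7)   # late sample: window [0, 5) recomputed to 4.0
    print(w.results)
```

Retaining raw samples trades memory for exact recalculation; a real deployment would bound how far back a window can be reopened.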

4. Framework architecture

The basic component architecture of the framework is shown below:

[Figure: CSharpFlink framework component architecture]

5. Code directory description

The project is developed in VS2019, and the solution file is CSharpFlink.sln. The code directories are as follows:

(1) Cache: local cache management of computing tasks on the master node and worker nodes.

(2) Calculate: input, processing and output operations of calculation tasks, and their management.

(3) Channel: I/O communication between the master node and worker nodes in distributed deployment.

(4) Common: common class library.

(5) Config: global configuration file handling.

(6) Execution: the global execution environment entry point of the project.

(7) Expression: expression evaluation tasks.

(8) Log: logging and log management.

(9) Model: metadata of data points.

(10) Node: master node and worker node management.

(11) Protocol: the protocol used for interaction between the master node and worker nodes in distributed deployment.

(12) Sink: storage interface for calculation results.

(13) Source: interfaces to multiple data sources, such as MQTT, Kafka, RabbitMQ and databases.

(14) Task: window and expression task interfaces, plus task operation and management on the master and worker nodes.

(15) Window: data window task operations.

(16) Worker: worker node interface.

6. Configuration file description

The default configuration file is cfg\global.cfg; a different configuration file can be specified on the command line (see section 8, command line instructions). The parameters are as follows:

(1) Maxdegreofparallelism: task parallelism. Both task generation on the master node and task processing on the worker nodes depend on this parameter.

(2) Masterlistenport: listening port of the master node, to which worker nodes actively connect.

(3) Masterip: IP address of the master node, used by worker nodes to connect to it.

(4) NodeType: node operating mode; one of master, slave or both.

(5) Remoteinvokeinterval: interval between remote calls to worker nodes, in ms.

(6) Repeatremoteinvokeinterval: interval before calling a worker node again after a failed call, in ms.

(7) Slaveexcetecalculateinterval: interval between calculation task executions on a worker node, in ms.

(8) Maxframelength: maximum length of data transmitted between the master node and a worker node, in bytes.

(9) Workerpower: worker node capability coefficient; if it is greater than 1, the master sends multiple tasks to that worker in succession.
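One way to picture a capability coefficient like Workerpower: in each dispatch round, the master sends up to `power` consecutive tasks to each worker. The Python sketch below is an assumption inferred from the one-line description above; CSharpFlink's actual dispatcher, and how it combines this with Remoteinvokeinterval and the retry interval, may differ. The names `dispatch_round` and `workers` are hypothetical.

```python
from collections import deque


def dispatch_round(tasks, workers):
    """Hypothetical sketch: assign queued tasks to workers for one round.

    `tasks` is a deque of task ids; `workers` maps worker name -> power
    (capability coefficient). A worker with power N receives up to N
    consecutive tasks per round; power is treated as at least 1.
    Returns {worker: [task ids]}.
    """
    assignment = {name: [] for name in workers}
    for name, power in workers.items():
        for _ in range(max(1, int(power))):
            if not tasks:
                return assignment
            assignment[name].append(tasks.popleft())
    return assignment


if __name__ == "__main__":
    queue = deque(range(6))
    print(dispatch_round(queue, {"worker-a": 1, "worker-b": 3}))
    # {'worker-a': [0], 'worker-b': [1, 2, 3]}
    print(list(queue))  # [4, 5]
```

A stronger machine gets a larger share of each round, while the remaining tasks wait for the next round (in the real system, presumably after Remoteinvokeinterval).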

7. Task deployment description

For secondary development, see section 10. Once a developed task passes testing, copy its assembly (.dll) into the “tasks” directory. For example, after the testtask project has been tested and compiled, its output can be deployed to the “tasks” directory, and the “CSharpFlink” main program loads and calls it automatically.

You can also specify a custom task assembly; see section 8, command line instructions.

8. Command line instructions

Run the “CSharpFlink” program from the command line. It supports specifying a custom configuration file or task assembly:

-h) display command line help.

-c) load the specified configuration file. For example: csharpflink -c c:/my.cfg

-t) load the specified task assembly. For example: csharpflink -t c:/mytask.dll

For example:

dotnet CSharpFlink.dll -c c:/master.cfg -t c:/mytask.dll

9. Deployment description

The “release” directory contains the compiled program. Copy “CSharpFlink v1.0” to several different paths; in the “cfg\global.cfg” configuration file, set the “NodeType” parameter to master in one copy and to slave in the others; set the number of tasks in the “tasks\tasks.cfg” file of the master node copy; then run “dotnet CSharpFlink.dll” in each directory.

For the source code of “testtask.dll”, see section 10, secondary development description.

10. Secondary development description

Secondary development mainly targets data sources, the calculation process and storage of calculation results. The general process is as follows:

(1) Data source integration: you can write custom connectors for MQTT, Kafka, RabbitMQ, databases, etc. by inheriting the sourcefunction interface; see the randomsourcefunction class.

(2) Calculation process: data processing can be customized by inheriting the calculate interface; see the aggregate calculation avg.cs and the expression calculation expressioncalculate.cs. Instances are created through the parameters of the addwindowtask or addressexpressiontask function.

(3) Result storage: calculation results can be stored on any medium by inheriting the sinkfunction interface; see the sinkfunction class.
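The three extension points above form a source, calculate, sink pipeline. The Python sketch below mirrors that shape with abstract base classes. The class names echo sourcefunction, calculate and sinkfunction only for readability; every method name and signature (`read`, `compute`, `write`, `run_pipeline`) is hypothetical, and this is not CSharpFlink's C# API. Consult the repository's randomsourcefunction class and avg.cs for the real interfaces.

```python
import random
from abc import ABC, abstractmethod


class SourceFunction(ABC):
    """Hypothetical data source extension point (e.g. MQTT, Kafka, a database)."""

    @abstractmethod
    def read(self):
        """Return a list of (point_id, timestamp, value) tuples."""


class Calculate(ABC):
    """Hypothetical calculation extension point (an aggregation or expression)."""

    @abstractmethod
    def compute(self, values):
        """Reduce a list of values to one result."""


class SinkFunction(ABC):
    """Hypothetical storage extension point for calculation results."""

    @abstractmethod
    def write(self, point_id, result):
        """Persist one result."""


class RandomSource(SourceFunction):
    """Generates random samples, in the spirit of a random test source."""

    def __init__(self, point_ids, count, seed=0):
        self.point_ids = point_ids
        self.count = count
        self.rng = random.Random(seed)

    def read(self):
        return [(pid, t, self.rng.uniform(0, 100))
                for pid in self.point_ids for t in range(self.count)]


class Avg(Calculate):
    def compute(self, values):
        return sum(values) / len(values)


class MemorySink(SinkFunction):
    """Keeps results in a dict; a real sink would write to a database or file."""

    def __init__(self):
        self.stored = {}

    def write(self, point_id, result):
        self.stored[point_id] = result


def run_pipeline(source, calculate, sink):
    """Group samples by data point, compute one result per point, store it."""
    grouped = {}
    for point_id, _ts, value in source.read():
        grouped.setdefault(point_id, []).append(value)
    for point_id, values in grouped.items():
        sink.write(point_id, calculate.compute(values))
    return sink


if __name__ == "__main__":
    sink = run_pipeline(RandomSource(["p1", "p2"], 5), Avg(), MemorySink())
    print(sink.stored)
```

Because each stage depends only on the abstract base, a new source, operator or storage medium can be swapped in without changing `run_pipeline`.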

11. Application example

On a single computer (CPU: 4-core i5-7400 at 3.0 GHz; memory: 16 GB) running 1 master node and 5 worker nodes, 1000 data point tasks were generated with random time windows and calculation operators. Overall CPU utilization was 20%–30% and memory utilization 30%–40%. The master node used 3%–5% CPU and 100 MB–300 MB of memory; each worker node used 0.1%–2% CPU and 25 MB–60 MB of memory. The runtime view is shown below:

[Figure: runtime view of the test setup]


Internet of things & big data technology QQ group: 54256083

Internet of things & big data cooperation QQ group: 727664080

Website: http://www.ineuos.net

Contact QQ: 504547114

Cooperation wechat: wxzz0151

Official blog: https://www.cnblogs.com/lsjwq

INeuOS industrial Internet operating system official account
