Chen Lei, Partner & CPO at DataPipeline
He formerly served as Chief Data Scientist and Senior Managing Consultant in the Cognitive IoT Laboratory Services group at IBM Greater China, with ten years of management experience and fifteen years in data science and finance. He is also Director of the Industrial Innovation Department at the National Engineering Laboratory for Integrated Transportation Big Data Application Technology, Director of the Big Data Intelligent Innovation Center at the School of Software, Xi'an Jiaotong University, and a member of the Blockchain Special Committee of the Chinese Institute of Electronics.
Last week we published "The Way of Real-Time Data Fusion: Observe Broadly, Select Carefully, Be Value-Driven." That article argued that real-time data should not only cover every kind of usable data, but also be prioritized by value; and that we should not only release the value of data efficiently, but also choose the right starting point and entry point. Customer-facing business is undoubtedly the best entry point for real-time data, because it ties directly to revenue. Precisely because of that importance and sensitivity, stability and fault tolerance in the real-time data fusion process deserve the closest attention.
Whether you are playing PUBG or running an enterprise system, stable output is what matters most.
When upstream and downstream are unstable, you have to stay stable
In acquiring and loading real-time data, upstream and downstream nodes are generally only registered, not managed, so it is difficult for the system to perceive their actual status and problems in real time. In a real enterprise environment, those nodes usually rank above the real-time data fusion system in business continuity and service level. Real-time data processing must therefore follow each node's own management mechanisms, such as authentication method, security encryption, connection duration, maximum number of connections, and even logging conventions. And there is rarely just one type of node to deal with. Beyond thorough preparation and reliance on those management mechanisms, real-time data fusion needs sufficient policy configuration and fault-tolerance mechanisms to absorb the uncertainty caused by unstable upstream and downstream systems, and so keep itself stable.
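The policy-plus-fault-tolerance idea above can be sketched as a configurable reconnect policy applied to any source or sink connection. Everything here (class names, parameters, the `connect()` callable) is an illustrative assumption, not an actual DataPipeline API:

```python
import time

class ReconnectPolicy:
    """Hypothetical retry policy: exponential backoff, capped per attempt."""
    def __init__(self, max_attempts=5, base_delay=0.5, max_delay=30.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay

    def delays(self):
        # Yield one backoff delay per allowed attempt: 0.5s, 1s, 2s, ...
        for attempt in range(self.max_attempts):
            yield min(self.base_delay * (2 ** attempt), self.max_delay)

def connect_with_policy(connect, policy, sleep=time.sleep):
    """Call `connect()` under the policy; re-raise after attempts run out."""
    last_err = None
    for delay in policy.delays():
        try:
            return connect()
        except ConnectionError as err:
            last_err = err
            sleep(delay)  # back off before the next attempt
    raise last_err
```

In practice each node type would carry its own policy values (and its own auth and encryption settings), configured per scenario rather than hard-coded.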
When the structure is unstable, you have to stay stable
Once the upstream and downstream nodes are stabilized, you need to consider the instability of the objects inside those nodes, the so-called DDL problem. The reasoning is the same as above. In enterprises with mature information governance, any data-structure change must first go through impact analysis on the data governance platform, and all downstream systems switch over only after joint testing. But that is, after all, someone else's child; back in your own home there is always some excuse. Structural changes in upstream systems arrive exactly when you do not expect them, while downstream systems are crying to be fed. Assigning responsibility and passing the blame can wait; the first priority is that the business must not stop. Real-time data processing therefore needs comprehensive response strategies for structural change. Moreover, the means of detecting such changes differ by data node type and incremental-capture mechanism: some are simple and efficient, others very expensive. Real-time data fusion must allow these to be chosen and configured per scenario, so as to keep itself stable.
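A configurable structural-change response strategy might look like the sketch below. The strategy names, the `SchemaGuard` class, and the simplified notion of a schema as a set of column names are all assumptions for illustration:

```python
FAIL, IGNORE, EVOLVE = "fail", "ignore", "evolve"

class SchemaGuard:
    """Apply a preset strategy when a record carries unexpected columns."""
    def __init__(self, schema, strategy=FAIL):
        self.schema = set(schema)
        self.strategy = strategy

    def apply(self, record):
        extra = set(record) - self.schema
        if not extra:
            return record
        if self.strategy == FAIL:
            # Stop the pipeline and surface the DDL change to an operator.
            raise ValueError(f"unexpected columns: {sorted(extra)}")
        if self.strategy == IGNORE:
            # Drop the new columns and keep the business flowing.
            return {k: v for k, v in record.items() if k in self.schema}
        # EVOLVE: accept the new columns and widen the registered schema.
        self.schema |= extra
        return record
```

The point is not the specific strategies but that the choice is configuration, picked per node type and per scenario, since detection and handling costs vary widely.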
When the flow is unstable, you have to stay stable
When we first start processing real-time data, we tend to equate it with transactions, behavior events, and other time-series data. In a real enterprise environment, however, some systems occasionally perform large-scale updates, and the upstream increment suddenly surges: usually as quiet as a mountain stream, then in the blink of an eye the Yellow River at Hukou. The volume of real-time data is therefore tied to the design of the upstream application, its data model, and its data management mechanisms, and cannot be estimated from transaction volume alone. Beyond accurate capacity planning and resource preparation, resource utilization and cost also have to be weighed. Real-time data fusion therefore needs a strong backpressure mechanism and flexible read/write limit configuration, controlling read rate, parallelism, and batch size to absorb incremental surges and keep itself stable.
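One common way to implement the read-rate side of such backpressure is a token bucket: reads proceed at a sustained rate while still allowing short bursts. This sketch is generic and illustrative, not any specific product's API:

```python
import time

class TokenBucket:
    """Limit reads to `rate` records/second, with bursts up to `capacity`."""
    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.now = now          # injectable clock, useful for testing
        self.last = now()

    def try_acquire(self, n=1):
        # Refill tokens for the time elapsed since the last call.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should back off: shrink the batch or pause
```

A reader would call `try_acquire(batch_size)` before each fetch and, on `False`, reduce the batch or pause, which is exactly the "control read rate, parallelism, and batch size" lever described above.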
When the environment is unstable, you have to stay stable
Generally speaking, the stability of network, storage, and compute in an enterprise environment can be guaranteed. But I can also guarantee that, just as every balding programmer has his share of overtime nights, every operations engineer can tell a few ghost stories about systems that failed, and then recovered, for no discernible reason. Real-time data processing therefore needs preset strategies for reconnecting, resetting threads, and even restarting tasks when the network becomes unavailable without warning or unknown exceptions occur, so as to keep itself stable.
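Those preset strategies naturally form an escalation ladder: try the cheap action first, fall through to the expensive one. The sketch below is a minimal illustration under that assumption; the names and structure are hypothetical:

```python
class RecoveryLadder:
    """Run preset recovery actions in order of increasing cost.

    Each action is a (name, callable) pair; the callable returns True
    if it restored service. Typical order: reconnect, reset thread,
    restart task.
    """
    def __init__(self, actions):
        self.actions = actions

    def recover(self):
        for name, action in self.actions:
            if action():
                return name   # report which level fixed it
        return None           # nothing worked: escalate to an operator
```

Returning the level that succeeded also gives operations a useful signal: if tasks keep recovering only at the "restart" rung, something deeper is wrong.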
——Stable, sure. But with this many configurations, how long is it all going to take?
——Time? There is no time. When did we ever have time? Read on.
In the next issue, "The Way of Real-Time Data Fusion: Convenient and Manageable," we will go into detail on four aspects: convenient configuration, convenient deployment, hierarchical management, and on-demand service. Stay tuned!