Internet technology has entered the era of big data and artificial intelligence, and new computing models and computing engines keep emerging. Since MapReduce-style distributed computing appeared 10 years ago, a new generation of computing engine has arrived almost every three years: first Hadoop, the pioneer; then Storm, for stream computing; then Spark, built on in-memory iterative computing, which held the spotlight for a while; and now Flink, which is rapidly coming into everyone's view. The same is true beyond data computing: machine learning frameworks such as Keras, PyTorch, Caffe2, and TensorFlow have appeared one after another in recent years. The rise of each computing engine marks a new breakthrough in computing technology.
As computing engines have developed, today's Internet services rely more and more on big data and artificial intelligence. What began as data warehouse analysis has gradually evolved into intelligent decision-making services with stricter real-time requirements and higher complexity.
Combining big data with artificial intelligence opens up more possibilities for business innovation, but it also makes the underlying technical system larger and more complex. Switching between multiple computing engines sharply raises learning costs, and rising R&D costs and falling efficiency are common problems as well. Worse than the loss of R&D efficiency is that data cannot be shared or exchanged between different computing engines. Most scenarios have to dump data through intermediate storage, which wastes storage resources and doubles the data volume. In addition, hidden dangers such as the lack of unified metadata and data security risks constantly threaten an increasingly large and bloated system.
Faced with these problems, which cannot be ignored, Ant Financial proposed the concept of an open computing architecture in 2018. The goal is to address computing engine upgrades, a unified R&D system, data sharing and interoperability, and data risk prevention and control by designing a technical framework that fits the current computing system and can also cope with future computing trends.
First, on the computing engine side, the open computing architecture assumes that computing engines will keep being replaced. It therefore needs unified metadata and state management that tracks the state of different computing jobs in one place, so that any kind of computing engine can be supported as a plug-in. Second, on the R&D side, different computing engines have different development models and API interfaces. To unify development across engines, a computing DSL has to be encapsulated at the top level. For this reason, we have launched SmartSQL, which extends the standard SQL specification with additional functions and syntax, in the hope of describing most computing and machine learning tasks in the simplest and most common language. Third, to solve the problem that data cannot be accessed and shared across engines, Ant Financial has independently developed a unified storage system that supports multiple data storage formats. It also supports automatic conversion and migration of data between formats, which greatly simplifies how the engine layer uses storage and saves considerable cost.
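The plug-in idea described above, one shared place for job metadata and state with interchangeable engines behind it, can be illustrated with a minimal sketch. This is not Ant Financial's implementation: `EngineRegistry`, `Job`, `JobState`, and the engine names are all hypothetical, and each "engine" is just a Python callable standing in for a real engine client.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class JobState(Enum):
    """Job states shared by every engine (hypothetical set)."""
    SUBMITTED = "submitted"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class Job:
    """Engine-independent metadata record for one computing job."""
    job_id: str
    engine: str
    query: str
    state: JobState = JobState.SUBMITTED
    result: Optional[str] = None


class EngineRegistry:
    """Keeps engines pluggable while tracking all job state in one place."""

    def __init__(self) -> None:
        self._engines: Dict[str, Callable[[str], str]] = {}
        self._jobs: Dict[str, Job] = {}

    def register(self, name: str, runner: Callable[[str], str]) -> None:
        # A new engine is added without touching existing ones.
        self._engines[name] = runner

    def submit(self, engine: str, query: str) -> Job:
        if engine not in self._engines:
            raise KeyError(f"unknown engine: {engine}")
        job = Job(job_id=f"job-{len(self._jobs) + 1}", engine=engine, query=query)
        self._jobs[job.job_id] = job
        job.state = JobState.RUNNING
        try:
            job.result = self._engines[engine](query)
            job.state = JobState.FINISHED
        except Exception:
            # Any engine failure maps onto the same unified state.
            job.state = JobState.FAILED
        return job

    def state(self, job_id: str) -> JobState:
        return self._jobs[job_id].state


def broken_engine(query: str) -> str:
    """Stand-in for an engine whose job fails."""
    raise RuntimeError("engine unavailable")


registry = EngineRegistry()
registry.register("batch", lambda q: f"batch ran: {q}")
registry.register("stream", broken_engine)

ok_job = registry.submit("batch", "SELECT 1")
bad_job = registry.submit("stream", "SELECT 1")
print(ok_job.job_id, registry.state(ok_job.job_id).value)
print(bad_job.job_id, registry.state(bad_job.job_id).value)
```

Because callers only see `Job` and `JobState`, replacing or adding an engine in this sketch means registering another callable; nothing that reads job state has to change.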
Beyond traditional forms of computing and storage, today's Internet services also hold a large amount of relational data. Scenarios such as social networking, risk control, anti-money laundering, and marketing all have a strong demand for relationship computing. Graph computing, as a new form of data processing, is therefore also a focus for Ant Financial. The open computing architecture includes graph computing engines and storage that integrate offline and online processing. In terms of usage scenarios, it covers online, real-time, and offline workloads to support businesses with different timeliness requirements. In terms of function, it offers a financial-grade graph database, very large-scale graph computing, dynamic graph computing that combines streaming and graph processing, and ultra-fast in-memory graph computing, covering data computing needs at different levels. In addition, as with the other computing engines, Ant Financial has developed Graph SQL, based on SQL conventions, as a unified graph query language that covers all of the graph computing engines.
In machine learning, the open computing architecture also includes SQLFlow, which was open-sourced earlier, and the newly launched elastic deep learning framework ElasticDL. As a bridge between data and training, SQLFlow extends standard SQL with machine learning syntax, so that data analysts can train their own models as easily as writing SQL. SQLFlow supports most of the machine learning engines on the market, as well as ElasticDL. ElasticDL is an elastic deep learning framework built on Kubernetes (k8s) that is compatible with the TensorFlow engine and Keras syntax. It can reduce training waiting time and running time through elastic scheduling.
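To make the idea of extending SQL with training syntax concrete, here is a small sketch that assembles a SQLFlow-style training statement as a string. The clause keywords (`TO TRAIN`, `WITH`, `COLUMN`, `LABEL`, `INTO`) follow the extended syntax as described in SQLFlow's public documentation, to the best of my understanding; the helper `build_train_statement` is hypothetical and not part of SQLFlow, and the table, estimator attributes, and model names are illustrative only. The sketch builds text and does not contact a SQLFlow server.

```python
from typing import Dict, Sequence


def build_train_statement(
    select: str,
    estimator: str,
    attrs: Dict[str, object],
    label: str,
    into: str,
    columns: Sequence[str] = (),
) -> str:
    """Compose a standard SELECT with SQLFlow-style training clauses.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Start from an ordinary SQL query, dropping any trailing semicolon.
    parts = [select.strip().rstrip(";"), f"TO TRAIN {estimator}"]
    if attrs:
        # Model attributes become a comma-separated WITH clause.
        parts.append("WITH " + ", ".join(f"{k} = {v}" for k, v in attrs.items()))
    if columns:
        parts.append("COLUMN " + ", ".join(columns))
    parts.append(f"LABEL {label}")
    parts.append(f"INTO {into};")
    return "\n".join(parts)


statement = build_train_statement(
    select="SELECT * FROM iris.train;",
    estimator="DNNClassifier",
    attrs={"model.n_classes": 3, "model.hidden_units": [10, 20]},
    label="class",
    into="sqlflow_models.my_dnn_model",
)
print(statement)
```

The point of the sketch is that the first line remains ordinary SQL that an analyst already knows, and the training request is expressed entirely by the clauses appended after it.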
This whole system, also known as the big data foundation, is Ant Financial's best practice for financial data intelligence, and Ant Financial regards it as the cornerstone of the next generation of big data.
On September 27, the third day of the Yunqi Conference in Hangzhou, Ant Financial will share its practice with financial data under the open system, including the technical details of the Ant open computing architecture, in the digital financial technology session. You are welcome to follow along.
This is original content from the Yunqi Community and may not be reproduced without permission.