He Changhua, Chief Architect at Ant Financial: Open-sourcing SQLFlow is a first trial; real-time big data systems are the cornerstone of the future


Open-sourcing SQLFlow gives back to the industry while also showing off some AI muscle.

That was the industry's reaction after Ant Financial recently open-sourced SQLFlow, the first project to apply SQL to an AI engine.

SQLFlow combines hard-to-use AI with simple SQL, greatly lowering the threshold for data engineers to use AI technology.

SQLFlow was developed by the AI Infra team led by He Changhua, chief architect of computing and storage at Ant Financial.

Dr. He Changhua graduated from Stanford. He worked at Google headquarters for seven years, where he won the company’s highest technology award, and then spent two years at the unicorn Airbnb, responsible for the application architecture of its back-end systems.

In May 2017, he officially joined Ant Financial as chief architect of computing and storage, and in 2018 he was selected for the 14th cohort of experts in the national Thousand Talents Program.

At Ant Financial, He Changhua’s job is to develop a new generation of computing engine and build a financial data intelligence platform.

SQLFlow is one of the fruits of this main line of computing-engine work.

But for He Changhua, the world is changing dramatically, and he has to lead a team to explore things nobody has done before.

One example is a fully real-time big data intelligence system.

The cornerstone of future technology

The concept of big data originated in the search engine industry, because search engines face the explosive growth of the huge volume of data that people leave on the Internet.

In 2010, Google announced that its new-generation search indexing system, “Caffeine”, had officially launched. The revolutionary feature of this technology is that whenever any web page in the world changes, the change can be added to the index in real time, and users can search it in real time. This solved the delay problem of traditional search engines.

He Changhua was one of the core technical leaders of the Caffeine development team at that time.

“The core function of Caffeine is real-time,” he explained.

Now, the goal of He Changhua’s work at Ant Financial is to build a “fully real-time” big data processing system, also called a big data intelligence platform. Because offline, real-life scenarios are so diverse and complex, this is a more challenging task than building real-time search.

He believes that this will be the cornerstone of future technology.

For computers, real-time means minimizing the delay between sending a request and receiving the response. For big data processing systems, it also means minimizing the delay from data production to data consumption. Both require improving computing speed and capacity.

MapReduce, a big data computing model that was widely used earlier, processes data in batches. There are boundaries between one batch of data and the next, and this batch processing mode inevitably introduces delay.

Take the search scenario as an example. If data is processed in daily batches, web pages updated today can only be found by users tomorrow. Increasing the processing frequency partially solves the problem: twice a day, four times a day, once every two hours…

Although this gradually approaches “quasi-real-time”, the cost also rises sharply.

To achieve real time, we must break the boundaries of batch processing and let data be processed like flowing water, computed and fed back continuously as it arrives.

This also gave rise to the later flourishing of stream computing engines.
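The trade-off between batch frequency and delay can be made concrete with a small sketch. It is only an illustration under assumed numbers: the workload (one page update every 30 minutes), the function names and the zero per-event delay for streaming are all hypothetical and do not describe any real engine.

```python
from math import ceil


def batch_latencies(event_times, interval):
    """Hours each event waits until the batch window that contains it closes."""
    return [ceil(t / interval) * interval - t for t in event_times]


def stream_latencies(event_times, per_event_delay=0.0):
    """In a streaming model each event is handled on arrival (idealized delay)."""
    return [per_event_delay for _ in event_times]


def mean(values):
    return sum(values) / len(values)


# Hypothetical workload: one page update every 30 minutes over a 24-hour day.
event_times = [0.25 + 0.5 * i for i in range(48)]


def summarize(interval):
    """Return (batch runs per day, mean delay in hours) for a batch interval."""
    runs_per_day = int(24 / interval)
    return runs_per_day, mean(batch_latencies(event_times, interval))


for interval in (24, 12, 6, 2):
    runs, delay = summarize(interval)
    print(f"every {interval:>2} h: {runs:>2} runs/day, mean delay {delay} h")
print("streaming: mean delay", mean(stream_latencies(event_times)), "h")
```

In this toy setting, halving the batch interval halves the mean delay but doubles the number of runs per day, which is the rising cost the text refers to; only the streaming model removes the window boundary altogether.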

In He Changhua’s view, beyond being fast, a “real-time system” has two other important meanings.

The first is the integration of OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).

In the past, OLTP had strict real-time requirements, while OLAP had loose timeliness requirements.

For example, an Alipay transaction requires immediate queries, additions and deletions, which are handled by OLTP, while analysis of user behavior characteristics is handled by OLAP.

However, as business scenarios change, the timeliness requirements on OLAP are also rising.

For example, in the risk-control scenario of Internet finance, the risk of a transaction must be judged by analyzing the user’s characteristic data within the very short time it takes to complete that transaction. This requires OLAP to provide real-time feedback, with results that can be accessed online immediately.
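A minimal sketch of what this combination looks like in code, assuming a single in-memory SQLite database stands in for both the transactional store and the analytical query layer. The table layout, thresholds and function names are hypothetical and chosen only for illustration; a production risk-control system would be far more elaborate.

```python
import sqlite3


def make_store():
    """Create an in-memory database with a hypothetical payments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (user_id TEXT, amount REAL, ts INTEGER)")
    return conn


def record_payment(conn, user_id, amount, ts):
    """OLTP-style step: write one transaction and commit it immediately."""
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?, ?)", (user_id, amount, ts))


def recent_activity(conn, user_id, now, window=60):
    """OLAP-style step: aggregate the user's payments in the last `window` seconds."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM payments "
        "WHERE user_id = ? AND ts > ?",
        (user_id, now - window),
    ).fetchone()
    return count, total


def is_risky(conn, user_id, amount, now, max_count=3, max_total=1000.0):
    """Judge a pending payment using the freshly aggregated data (assumed rule)."""
    count, total = recent_activity(conn, user_id, now)
    return count >= max_count or total + amount > max_total


conn = make_store()
for ts in (100, 110, 120):
    record_payment(conn, "u1", 10.0, ts)
print(is_risky(conn, "u1", 10.0, now=130))
```

The point of the sketch is that the analytical query sees data written a moment earlier, so the decision for the next payment reflects the most recent transactions rather than yesterday's batch.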

The second is the integration of intelligence and data systems.

Artificial intelligence and machine learning are the hottest application fields for big data. Today most companies keep their data warehouses separate from their machine learning platforms: they take a batch of data out of the warehouse and load it onto the machine learning platform to train models.

As business scenarios grow more complex and diverse, this approach is gradually showing its limits, because whether a model can be updated in real time and trained on fresher data directly affects its ability to handle complex scenarios.

“Data flows in in real time, models are trained in real time, and models make online decisions in real time and feed data back. If this loop can be fully connected, it will have inestimable value for the business,” He Changhua said.
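The loop described in the quote (data arrives, the model decides, the outcome is fed back and the model is updated) can be illustrated with a toy online learner. This is a sketch only: the one-weight linear model, the learning rate and the synthetic stream in which the outcome is exactly twice the input are all assumptions made for the example, not a description of Ant Financial's engine.

```python
class OnlineLinearModel:
    """A one-weight linear model updated by stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def update(self, x, y):
        """Take one gradient step on the squared error for a single event."""
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x
        return error


def run_stream(model, events):
    """For each event: make an online decision, then learn from the feedback."""
    predictions = []
    for x, y in events:
        predictions.append(model.predict(x))  # decision with the current model
        model.update(x, y)                    # observed outcome closes the loop
    return predictions


def synthetic_events(n):
    """Hypothetical stream in which the outcome is always twice the input."""
    return [((i % 10 + 1) / 10, 2 * (i % 10 + 1) / 10) for i in range(n)]


model = OnlineLinearModel()
predictions = run_stream(model, synthetic_events(1000))
print(round(model.weight, 6))
```

Because the model is updated after every event, each prediction already benefits from all earlier feedback, in contrast to a model retrained periodically on a batch exported from the warehouse.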

Data, computing and intelligence together constitute the “high-efficiency big data chassis” in He Changhua’s vision: a real-time, integrated data platform, or “Big Data Base”, that serves as the data chassis for countless scenarios.

Nowadays it is not only Ant Financial or Alibaba Group: more and more businesses in all walks of life are becoming data-driven.

But the threshold for big data development is very high. If every business starts data development from the ground up, it will be very time-consuming and labor-intensive.

How can business people focus more on business?

He Changhua believes that this is the mission of Big Data Base and the meaning of “cornerstone”:

“We want to make this simple: practitioners from all walks of life and colleagues from every line of business, standing on a solid platform, should be able to develop upper-layer applications easily without knowing the details of the layers below.”

How far is it from real intelligence?

Lowering the threshold for data and intelligence is what He Changhua expects from the new engine and the data intelligence platform.

At present, the team he leads is developing a financial multi-mode fusion computing engine, which has already integrated stream computing with graph computing, and stream computing with machine learning. It is getting closer and closer to the “grand integration” he envisions.

He Changhua revealed that the goal of the team is to make the business “minimalist”:

“In the next two to three years, we hope the new engine will be able to take on real-time online fusion computing tasks. Based on this engine, combined with other open-source engines, we can build a complete data intelligence system. In this system, a business can easily go through the whole process from feature development to product launch, and the subsequent traffic acquisition, analysis and decision-making can also be completed on this platform.”

He even sketched a very sci-fi future scenario: you hand a function to the engine, and the engine decides how many resources to call on to compute it. You do not need to care about the specific computation process, and the results are returned to you in the shortest possible time.

When you conceive of a new business, the data intelligence platform will determine which data is needed, which model to use, how to launch it, and how to manage traffic.

These processes can be intelligently and automatically completed.

“This is a longer-term goal. Once we have built up this data processing capability, anyone will be able to use it in the future, truly realizing ‘data democratization’.”

At present, no company in the world has fully built such a real-time data intelligence platform that integrates multiple capabilities.

He Changhua looks to the future with both caution and confidence: “We are still exploring too. If we fully achieve our exploratory goal, we will truly stand in a leading position in the world.”

No man's land

The world is changing rapidly. As a mirror of the physical world, data is infinite in theory. The question is whether human beings can record and collect it.

The spread of the Internet and the mobile Internet has greatly reduced the cost of acquiring data on human behavior.

With the spread of IoT sensor equipment, data from industrial production and social life can also be accumulated in large quantities.

So in the past two decades, the total amount of data has increased explosively.

While the whole world has undergone tremendous digital changes, our lives are also quietly changing.

Thanks to the development of data applications, we now enjoy conveniences such as e-commerce, O2O, mobile payment and smart homes that were inconceivable a decade or two ago.

But in He Changhua’s eyes, digitization is still at a very elementary stage: moving offline data online.

What we really need to think about is what kind of ability we will have to process and apply massive data when a highly data-based society arrives in the future.

This is related to whether we can do more things based on data, which will lead to higher intelligence and further promote the development of human society towards the next stage.

That is the answer he has been looking for since returning home to join Ant Financial.

“The reason I came back is that I think what I am doing here is, in a larger sense, an exploration for the next stage of the development of human society.”

In this new exploration, dealing with massive amounts of data is a compulsory course, so he repeatedly emphasizes the importance of computing power: big data, artificial intelligence, deep learning… all of them need strong computing power, and without it, it is difficult to move forward.

The development trend of AI is also to simulate human capabilities with ever larger amounts of computation.

“Real AI = data + 100 times the computing.” Training at the level of Google’s latest AI models translates into hundreds of GPUs running for a whole year.

He Changhua and his team devoted themselves to developing a new generation of computing engine and data intelligence platform, which is actually a comprehensive carrier of efficient computing and powerful data processing capabilities.

It was born from the vast business scenarios and data of Ant Financial, and its original purpose is to support Ant Financial’s various businesses. However, as the technology matures, it can also become general enough to serve many kinds of scenarios.

Its financial origins bring high availability and security, so it can be applied widely in other industries, to say nothing of everyday life-service scenarios.

Put at its grandest, the significance of this work is to promote social change. It sounds like a lofty proposition, but it is not so far removed from everyday life.

“Every technology must have its foothold.” Specifically, these technologies are closely linked to the daily lives of hundreds of millions of people.

Every day, when He Changhua takes out his phone and pays with Alipay at the checkout, he can feel the results of his work directly, just as he used search every day when he worked at Google: “I use what I build every day, and I can really feel how technology changes life.”

This is how he states his ideal in life. On the way toward it, he stands at the frontier of technology on one side and in the most everyday scenarios on the other, and the two are inseparable:

“Use technology to improve people’s lives and promote the continuous evolution of society and people.”

Author: Chestnut

