Development history and future trends of data middle platform technology and business


Author: Chen Xiaoyong, Ke Gen

A brief history of Alibaba data technology

Taobao was founded in Hangzhou in 2003. Around the same time (2003 to 2006), Google published its three foundational big data papers, on the Google File System, MapReduce and Bigtable, ushering computing technology into the era of big data.

In 2004, Doug Cutting and Mike Cafarella implemented Hadoop’s HDFS and MapReduce (MR) computing framework based on Google’s papers.

The Hadoop project entered the Apache community in 2006.

Hive became a subproject of Hadoop in September 2008 and later a top-level Apache project. In the same year, Taobao began building its Hadoop-based data computing platform, Ladder 1.

Alibaba Cloud was founded in 2009 and began writing the first line of MaxCompute code, as various cloud services began to emerge in China.

In 2014, Alibaba carried out its Moon Landing plan, completing the migration of its data platform to MaxCompute (Ladder 2). This consolidated the data business of the whole group onto one platform, completed the construction of the public data layer, and gradually gave shape to the OneData system and the group's data middle platform.

In April 2014, Intel invested in Cloudera and gave up its own Hadoop distribution. Cloudera entered the Chinese market in the same year.

In 2017, Dataphin, a data middle platform product supporting both the MaxCompute and Hadoop big data platforms, was launched. OneData’s internal technical system began to be offered to external customers.

Cloudera and Hortonworks announced their merger in 2018, and the Hadoop distribution market shifted from multi-vendor competition to an oligopoly.

In 2020, building on Dataphin, Brand Data Bank, Quick Audience and Quick Stock, Alibaba began to empower merchants through its own data system, and the data middle platform moved from pure technology toward business value.

The data middle platform concept emerges as the times require

Traditional data processing, and the traditional data warehouse platform in particular, carries high software and hardware procurement costs, high operation and maintenance costs, and a high technical threshold. Only banks, telecom operators and other large enterprises had the capability and financial resources to build data warehouses and data marts. With the spread of big data technology and cloud services, operation and maintenance costs and the threshold for technical development have dropped sharply. Cloud services in particular are cost-effective, simple to deploy, almost unlimited in scalability and easy to manage, so their overall cost of use and convenience are far better than those of traditional data platforms. Enterprises therefore began migrating their data warehouses from traditional platforms such as Teradata, Oracle and IBM to big data platforms or cloud services. Today this migration is still under way in traditional enterprises.

After the rise of cloud computing, databases and elastic compute (ECS) were the most commonly used products. As users accumulated data in their cloud businesses, however, enterprises began to have a direct demand for data analysis. In 2011, Alibaba Cloud’s MaxCompute big data platform was launched, and Alibaba Cloud entered the era of big data.

With the exponential growth of data, the ways and modes of data processing have changed qualitatively. The traditional mode, in which data supported only management and a small number of business staff, can no longer meet the needs of business development. Its drawbacks of long development cycles, slow response and narrow scope of application have become increasingly prominent. Enterprises and governments began to look for ways to respond to market and data changes in time, and raised their requirements for data collection, development, use and management.

To manage data more effectively and use it more conveniently, enterprises need to undergo digital-intelligence transformation. Alibaba's Data Technology and Products Department also realized that data processing had to change to meet enterprises' needs for efficient data development, for data that empowers the business to generate value, and for data that guides operations and management. This change helped Alibaba Group stand out from fierce competition in the following years, and it continues to help enterprises prepare for future competition. Behind this battle over the trend lies a contest for commercial dominance.

The essence of the data middle platform is to realize data value and turn data into assets

Key product introduction:

Dataphin is Alibaba Cloud's engine for building and managing intelligent data middle platforms. Based on the core methodology and technical system distilled from Alibaba's data middle platform practice, it aims to provide full-link, one-stop big data capabilities covering data collection, construction, management and use, helping enterprises create an intelligent data system that is standardized, well integrated, asset-oriented, service-oriented and self-optimizing in a closed loop.

The core value of Dataphin is to standardize data definitions, produce data in a standardized and regulated way, and improve the efficiency of data development.

The data middle platform aims to open data to all employees and to support data-driven business operations. Its design philosophy of convenient data construction and a business-value perspective is the biggest difference from the traditional data warehouse. Alibaba processes and develops data under the concept that data is for everyone, with front-line staff (known internally as xiao'er) as the main users of data, so that front-line employees have data to look at, data to support operational decisions, and data to guide the business.

OneData is a methodology built on the Alibaba data technology team's years of experience. Its core is the construction of a public data layer. Dataphin is that methodology solidified into a product. It helps the Alibaba economy drive business transformation and realize business value along the way. Other enterprises can also use this proven experience and tooling to improve data efficiency and support their business and sustainability strategies.

The core of OneData is the construction of the public data layer. Through innovation in underlying services and agile development, Alibaba can empower its huge customer base with a mature methodology and out-of-the-box tools that help enterprises innovate in their business. Today, with business value creation as the orientation, the data middle platform can be seen to drive the transmission of data value along the enterprise's value chain.

In the Alibaba economy, hundreds of data applications serve business units such as Taobao, Tmall, Youku, Fliggy and Alipay. Outside the economy, data applications such as Business Advisor, Brand Data Bank and the global consumer operation platform Quick Audience help external merchants realize business value within the Alibaba economy. Data and data tools will increasingly connect and coordinate people, goods and venues.

Under the data middle platform concept, beyond basic storage capacity and computing resources, an enterprise also needs to build a data asset management platform suited to its organizational structure or stage of development, so as to gain insight into the health of its data. Alibaba has such asset platforms internally, and the data health information they provide serves as a basis for planning system expansion in the next fiscal year. Dataphin's built-in data asset management module reflects the basic status of data assets from the developer's perspective.

Current applications of the data middle platform

1. General industry data middle platform construction scenario

Traditional enterprises expect more in the way of support for business operations and management. Out-of-the-box tools enable efficient data output and data asset management. In the scenario design stage of data middle platform construction, we conduct in-depth business research with the enterprise, refine the business scenarios, and present the business insights users care about most visually through BI reports, helping decision makers reach sound judgments.

The business scenario design stage typically yields thousands of derived indicators. These indicators have fine-grained time periods, clearly defined meanings, and many combinations of conditions. Dataphin supports rapid data processing and development, and its graphical design lowers the threshold for designing and developing a data middle platform. It covers data warehouse planning, data integration, standardized modeling, general-purpose IDE development, operation and maintenance scheduling, and data services, so traditional enterprises can quickly reach their data modeling and data development goals.

The data assets gathered in the data middle platform are like a “gold mine”. For an enterprise, the platform must solve the problem of how to manage and use that data. Centralized data asset management makes it easier to evaluate the use and value of assets comprehensively, to build full-link tracking of data applications, and to make data costs and business returns clear, transparent and measurable. Because their business systems are diverse and were designed independently, traditional enterprises have ended up with siloed (“chimney-style”) data development. Horizontal, transparent data management across the whole enterprise lays a solid foundation for overall data governance.

One traditional-enterprise customer, with a large number of retailers and stores across the country, had high marketing costs. Because the business data sat in individual stores and subsystems, headquarters found it hard to identify the cause. After building the data middle platform and collecting system data and store marketing data, the customer analyzed consumption data together with points accrual and points redemption data and found members with abnormal behavior. Their in-store consumption was concentrated after 10:00 p.m., when the stores were already closed, which was likely caused by cheating by bonus hunters (the so-called “wool party”). With centralized management on the data middle platform, the actual campaign sales of stores under each business division can be supervised. Through the platform's customized “asset visualization portal”, enterprises can manage their own data assets effectively.
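
The kind of check described above can be sketched in a few lines. This is a minimal illustration, not the customer's actual analysis: the record layout, the opening and closing hours, and the `min_txns` and `threshold` parameters are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Assumed store hours; the article only states that stores are closed after 10:00 p.m.
OPENING_HOUR = 9
CLOSING_HOUR = 22


def flag_after_hours_members(transactions, min_txns=3, threshold=0.8):
    """Return sorted member IDs whose transactions cluster outside store hours.

    transactions: iterable of (member_id, timestamp) pairs (hypothetical layout).
    A member is flagged when they have at least `min_txns` transactions and
    at least `threshold` of them fall outside the assumed opening hours.
    """
    total = defaultdict(int)
    after_hours = defaultdict(int)
    for member_id, ts in transactions:
        total[member_id] += 1
        if ts.hour >= CLOSING_HOUR or OPENING_HOUR > ts.hour:
            after_hours[member_id] += 1
    return sorted(
        member
        for member, count in total.items()
        if count >= min_txns and after_hours[member] / count >= threshold
    )


# Made-up sample data for illustration only.
sample = [
    ("m1", datetime(2020, 5, 1, 22, 15)),
    ("m1", datetime(2020, 5, 2, 23, 5)),
    ("m1", datetime(2020, 5, 3, 22, 40)),
    ("m2", datetime(2020, 5, 1, 14, 0)),
    ("m2", datetime(2020, 5, 2, 22, 30)),
    ("m2", datetime(2020, 5, 3, 11, 20)),
]
print(flag_after_hours_members(sample))
```

In practice such a rule would only produce candidates for review; the thresholds would need tuning against real store hours and known fraud cases.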

As traditional enterprises, a telecom operator and an airline each already owned a data analysis platform after more than 10 years of data warehouse construction. However, the traditional data warehouse focuses only on data development, with no concept of scenario design or asset management. When a new data development task arrives, developers often have to process the source data layer by layer, which is time-consuming and leads to unclear definitions. These problems can be solved by using Dataphin and introducing a standard public data model.

“Promoting the construction of the business and data middle platforms is one of the airline's eight tough battles this year, and a key change in the company's intelligent transformation. Data that used to be collected manually from different systems and crunched on personal computers for dozens of hours can now be obtained from the ‘cloud’ in a few minutes, greatly improving the efficiency and quality of analysis work,” said the person in charge of the airline's data middle platform.

2. Retail industry global data middle platform marketing scenario

The new retail industry has new business formats and sales models. Merchants promote products through physical stores, online stores, livestreaming platforms, brand apps, WeChat / Alipay mini programs and other channels. To address this multiplicity of marketing forms and channels, Alibaba has launched a global marketing solution that gathers data from all domains, derives in-depth insight through the AIPL / RFM data models, and improves marketing efficiency and realizes business value through precise delivery. The solution is built on a series of data products, including Alibaba's Business Advisor, Brand Data Bank, the data construction and management platform Dataphin, and the global consumer operation platform Quick Audience.
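
To make the RFM model mentioned above concrete, here is a minimal sketch that computes recency, frequency and monetary values per customer and assigns a coarse segment. The order layout, the segment names and the cut-off values are assumptions chosen for illustration; they are not taken from Alibaba's products.

```python
from datetime import date


def rfm_values(orders, today):
    """Compute (recency_days, frequency, monetary) per customer.

    orders: iterable of (customer_id, order_date, amount) tuples (hypothetical layout).
    """
    stats = {}
    for customer, day, amount in orders:
        last, freq, money = stats.get(customer, (day, 0, 0.0))
        stats[customer] = (max(last, day), freq + 1, money + amount)
    return {
        customer: ((today - last).days, freq, money)
        for customer, (last, freq, money) in stats.items()
    }


def segment(rfm, max_recency_days=30, min_frequency=2, min_monetary=200.0):
    """Assign a coarse segment; the thresholds are illustrative assumptions."""
    recency, frequency, monetary = rfm
    if recency > max_recency_days:
        return "lapsed"
    if frequency >= min_frequency and monetary >= min_monetary:
        return "high value"
    return "developing"


# Made-up orders for illustration only.
orders = [
    ("alice", date(2020, 6, 1), 150.0),
    ("alice", date(2020, 6, 20), 120.0),
    ("bob", date(2020, 3, 5), 80.0),
    ("carol", date(2020, 6, 25), 60.0),
]
values = rfm_values(orders, today=date(2020, 6, 30))
for customer, rfm in sorted(values.items()):
    print(customer, rfm, segment(rfm))
```

Segments like these are what a targeting step would consume, for example to select a "lapsed" audience for a reactivation campaign.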

In global marketing, the most important thing is to help users find the target audience and bring business value to merchants through audience prediction models and marketing delivery. The prerequisite for applying global marketing prediction is therefore to gather the data generated by the various formats and channels and process it with Alibaba's OneData methodology, so that digital marketing can cover all domains. The computing power of AI and algorithm platforms also finds direct application scenarios and business value here. Through model construction and data output, the business gains overall control over data such as operating status, member insight, channel and sales management, and store management. Decision makers can then make business judgments through data analysis, or obtain market forecasts for global marketing from predictive marketing models.

The global marketing solution is an important way for enterprises to build a data middle platform and work with Alibaba's business ecosystem to obtain business value. The valuable data accumulated in the enterprise’s data middle platform, Alibaba’s business ecosystem and other media channels together build digital marketing, and data from external campaigns can flow back to form a full-link closed data loop.

New retail enterprises such as Feihe Dairy, Liangpin Puzi (Bestore) and Jala (Jialan) use Dataphin, as part of building a global data middle platform, to manage the data of their Tmall stores, offline stores, mini programs and own websites. This gives them unified, standardized, high-quality data that supports data-driven decisions and global marketing and realizes business value. As customers say:

“The data middle platform frees us from data infrastructure work, so we have more energy to think about how to use data to solve business pain points and improve the company's efficiency. In terms of organizational capabilities, we can also lean more toward developing business analysis and architecture skills, data modeling and algorithm skills, and the ability to design and plan innovative application products,” Zhou Shixiong, vice president of Liangpin Puzi, said in an interview.

Zhong Wei, general manager of the big data center of Jala Group, said in an interview: “We have a gold mine (consumer data) in hand, but we lack the means to develop it. The digital technology embodied in the data middle platform amounts to new productive forces, which can drive enterprises to make breakthroughs in their business and operating models by establishing matching new relations of production, such as organizational upgrading and ecosystem collaboration. The changes brought by this breakthrough are at the DNA level.”

Looking ahead: future trends of the data middle platform

1. The trend toward real-time computing in the data middle platform

Data processing is moving toward near-real-time and real-time. Traditional data warehouse designs are constrained by their technical architecture and cannot perform real-time computation. Distributed big data technology can not only build PB-scale data platforms (a scenario historically handled by the data warehouse), but can also combine real-time computation with historical data to integrate stream and batch processing. This delivers the data timeliness and analytical capability that the new generation of data middle platforms emphasizes.

Alibaba uses the Blink real-time computing framework (its internal version of the open-source Flink) to integrate stream and batch processing. Blink supports complex event processing (CEP). It also offers SQL / Table, real-time stream and batch data processing, and stateful event-driven application APIs, so that developers with different requirements and skills can meet different data development needs.

Real-time computing in the data middle platform does not re-engineer existing business processes. Instead, it makes business analysis more efficient by combining real-time data streams with data warehouse indicators. Real-time technology enables fast BI analysis and business early warning, for example real-time marketing strategies, real-time risk control strategies and real-time anti-fraud. These scenarios can be embedded into the actual business systems.
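
The idea of combining a real-time stream with warehouse indicators for early warning can be sketched as follows. This is a plain-Python simulation and does not use the Blink or Flink APIs; the event layout, the one-minute tumbling window, the baseline values and the `ratio` threshold are all assumptions for illustration.

```python
from collections import defaultdict


def window_totals(events, window_seconds=60):
    """Sum event values per (window_start, key) using tumbling windows.

    events: iterable of (epoch_seconds, key, value) tuples (hypothetical layout).
    """
    totals = defaultdict(float)
    for ts, key, value in events:
        window_start = ts - ts % window_seconds
        totals[(window_start, key)] += value
    return dict(totals)


def early_warnings(totals, baseline, ratio=0.5):
    """Flag windows whose real-time total falls below ratio * historical baseline.

    baseline: per-key expected value per window, as a batch job over
    historical warehouse data might produce (assumed input).
    """
    alerts = []
    for (window_start, key), total in sorted(totals.items()):
        expected = baseline.get(key)
        if expected and ratio * expected > total:
            alerts.append((window_start, key, total, expected))
    return alerts


# Made-up stream events and historical baselines for illustration only.
events = [
    (0, "store_a", 40.0),
    (30, "store_a", 70.0),
    (60, "store_a", 20.0),
    (65, "store_b", 90.0),
]
baseline = {"store_a": 100.0, "store_b": 80.0}
totals = window_totals(events)
print(early_warnings(totals, baseline))
```

In a real deployment the windowing would be done by the stream engine and the baseline would be read from the warehouse; the comparison step is the part that turns the two into a business alert.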

Alibaba’s new retail business and the Double 11 shopping festival also use stream-batch integration to monitor the marketing process in real time.

Dataphin began research and development of stream-batch integration in 2018, and by the end of 2019 the internal stream computing products had been successfully migrated to Dataphin. In 2020, Dataphin released version V2.7, which began to support Alibaba Cloud’s real-time computing product Flink and, combined with Alibaba Cloud’s big data computing service MaxCompute, meets data timeliness requirements through stream-batch integration. With Dataphin, users can get real-time feedback on marketing effectiveness and compare it with historical data along the same dimensions, giving business staff timely and accurate data for real-time decisions.

2. The trend toward mobile for upper-layer applications of the data middle platform

BI insight analysis is the most important form of data presentation on the data middle platform. At this stage, most BI presentation is PC-based, supplemented by mobile phones. Because the Internet has inevitably moved from the PC to mobile devices, data applications will move to mobile as well. In recent years, a number of BI vendors have released products for mobile display, but these have not been widely adopted. The reasons include objective problems such as the difficulty of unifying screen sizes, and the highly personalized usage scenarios of mobile audiences. Mobile applications of the data middle platform must therefore adapt to the requirements of the device.

In digital BI, the mobile end must first consider device adaptation, favoring metric dashboards rather than the rich presentation effects and historical indicators highlighted on the PC. Second, the mobile app should be combined with real-time computing, emphasizing real-time data analysis and timely content. It is mostly applied to scenarios such as business transaction flows, real-time orders, and analysis and prediction based on historical orders.

Beyond app development on iOS and Android, mobile delivery also faces the problem of presenting on multiple ends. DingTalk micro-apps and WeChat mini programs are further options besides native apps for enterprise BI on mobile. Technically, however, pure H5 page development suffers from large download sizes and a poor user experience, and cannot retain data offline. As a result, most mobile applications still adopt the app mode.

Because terminal apps are costly to develop, operate and maintain, and because of the question of PV / UV operating efficiency, which data and which application modes can raise how often users consume data is a practical problem for enterprise managers and product managers. Analyzing usage by combining UV and PV over a time period T (UV / T and PV / T) is therefore important for judging the business value of mobile BI.

3. The trend toward intelligence in the data middle platform

The greatest value of AI technology is that it can be applied in real scenarios. A typical application of face recognition, for example, is replacing passwords for mobile login. After building a data middle platform, enterprise users accumulate rich indicator data, which is the foundation that algorithms and AI depend on. For data middle platform users, the most common AI scenarios are sales or traffic forecasting, personalized recommendation (“a thousand faces for a thousand people”), and prediction of marketing campaign results. These are scenarios that directly assist business decisions.

Under fierce market competition, enterprises expect AI to deliver sales growth or cost reduction within a short period. Making front-line employees' work easier through AI algorithms is also an effective way to raise productivity. Alibaba has such a data product: employees can ask it loosely worded questions, and it replies directly with the indicator data they care about, lowering the threshold for data queries and making data easier for front-line employees to use.
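
As a toy illustration of answering a loosely worded question with an indicator value, the sketch below matches the question against a small metric catalog using approximate string matching from Python's standard library. It is not how Alibaba's product works, which the article does not describe; the catalog, its values and the `cutoff` parameter are invented for the example.

```python
from difflib import get_close_matches

# Hypothetical metric catalog; names and values are made up for illustration.
METRICS = {
    "daily sales amount": 1250000,
    "daily active users": 48210,
    "order count": 9312,
}


def answer(question, cutoff=0.6):
    """Return (metric_name, value) for the closest metric name, or None."""
    query = question.lower().strip()
    matches = get_close_matches(query, list(METRICS), n=1, cutoff=cutoff)
    if not matches:
        return None
    name = matches[0]
    return name, METRICS[name]


print(answer("Daily sale amount"))  # misspelled question still resolves
print(answer("zzzz"))               # nothing similar enough in the catalog
```

A production system would more likely rely on semantic understanding of the question than on string similarity, but the contract is the same: free-form text in, a well-defined indicator out.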

“Man follows the earth, the earth follows heaven, heaven follows the Tao, and the Tao follows nature.” To follow here means to take something as one's rule and constraint. Man takes the earth as the standard for behavior, the earth takes heaven as its norm, heaven takes the Tao as its norm, and the Tao takes nature as its norm. Enterprises likewise rely on data to support their operations. Data support relies on systems and rules. The data middle platform follows the methodology of data processing and multi-terminal presentation. Data processing is therefore key to the successful implementation of a data middle platform.

This article is original content from Alibaba Cloud and may not be reproduced without permission.