Alibaba recently shared a PPT on its data middle platform practice (search for the original yourself). As the originator of the data middle platform concept, Alibaba deserves respect and is still worth learning from, so I read the deck carefully in the hope of finding something new.
Reading these professional decks is actually very time-consuming. You have to peel away the polished surface, dig into every word to understand the meaning hidden beneath it, and then compare it with your existing knowledge to see whether it improves your understanding. For the parts you do not understand, you often have to look up related material.
Of course, much of the wording in the PPT is not rigorous, and many of the concepts are coined on the spot or are unique to the company, so sometimes we have to speculate and draw on our own practice to understand them. This interpretation runs to more than 6,000 words, so be prepared for some mental effort. The author did not attend the talk in person, but I hope my own "talk" can still teach you something solid.
1. Title and background
The source of this deck, Alibaba Cloud's intelligent business unit, is actually a little surprising. As I recall, Alibaba's middle platform business group includes the search business unit, the shared business platform, and the data technology and product department, while Alibaba Cloud is a platform business unit focused on the cloud business. Is it the right party to talk about the data middle platform?
Some people will ask: what is the difference between a platform and a middle platform? Isn't it appropriate for Alibaba Cloud to talk about the middle platform?
The author's doubts are as follows. Generally speaking, a platform is independent of the business, so it can concentrate on technology, whereas a middle platform is where business capabilities converge and is closely tied to the business. For a data middle platform, the core competitiveness is not platform-level technology but the understanding, processing, and mining of data. It is unrealistic to expect platform technologists to go to the front line and understand the common characteristics of data demands, yet that is precisely the core of how a data middle platform creates value today.
Of course, we are here to talk about the PPT. To understand Alibaba's data middle platform, there is no need to dwell on where the speaker comes from.
2. DT on the left, IT on the right
Traditional IT is a cost center; with data, it can become a value center. The value shows up in two ways: decision support for management, and intelligent tools for production to match it. In other words, it improves the fit between production relations and productivity.
This is a good point. For example, Zhejiang Mobile's big data center is positioned directly as a profit center.
The contrast drawn here between IT and DT is not very apt. The two are not opposites; they merge. Data (D) becomes DT through IT. For example, the old IT channel system only handled business acceptance, whereas now it can deliver data-driven intelligent recommendations within the acceptance scenario.
DT is just an abstract concept that highlights the value of data, and it should not be understood crudely. China Mobile has put forward a "three fusions" concept: integration, interconnection, and intelligence. I think IT and DT need to strengthen integration and interconnection. Integration means selling together; interconnection means sharing capabilities, so that IT has DT capabilities and DT has IT capabilities.
The slide says DT is problem-oriented while IT is demand-oriented. These are two sides of the same question rather than a difference between DT and IT. The slide's other claim, that DT teaches people to fish while IT hands them the fish, is reasonable from the standpoint of method: DT's intelligent recommendation provides a method, whereas the earlier IT recommendation relied on human judgment.
3. What the enterprise organization expects from DT
Executive team: watching indicators and spotting risks is the basic demand of the BI era, and there is not much to add. Big data's stronger processing, visualization, and real-time capabilities provide a better data viewing experience, which is where it improves on earlier BI.
Business team: three changes are mentioned:
First, find problems through data rather than by gut feeling.
Second, business staff should understand both the business and the data, and even build data sets and models themselves.
Third, data should be embedded in the production process so that it plays a direct role. For example, the tag library becomes the starting point for selecting marketing target users, and the risk control model is embedded in the user operation flow.
We are all doing the first, but in practice decisions are still based on experience, with data serving only as reference and evidence; the model has not changed in essence. The second and third are difficult for most enterprises.
Technical team: three key points are mentioned:
First, "let the data do the running" is the core of an intelligent platform. Zhejiang's "at most one visit" reform achieves its goal by integrating data and platforms.
Second, IT staff should think in terms of data, which is well said. People who lack data thinking seldom consider intelligence when designing IT systems, and the state of many enterprises' acceptance systems and recommendation systems today can be traced to this.
Third, the mission of the data technology team is to discover new knowledge through data analysis and thereby empower the business.
4. Large middle platform, small front end
This slide explains the engine of Alibaba's business operating system: a large middle platform with a small front end. It is very clear, and it reminds us of two important concepts: business datafication and data commercialization.
Business datafication: every business activity should record the relevant data, which is the mission of the business middle platform.
In practice, business datafication is a big challenge. In the past, business platforms were designed around functions and processes and recorded only the data needed to implement them; any other data was treated as dispensable.
For example, some operators' signaling logs are incomplete, which can hamper later network analysis or data monetization. That is a failure of business datafication.
However, business datafication sometimes means a huge investment and is easier said than done. For most enterprises, the data they have is not the result of a business datafication strategy but merely low-hanging fruit picked along the way.
Business datafication is one of the missions of the data team. A lot of good data is won by going to the front end and pushing the business to record it.
Data commercialization: the essence is to find value in data and feed it back to empower the business, which is easy to understand.
The term "digital twin" is also popular now. In a future where everything is connected, all of your behavior will be recorded in real time to form another, digital you. That is the digital twin. If the business middle platform is you, the data middle platform is your twin brother.
5. Four typical scenarios of data middle platform enablement
(1) Global data monitoring: in essence, indicators + reports + visualization. It is aimed at managers, although business staff need it too. The following is a Double 11 big-screen example.
(2) Data operation (intelligent CRM): the slide speaks of "building a data connection and extraction management system centered on 'people', based on full-link and omni-channel data, with fine-grained management of users across their whole life cycle". With so many modifiers it is confusing. What is it actually saying?
Full-link means recording and tracking data vertically across the whole business process (including merchandise planning, pre-sale and in-sale management, customer service management, order processing, warehousing and logistics, and so on).
Omni-channel means the user behavior data from every touchpoint, such as Tmall, Taobao, and Youku.
Therefore, aggregating full-link and omni-channel data yields a complete customer profile, and connection extraction then makes it easy to pull the data needed for analysis. Taken literally, this is similar to the positioning of our tag library.
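To make the idea of aggregating full-link and omni-channel data more concrete, here is a minimal sketch in Python. It is not Alibaba's implementation: the event fields (user_id, channel, stage, amount), the function names, and the sample values are all assumptions made for illustration; only the channel names come from the text.

```python
from collections import defaultdict

# Hypothetical behavior events. Channel names come from the text;
# every field name and value here is invented for illustration.
events = [
    {"user_id": "u1", "channel": "Tmall", "stage": "order", "amount": 120.0},
    {"user_id": "u1", "channel": "Taobao", "stage": "browse", "amount": 0.0},
    {"user_id": "u1", "channel": "Youku", "stage": "watch", "amount": 0.0},
    {"user_id": "u2", "channel": "Taobao", "stage": "order", "amount": 35.5},
]


def build_profiles(events):
    """Aggregate each user's events from every channel and stage into one profile."""
    profiles = defaultdict(
        lambda: {"channels": set(), "stages": set(), "total_spend": 0.0}
    )
    for event in events:
        profile = profiles[event["user_id"]]
        profile["channels"].add(event["channel"])
        profile["stages"].add(event["stage"])
        profile["total_spend"] += event["amount"]
    return dict(profiles)


def extract(profiles, predicate):
    """Pick the users whose aggregated profile satisfies a condition."""
    return sorted(uid for uid, profile in profiles.items() if predicate(profile))


profiles = build_profiles(events)
# Users seen on at least two channels who have actually spent money.
multi_channel_buyers = extract(
    profiles, lambda p: len(p["channels"]) >= 2 and p["total_spend"] > 0
)
```

Once the profile exists, each new segment is just another predicate over it, which is roughly what a tag library offers.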
(3) Embedding data in the business (intelligent recommendation): it is clear that this is closed-loop marketing management, from user segmentation, personalization ("a thousand faces for a thousand people"), and channel recommendation through to marketing evaluation. The following is an example.
(4) Data commercialization (Business Advisor): this is one of the few purebred data products at Alibaba. It is a typical case of data commercialization and direct data monetization, and it provides end-to-end analytical support for shop owners. There are many introductions online. The following slide focuses on the past, present, and future of Business Advisor, which is interesting.
Past: a hundred schools of thought contended. That raised problems such as data redundancy and poor user experience, but without that period of contention the integrated Business Advisor product could not have emerged.
Present: Business Advisor dominates, relying on the data middle platform system, including OneData, OneService, OnePlatform, and so on, which are interpreted later.
Future: one Business Advisor is not enough. The plan is to build a product development platform and replicate Business Advisor for different industries, that is, "Advisor X", which is very ambitious.
Why purebred?
Because data is usually attached to a business process, such as recommendation. When you evaluate the value of data, it is hard to say whether the credit belongs to the business itself, to the process design, or to the data-driven recommendation. A pure data product is a better way for data people to demonstrate their value.
6. The origin of Alibaba's data middle platform
The data middle platform has the same origin as the integrated model of a conventional data warehouse: the need for sharing and reuse. For example, each business built on Taobao data constructed its own middle layer, and many of these were duplicated or similar. Ant has a transaction theme and Tmall has a transaction theme. Can a public transaction theme be abstracted to serve both businesses?
Accordingly, Alibaba's data middle platform abstracts public core theme layers such as member, commodity, transaction, browsing, and advertising to serve the application layer. Previously each application maintained many such common layers of its own; now they can be fully reused, which in theory speeds up application development.
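A small Python sketch may help show what abstracting a public transaction theme looks like. It is only an illustration under stated assumptions: the source schemas (pay_id, payer, cents, order_no, buyer_id, price_yuan), the shared schema, and the function names are all hypothetical and are not taken from the slides.

```python
# Hypothetical raw records from two businesses with different field names.
ant_payments = [{"pay_id": "a1", "payer": "u1", "cents": 1200}]
tmall_orders = [{"order_no": "t9", "buyer_id": "u1", "price_yuan": 35.0}]


def to_public_transaction(source, row):
    """Map a business-specific record onto the shared transaction theme schema."""
    if source == "ant":
        return {"txn_id": row["pay_id"], "user_id": row["payer"],
                "amount_yuan": row["cents"] / 100, "source": source}
    if source == "tmall":
        return {"txn_id": row["order_no"], "user_id": row["buyer_id"],
                "amount_yuan": row["price_yuan"], "source": source}
    raise ValueError(f"unknown source: {source}")


# The public theme layer is built once...
public_transactions = (
    [to_public_transaction("ant", r) for r in ant_payments]
    + [to_public_transaction("tmall", r) for r in tmall_orders]
)


# ...and every application reuses it instead of deriving its own copy.
def total_spend(user_id):
    """Application 1: a user's spend across all businesses."""
    return sum(t["amount_yuan"] for t in public_transactions
               if t["user_id"] == user_id)


def transaction_count_by_source():
    """Application 2: transaction volume per business."""
    counts = {}
    for t in public_transactions:
        counts[t["source"]] = counts.get(t["source"], 0) + 1
    return counts
```

The mapping logic lives in one place, so a change in a source schema is fixed once rather than in every application's private middle layer.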
The following slide compares the data dependency graphs before and after. The first is a mesh, showing tangled interdependencies that inevitably carry a lot of redundancy. The second is radial: one node serves many downstream nodes, which stands for sharing and simplicity.
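The redundancy in a mesh-style graph can be detected mechanically. The following Python sketch is a toy illustration, not the method used on the slide: the table names, the graph representation, and the simplifying assumption that each theme has exactly one definition are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "before" graph: middle-layer table -> (theme, upstream source tables).
before = {
    "ant_txn_mid": ("transaction", {"ods_orders", "ods_payments"}),
    "tmall_txn_mid": ("transaction", {"ods_orders", "ods_payments"}),
    "ant_member_mid": ("member", {"ods_users"}),
    "tmall_member_mid": ("member", {"ods_users"}),
    "tmall_ad_mid": ("advertising", {"ods_clicks"}),
}


def redundant_groups(graph):
    """Group tables that derive the same theme from the same upstream tables."""
    groups = defaultdict(list)
    for table, (theme, upstream) in graph.items():
        groups[(theme, frozenset(upstream))].append(table)
    return sorted(sorted(tables) for tables in groups.values() if len(tables) > 1)


def shared_layer(graph):
    """Collapse every (theme, upstream) definition into a single public table.

    Assumes one definition per theme, so the name public_<theme> is unique.
    """
    merged = {}
    for theme, upstream in graph.values():
        merged.setdefault((theme, frozenset(upstream)), f"public_{theme}")
    return {name: (theme, set(upstream))
            for (theme, upstream), name in merged.items()}


duplicates = redundant_groups(before)
after = shared_layer(before)
```

In this toy graph five per-business tables shrink to three public ones, which is the move from the mesh picture to the radial one.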
7. Panorama of Alibaba's data middle platform
This picture shows what Alibaba's data middle platform has built. The parts directly related to the data middle platform are: data middle platform (DaaS), data asset management (IPaaS), data R&D platform (IPaaS), and computing and storage platform (IaaS).
As the author understands it, the data middle platform in the broad sense includes three parts: data middle platform (DaaS), data asset management (IPaaS), and data R&D platform (IPaaS). In the narrow sense it includes only the data middle platform (DaaS). In the author's enterprise, data asset management (IPaaS) and the data R&D platform (IPaaS) are called the "energy middle platform".
(1) Computing and storage platform (IaaS)
StreamCompute (stream computing): presumably Alibaba's version of Flink.
MaxCompute (offline computing): the EB-scale data warehouse developed by Alibaba (formerly ODPS).
ADS (real-time computing): short for AnalyticDB. It mainly provides real-time online analysis and can be thought of as Alibaba's in-house OLAP engine.
(2) Data asset management (IPaaS)
Data asset management is essentially the same thing as metadata management.
Asset map: in essence a graphical data dictionary, where you can find out how much data Alibaba has, how it is stored, how to find it, and how to use it. It is quite vivid, and from the perspective of putting data online its design is worth learning from. Some interface screenshots follow.
Asset analysis: think of it as BI on metadata, covering structure analysis, trend analysis, and so on. The aim is to understand the current state and spot anomalies through metadata analysis so as to guide data asset governance, for example by showing how data in the payment category is growing.
Asset application: think of it as using metadata to make data assets more efficient to use, for example by finding invalid data assets through impact analysis so as to reduce redundancy. Done well, this work has great value.
Asset operation: the word "operation" is poorly chosen here. Operation is not a function but an activity: using various measures to get data used by more people and generate more value, for example by recommending new data assets.
The 80/20 rule is very evident in data asset usage. Most data is never accessed or used, yet it is expensive to store. Only through operation can silent data reach more users and invalid data be cleared out, cutting cost and raising efficiency.
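The impact analysis and the 80/20 observation above can both be expressed as small computations over metadata. The Python sketch below is a hedged illustration: the table names, read counts, sizes, and lineage edges are invented, and a real asset platform would draw them from its metadata store.

```python
# Hypothetical metadata: read counts over some period and storage size per table.
tables = {
    "dwd_trade": {"reads": 950, "size_gb": 400},
    "dws_member": {"reads": 40, "size_gb": 100},
    "tmp_campaign": {"reads": 0, "size_gb": 300},
    "stg_old_logs": {"reads": 0, "size_gb": 200},
}
# Hypothetical lineage: upstream table -> tables that are built from it.
lineage = {"stg_old_logs": ["dwd_trade"], "dwd_trade": ["dws_member"]}


def cleanup_candidates(tables, lineage):
    """Impact analysis: tables nobody reads AND nothing downstream depends on."""
    return sorted(
        name for name, meta in tables.items()
        if meta["reads"] == 0 and not lineage.get(name)
    )


def read_share_of_top(tables, fraction=0.2):
    """Share of all reads that goes to the most-read `fraction` of tables."""
    counts = sorted((meta["reads"] for meta in tables.values()), reverse=True)
    top_n = max(1, int(len(counts) * fraction))
    total = sum(counts)
    return sum(counts[:top_n]) / total if total else 0.0


candidates = cleanup_candidates(tables, lineage)
reclaimable_gb = sum(tables[name]["size_gb"] for name in candidates)
top_share = read_share_of_top(tables)
```

Note that stg_old_logs is never read directly but still feeds dwd_trade, so the lineage check keeps it off the cleanup list. Access counts alone would have flagged it.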
(3) Data R&D platform (IPaaS)
This platform is the same kind of thing as the DACP mentioned in a previous article. It is responsible for data processing and needs a series of supporting functions, including data planning, exchange, processing, development, scheduling, and monitoring.
(4) Data middle platform (DaaS)
OneClick: the ETL of a traditional data architecture, which collects data from various channels both offline and in real time.
OneData: data warehouse modeling, whose purpose is to standardize and unify data definitions and to build up common data. Alibaba uses dimensional modeling: it abstracts dimensions and indicators through business process analysis and finally consolidates them into the required warehouse model.
Extraction data center (OneID): as the author understands it, to make it easier to provide data externally, Alibaba has built a set of wide tables keyed uniquely by the various IDs of core business objects, just as an operator needs a wide-table system built around user ID (mobile phone number), customer ID, account ID, and family ID.
Unified data service middleware (OneService): takes the data integrated and computed in the data warehouse as its source and provides data services through interfaces.
8. What Alibaba's data middle platform has accumulated
Data standardization: the goal is a unified specification for all domains, topics, models, fields, and indicators of data assets. The author has always stressed that data standardization must be solved at the source. If the data assets of Alibaba's business systems also follow this principle, that is very impressive.
Tools as the technical core: my understanding is that standards can only take hold if tools enforce them. For example, a table can only be created according to the standard template; otherwise the creation is rejected. Alibaba's control in this area is said to be quite strict.
Metadata-driven intelligence: with metadata analysis, resource demand can be calculated scientifically, quickly, and flexibly, ending the scramble for justification at every planning and expansion round. This is similar to the metadata applications described earlier.
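The point about tools enforcing standards can be sketched as a gate in front of table creation. The following Python example is a minimal illustration under assumed rules: the naming convention (layer_domain_name), the allowed layers, and the required owner and description fields are hypothetical and are not Alibaba's actual standard.

```python
import re

# Hypothetical standard: <layer>_<domain>_<name>, with the layer drawn from a
# fixed set, and every table required to declare an owner and a description.
ALLOWED_LAYERS = {"ods", "dwd", "dws", "app"}
NAME_PATTERN = re.compile(
    r"^(?P<layer>[a-z]+)_(?P<domain>[a-z]+)_(?P<name>[a-z0-9_]+)$"
)


def validate_table(spec):
    """Return a list of violations of the (hypothetical) standard template."""
    errors = []
    match = NAME_PATTERN.match(spec.get("name", ""))
    if not match:
        errors.append("name must look like <layer>_<domain>_<name>")
    elif match.group("layer") not in ALLOWED_LAYERS:
        errors.append(f"unknown layer: {match.group('layer')}")
    for field in ("owner", "description"):
        if not spec.get(field):
            errors.append(f"missing {field}")
    return errors


def create_table(spec, catalog):
    """Register the table only if it passes validation; otherwise refuse."""
    errors = validate_table(spec)
    if errors:
        raise ValueError("; ".join(errors))
    catalog[spec["name"]] = spec


catalog = {}
create_table(
    {"name": "dwd_trade_order", "owner": "data_team", "description": "order detail"},
    catalog,
)
```

Because create_table is the only way into the catalog, a non-conforming table is rejected at creation time rather than discovered later during governance.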
OneData is the core of Alibaba's data middle platform. It has an engine, Dataphin, which can define data standards, develop data models automatically, generate thematic data services in real time, and so on.
As the following slide shows, the chain runs from data ingestion through specification definition, data modeling, external data association, and data asset accumulation to data service generation. Most elements of data management are realized along this chain.
To some extent this highly standardized development mode reduces flexibility, but its economies of scale are very good; otherwise Alibaba's huge data assets could not be managed well at all. The author knows this from experience: with a platform like the DACP we operate, they must have run into the same problems we have.
Indicator standardization is something the author has tried. At the time I felt that too many reports were being developed repeatedly, and indicator standardization can solve that kind of problem. It is an idea that arises naturally once reporting reaches a certain scale. Alibaba's approach on the slide is the same as what I did back then: different paths to the same destination.
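One way to picture how indicator standardization prevents repeated report development is a registry that identifies an indicator by its definition rather than by its name. The Python sketch below rests on an assumption of mine, not on the slides: that an indicator can be described by a measure, a time period, and a set of filters. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorDefinition:
    """Assumed decomposition of an indicator: measure + time period + filters."""
    measure: str                      # e.g. "pay_amount"
    period: str                       # e.g. "last_7_days"
    filters: frozenset = frozenset()  # e.g. {"channel=tmall"}


class IndicatorRegistry:
    """Keeps exactly one name per indicator definition."""

    def __init__(self):
        self._name_by_definition = {}

    def register(self, name, measure, period, filters=()):
        """Return the canonical name: the existing one if the definition is known."""
        definition = IndicatorDefinition(measure, period, frozenset(filters))
        existing = self._name_by_definition.get(definition)
        if existing is not None:
            return existing
        self._name_by_definition[definition] = name
        return name


registry = IndicatorRegistry()
first = registry.register("gmv_7d_tmall", "pay_amount", "last_7_days", ["channel=tmall"])
# A second team asks for the same thing under a different name.
second = registry.register("weekly_sales_tmall", "pay_amount", "last_7_days", ["channel=tmall"])
```

The second request resolves to the indicator that already exists, so the report can reuse it instead of triggering another round of development.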
Suppose a user, Zhang San, uses Baidu Maps on his first phone, watches Baidu's iQiyi videos on an iPad, uses the Baidu app on his second phone, and uses Baidu Search on a PC. How can the same user's information on these different devices be aggregated?
Operators have the mobile phone number as a natural identifier, but for Internet companies, linking all kinds of account IDs is a major challenge. ID mapping is a core technology for Internet companies: it ensures that data collected in different domains can be connected and analyzed together. Without a unified ID, centralizing diverse data for analysis is meaningless; it is just another form of data island.
For example, the following four user records actually indicate the same person.
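The slide's own records are not reproduced here. As a stand-in, the Python sketch below uses invented records and a union-find structure, which is one common way to implement ID mapping; the source does not say which technique Alibaba or Baidu actually uses. All identifiers (device names, account, phone, cookie values) are hypothetical.

```python
class UnionFind:
    """Minimal disjoint-set structure over arbitrary hashable identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Invented records: each one lists the identifiers observed together on a device.
records = [
    {"device": "phone-1", "account": "zhangsan@mail"},
    {"device": "ipad-1", "account": "zhangsan@mail", "phone": "138-0000"},
    {"device": "phone-2", "phone": "138-0000"},
    {"device": "pc-1", "cookie": "c-77", "phone": "138-0000"},
    {"device": "phone-9", "account": "lisi@mail"},
]


def resolve(records):
    """Group devices that are linked through any shared identifier."""
    uf = UnionFind()
    for record in records:
        ids = [f"{key}:{value}" for key, value in record.items()]
        for other in ids[1:]:
            uf.union(ids[0], other)
    groups = {}
    for record in records:
        root = uf.find(f"device:{record['device']}")
        groups.setdefault(root, []).append(record["device"])
    return sorted(sorted(group) for group in groups.values())


people = resolve(records)
```

The first four devices end up in one group even though no single identifier appears on all of them: the account links the first phone to the iPad, and the phone number links the iPad to the second phone and the PC.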
"Data asset analysis" and "data lineage tracking" were already covered under "Data asset management (IPaaS)" above. They are very basic parts of data management, and of comprehensive data governance in particular.
Security: refers to definitions such as sensitive data classification and access control.
Quality: refers to the definition of data quality rules.
Cost: refers to a comprehensive evaluation based on how often a data asset is called and what it costs to process.
People: probably refers to defining the organizations and individuals a data asset belongs to. For example, our data dictionary has an attribute that identifies the creator and modifier of each asset, so that responsibility can be tracked and traced.
Thematic data service: presumably a simple metadata-based data service query engine. It should be business-oriented, unify data access and query logic, and hide the multiple data sources and physical tables. In effect it provides a business-oriented pseudo-SQL for easy access to data.
Unified yet diversified services: general query means ordinary SQL queries, OLAP means multidimensional analysis, and online service is more abstract. My guess is that it covers customized forms of service such as data push and scheduled tasks.
Cross-source data service: big data involves many technical components, and different data is often stored in different systems such as Hadoop, GBase, and Oracle. To run an ad hoc query across heterogeneous databases you generally have to consolidate the data first, but some lightweight retrieval needs to join across sources directly to get results, hence this service.
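The thematic and cross-source services described above can be sketched together: a metadata catalog maps business-facing field names to physical tables, and a thin query layer joins the results across stores. This Python example is only an illustration. Two in-memory SQLite databases stand in for heterogeneous systems, and every table name, column name, and logical field is hypothetical; it says nothing about how OneService is actually built.

```python
import sqlite3

# Two in-memory SQLite databases stand in for heterogeneous stores.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE t_ord_2024 (uid TEXT, amt REAL)")
orders_db.executemany(
    "INSERT INTO t_ord_2024 VALUES (?, ?)",
    [("u1", 120.0), ("u1", 30.0), ("u2", 35.5)],
)

members_db = sqlite3.connect(":memory:")
members_db.execute("CREATE TABLE m_user (user_key TEXT, lvl TEXT)")
members_db.executemany(
    "INSERT INTO m_user VALUES (?, ?)", [("u1", "gold"), ("u2", "silver")]
)

# Metadata: logical business field -> (connection, SQL yielding (user_id, value)).
# Callers never see the physical table or column names.
CATALOG = {
    "total_spend": (orders_db, "SELECT uid, SUM(amt) FROM t_ord_2024 GROUP BY uid"),
    "member_level": (members_db, "SELECT user_key, lvl FROM m_user"),
}


def query(fields):
    """Return {user_id: {field: value}}, joining the sources in application code."""
    result = {}
    for field in fields:
        connection, sql = CATALOG[field]
        for user_id, value in connection.execute(sql):
            result.setdefault(user_id, {})[field] = value
    return result


answer = query(["total_spend", "member_level"])
```

A caller asks for total_spend and member_level by business name and gets one joined record per user. Joining in application code like this suits lightweight lookups; heavy joins would still call for consolidating the data first, as the text notes.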
That concludes the interpretation of the PPT. The author's strongest impression is that Alibaba's data middle platform is a huge technical system that nevertheless pays great attention to detail. It sounds simple in a few words, but making it work comes at a huge cost and is a gradual process, as Dataphin shows. To learn more about the technical details of Alibaba's data middle platform, we recommend the book Alibaba Big Data Practice.
In fact, doing a data middle platform well is not just a matter of bringing in a few tools. Technology is only technology. You can copy technology, but you cannot copy management and culture, which are the keys to a data middle platform's success.
The greater challenge for a data middle platform is whether your enterprise's understanding of data has reached a certain stage, and whether you can drive the company to establish data management mechanisms and processes that suit it. That is the hardest part, and you have to find your own way.
Author: Fu Yiping
This is original content from the Yunqi Community and may not be reproduced without permission.