“3 + 3”: How Huawei Cloud FusionInsight leads the sustainable development of “new data infrastructure”


Summary: A unified, modern data infrastructure requires three types of architecture to serve three different application scenarios.

Recently, a16z, a well-known US venture capital firm that invests in technology companies, summarized a general technical architecture for data infrastructure, which divides into the following three scenarios.

1. Panorama of data infrastructure

In the data flow, the data sources on the left are gathered into the data lake or data warehouse through data processing (batch, real-time streaming, event streaming and so on). AI analysis is then carried out through data science or machine learning, and ad hoc and real-time analysis delivers result data quickly to customers or apps. Huawei Cloud FusionInsight provides government and enterprise customers with a one-stop, all-scenario cloud native data lake, and offers leading end-to-end solutions across the whole data life cycle, including MRS big data, the DWS data warehouse, a one-stop data management center and other cloud services, to help government and enterprise customers release the value of massive data. Its architecture is as follows:

Thousands of large customers have built their big data application platforms on the FusionInsight architecture.

After surveying relevant industry insiders, a16z concluded that a unified, modern data infrastructure needs three types of architecture to serve three different application scenarios.

  1. Modern BI architecture
  2. Multimodal data processing architecture
  3. Artificial intelligence and machine learning architecture

1. Modern BI architecture

This is the default option for small data teams and enterprises with limited budgets. Enterprises gradually migrate from the traditional data warehouse to this architecture, taking advantage of the flexibility and scalability of the cloud.

Application scenarios include reports, dashboards, and self-service analysis. SQL is mainly used to analyze structured data.

  • Advantages: The initial investment is low, it is quick to get started, and there is plenty of talent available on the market.
  • Disadvantages: It does not suit teams with complex data scenarios, such as data science, machine learning, or real-time processing.
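To make the report and dashboard scenario concrete, here is a minimal sketch of a SQL aggregation over structured data. It uses Python's built-in sqlite3 module purely as a stand-in for a data warehouse; the `sales` table, its columns and the sample rows are hypothetical and do not come from any Huawei Cloud product.

```python
import sqlite3


def monthly_sales_report(rows):
    """Aggregate (month, region, amount) rows into a per-month, per-region report."""
    # An in-memory SQLite database stands in for the warehouse (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # A typical BI query: group structured rows and compute totals and counts.
    report = conn.execute(
        """
        SELECT month, region, SUM(amount) AS total, COUNT(*) AS orders
        FROM sales
        GROUP BY month, region
        ORDER BY month, region
        """
    ).fetchall()
    conn.close()
    return report


# Hypothetical sample data for illustration only.
sample_rows = [
    ("2020-09", "north", 120.0),
    ("2020-09", "south", 80.0),
    ("2020-10", "north", 200.0),
    ("2020-10", "north", 50.0),
]
report = monthly_sales_report(sample_rows)
for row in report:
    print(row)
```

A dashboard or self-service tool would typically issue queries of this shape against the warehouse and render the result set as a chart or table.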

The data handled by traditional small data teams is mostly structured RDBMS data from core transaction systems, and the data volume is at the GB level. An ordinary data warehouse can support their data analysis and mining without any big data platform. This approach was also common in large enterprises around 2008, when data volumes were small and a data warehouse was enough for daily dashboards and data analysis.

Huawei Cloud FusionInsight provides an enterprise-grade data warehouse. DWS currently serves more than 1,000 large customers worldwide and is widely used in government, finance, telecom operators, large enterprises and other fields. The product dates from 2011, has built up nearly 10 years of technology, and holds 180+ granted patents in China and abroad. DWS supports day-to-day structured data analysis and has the following characteristics:

  1. Large scale: GaussDB (DWS) is based on a distributed architecture. While guaranteeing ACID, it has made breakthroughs in multi-stream and multi-group techniques for large-scale distributed scenarios, and can scale out to 2,048 nodes. Notably, logical clusters can unify different business workloads such as ODS, data warehouse, data mart and self-service analysis, isolating them effectively while still letting them share data.
  2. High performance: GaussDB (DWS) has a multi-level, fully parallel computing engine. It supports parallel computing across multiple physical nodes; within a physical node, across multiple CPU cores; and within the instruction stream of a single CPU core, through SIMD instructions that let one instruction operate on multiple data items at once. This exploits parallelism at every level to deliver the best possible performance for the business. In addition, multi-core technology makes performance on Kunpeng more than 30% higher than on an x86 chip of the same generation.
  3. High reliability: GaussDB (DWS) has multi-level disaster recovery and handles software and hardware faults smoothly at the AZ, cluster, node and process levels. First, server-side retry technology greatly reduces how often the business notices a failure. Second, GaussDB (DWS) has a thorough detection and handling mechanism for the thorny problem of sub-healthy components. Finally, offline, semi-online and online scale-out technologies comfortably meet customers' different capacity expansion needs.
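The node-level part of the parallelism described above can be sketched as a partial aggregation: rows are distributed to nodes by hashing a key, each node aggregates its own shard, and a coordinator merges the partial results. This is only an illustrative sketch of the general MPP pattern, not GaussDB (DWS) code. Threads stand in for physical nodes (so there is no real CPU parallelism here), and `NUM_NODES`, the function names and the sample rows are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Pick a node by hashing the distribution key (stable across runs)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Split (key, value) rows into one shard per node."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in rows:
        shards[node_for(key, num_nodes)].append((key, value))
    return shards


def local_aggregate(shard):
    """Per-node step: compute a partial (sum, count) for each key in the shard."""
    partial = {}
    for key, value in shard:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def merge(partials):
    """Coordinator step: combine the partial (sum, count) pairs from all nodes."""
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            prev_total, prev_count = merged.get(key, (0.0, 0))
            merged[key] = (prev_total + total, prev_count + count)
    return merged


def parallel_avg(rows, num_nodes=NUM_NODES):
    """Compute AVG(value) GROUP BY key using per-node partial aggregation."""
    shards = distribute(rows, num_nodes)
    # Worker threads play the role of nodes; each aggregates only its own shard.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    return {key: total / count for key, (total, count) in merge(partials).items()}


# Hypothetical sample data for illustration only.
rows = [("gov", 10.0), ("finance", 30.0), ("gov", 20.0), ("telecom", 5.0)]
averages = parallel_avg(rows)
print(averages)
```

Shipping (sum, count) pairs rather than averages keeps the merge step correct even when a key's rows are spread over several nodes, which is why distributed engines commonly split aggregates into a partial and a final phase.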

The core of Huawei Cloud's DWS data warehouse technology is its distributed architecture, around which Huawei has been building competitiveness for the past decade. In the future, the GaussDB (DWS) data warehouse will continue to evolve on this distributed architecture and build the next generation of open, all-scenario analytical database around cloud, big data, 5G / IoT and artificial intelligence.

2. Multimodal data processing architecture

This architecture is usually used by large enterprises and technology companies to meet complex data demand scenarios.

Application scenarios include BI plus advanced capabilities such as AI / ML, low-latency analysis, large-scale data transformation, processing of multiple data types (text, image and video), and support for several languages (Java / Scala, Python and SQL).

  • Advantages: It flexibly supports a variety of applications, tools, UDFs and deployment environments, and has a cost advantage on large data sets.
  • Disadvantages: It does not suit small data teams. Maintaining this architecture requires more time, money and expert resources.

While demand was awakening in the real world, technology kept evolving in parallel. After the Apache Hadoop architecture was released in 2006, enterprises had by 2011 gradually adopted open source or commercial big data software derived from it, opening the era of offline computing. In 2012, stream computing with Spark at its core opened the era of real-time computing, and online analysis and real-time computing scenarios began to see gradual adoption, although users at this stage were mainly developers. Since 2013, as data volumes surged, big data platforms have evolved into integrated platforms. With the rapid development of AI and related technologies, they have moved from data analysis to data mining and toward intelligence.

IDC, an authoritative research institution, said: “Competition in the digital age is accelerating. Market participants either become leading enterprises through digital transformation and gain advantages of scale, or are gradually eliminated by the market.” With the rapid development of 5G, AI, IoT and other technologies, global data volume will grow from 33 ZB in 2018 to 180 ZB by 2025, and the global digital economy will reach 25 trillion in total. CEOs are paying more and more attention, with participation as high as 67%. Digital technology yields a 6.7-fold return on investment, and digitalization among governments and enterprises has reached 64%. In short, digital transformation is the only way for governments and enterprises to fully meet their data needs in complex scenarios.

Because the digital foundation is so critical, big data, as its main underlying technology, is naturally the top priority. Huawei Cloud FusionInsight provides the MRS data lake service so that government and enterprise customers can keep evolving on a large, fast, converged and stable cloud native data lake architecture:

1) Large: Supports large-scale clusters of 20,000+ nodes, and cluster federation can be expanded indefinitely;

2) Fast: T + 0 real-time incremental update and synchronization and efficient millisecond-level real-time OLAP shorten the analysis pipeline and enable a real-time data lake;

3) Converged: HetuEngine removes the barriers between multiple engines, multiple sources and multiple regions, eliminates data silos, provides a unified SQL interface for converged analysis, simplifies data consumption and improves BI;

4) Stable: Supports online rolling upgrades without tearing down clusters or migrating applications, so that customers can keep evolving a single architecture and need not worry for ten years;

5) Cloud native data lake: Makes data globally visible through unified metadata, and reduces TCO through enterprise-grade EC under a storage-compute separated architecture.
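The idea of a single SQL interface over several data sources, as in the third point above, can be illustrated with a small federated query. The sketch below is not HetuEngine and does not use its API: it uses SQLite's `ATTACH DATABASE` as a stand-in so that one SQL statement joins tables living in two separate database files. The file names, tables and rows are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_source(path, ddl, insert_sql, rows):
    """Create one independent data source (a separate SQLite file)."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def federated_totals(orders_path, customers_path):
    """Join two physically separate sources through a single SQL statement."""
    conn = sqlite3.connect(orders_path)
    # Expose the second source under the schema name "crm" (assumed name).
    conn.execute("ATTACH DATABASE ? AS crm", (customers_path,))
    result = conn.execute(
        """
        SELECT c.industry, SUM(o.amount) AS total
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.industry
        ORDER BY c.industry
        """
    ).fetchall()
    conn.close()
    return result


with tempfile.TemporaryDirectory() as workdir:
    orders_db = os.path.join(workdir, "orders.db")
    customers_db = os.path.join(workdir, "customers.db")
    # Hypothetical sample data for illustration only.
    build_source(
        orders_db,
        "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 100.0), (2, 40.0), (1, 60.0)],
    )
    build_source(
        customers_db,
        "CREATE TABLE customers (id INTEGER, industry TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "finance"), (2, "government")],
    )
    totals = federated_totals(orders_db, customers_db)

print(totals)
```

The consumer writes one query and never copies data between the two sources, which is the property the unified SQL interface is meant to provide at much larger scale.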

Huawei Cloud began research on big data in 2008 and launched commercial products as early as 2014. By staying open source and open-minded and practicing its “platform + ecosystem” strategy, Huawei Cloud has kept pace with each stage of this history. Covering the whole life cycle of government and enterprise big data, Huawei Cloud FusionInsight is a cloud native intelligent data lake with leading technology, and it is the solid data foundation of the data enablement solution, one of Huawei Cloud's three enablement offerings.

3. Artificial intelligence and machine learning architecture

Companies applying machine learning are already using some of the technologies in this architecture. Enterprises that rely heavily on machine learning deploy the complete architecture and even develop new tools.

  • Scenario: Data-driven internal and external applications, with either real-time or batch processing.
  • Advantages: Full control over the whole development process, and machine learning built up as a core, long-term capability of the enterprise.
  • Disadvantages: It does not suit teams that are still exploring machine learning in a small number of internal scenarios. Applying machine learning at scale remains the biggest data challenge.

Huawei Cloud ModelArts provides a one-stop AI training and inference platform for government and enterprise customers. It has the following characteristics:

  • Supports full-stack, full-process and all-scenario AI development and training
  • Supports unified resource management and unified pool scheduling
  • Supports mainstream industry engines and self-developed engines, enabling zero-cost migration
  • Provides multi-dimensional features to meet the needs of different users

The recently released IDC MarketScape: Evaluation of Chinese Big Data Management Platform Vendors, 2020 (hereinafter the IDC big data report) comprehensively evaluated China's mainstream big data vendors along three dimensions: capability, strategy and market share. Huawei Cloud held the leading position in the Leaders quadrant. Besides leading in both technical strength and market share and innovating continuously, Huawei Cloud FusionInsight is also the “most knowledgeable” big data solution:

In the government sector, Huawei Cloud FusionInsight is at work in 50% of China's smart cities. Huawei Cloud big data has supported many ministries, commissions, provinces, cities and regions in building “big data + government”. In one city, Huawei Cloud FusionInsight and its partners built “one cloud, two networks and three platforms”. Guided by a unified government data logic model and targeting the pain points and difficulties of people's livelihood, industry and government, the project starts from building an urban data resource database, uses big data analysis to support intelligent government decision-making, and focuses on building intelligent applications. It breaks down information silos, enables the exchange and sharing of information resources, and brings the power of big data to bear in three areas: “pooling data to benefit the people”, “pooling data to grow business” and “pooling data for good governance”. It supports “one number, one window and one network” government services, and makes streamlined administration, delegated power and “at most one visit” service a reality.

In the financial sector, 50% of China's top 20 financial customers (including banks, securities firms and insurers) have used Huawei Cloud FusionInsight to build their big data platforms. One bank used Huawei Cloud FusionInsight to build a big data infrastructure platform that supports “one lake and two databases”, with the bank's enterprise-level data lake, data warehouse and group information database at its core. The platform carries the business systems of the head office and all branches and supports the bank's daily BI, AI, data mining and data analysis. It made data globally available and avoided data relocation, improved collaboration efficiency 10-fold, doubled the storage cycle, raised resource utilization to as much as 90%, and accelerated the bank's digital transformation.

In the telecom operator sector, China's three major operators use Huawei Cloud FusionInsight to build their big data platforms. Based on Huawei Cloud FusionInsight, Guangdong Mobile works with government and enterprise customers to build a series of benchmark applications such as smart grid, smart transportation, smart port and HD video. It connects the whole data life cycle, supports internal business and empowers external applications, and comprehensively supports all kinds of big data application services, such as government affairs and people's livelihood.

In the transportation sector, Shenzhen Metro uses Huawei Cloud FusionInsight to build a big data analysis platform for Line 6 and Line 10 and create a leading 5G + big data solution. Its data asset center and operation monitoring center carry all metro business systems and support line-level data analysis such as equipment health, energy consumption management, passenger flow statistics, line-center monitoring, emergency decision-making and image-based fire analysis. Data analysis time has been cut from weeks to minutes, enabling efficient operations and moving Shenzhen into a new era of fully connected digital rail transit.

Beyond the industries above, in environmental protection, Qinghai Green Energy Data Co., Ltd. has built China's first energy big data innovation platform on Huawei Cloud FusionInsight. It has enabled innovative services such as planning decision support and using meteorological data to guide power generation scientifically, provides 47 data services in 25 categories to enterprises up and down the industrial chain, has helped 28 new energy stations become unattended or minimally staffed, and effectively supported Qinghai's “Green Power 15 Days” initiative. With big data as the cornerstone, it helps build a green new Qinghai and protect the harmonious ecology of the plateau.

At the same time, Huawei insists on “testing its own parachute first”. Huawei Group IT built its OneData big data cluster on FusionInsight to run a big data platform at large scale, and the OneData cluster has reached 10,000+ nodes. It also delivers unified data management services. In the Bodhi UniDB product, 50+ physically dispersed computing clusters (Hadoop + MPP) are logically unified into an integrated lakehouse architecture based on five unifications (data security, metadata catalog, data integration, data access and task scheduling), supporting the PB-level data analysis and processing needs of thousands of enterprise tenants. A converged data foundation (Bodhi Sea) based on FusionInsight MRS + DWS with “+ governance, + AI, + operation, + cloud” is widely used in Huawei Group IT and holds promise for the future.

The above is just the “tip of the iceberg”. On the one hand, Huawei Cloud FusionInsight keeps deepening its understanding of customers' evolving business demands. On the other hand, it keeps innovating technically and leading the development of the industry. Business demand plus technological innovation drives the sustainable, high-quality development of the industry and helps customers succeed in business by using data. The cases above show that Huawei Cloud is a “knowledgeable” big data leader that lets customers use and manage data with confidence.

By the end of October 2020, the Huawei Cloud FusionInsight intelligent data lake had served 3,000+ customers in more than 60 countries and regions, covering government, finance, telecom operators, power, media, healthcare, education, transportation, oil and gas, logistics, retail, manufacturing, Internet and other industries.
