As the leading players come to dominate the Internet market, enterprises face growing competitive pressure. More and more of them are building big data platforms to tackle high customer acquisition costs, low user stickiness, and weak monetization.
However, big data solutions involve complex components, a high technical threshold, a large initial investment of resources, and substantial maintenance costs later on. All of this puts an enterprise's ability to build, operate, and maintain a big data platform to a serious test. For this reason, the UCloud big data team recently launched the big data intelligent platform (USDP), which aims to help enterprises quickly build a big data analysis and processing platform and manage their big data clusters centrally, reducing the cost of big data development and maintenance.
One-stop intelligent big data management platform
USDP is an intelligent, cloud-hosted platform that helps enterprises build a one-stop solution for big data collection, storage, analysis, application, and operation and maintenance. The product architecture is as follows:
As can be seen from the above figure, USDP is built on UCloud public cloud IaaS resources and provides a Hadoop ecosystem of services, including HDFS, Hive, HBase, Spark, Flink, Presto, Atlas, Ranger, and other open source big data components. It also handles intelligent operation and maintenance of these components, such as configuration management, monitoring and alerting, and fault diagnosis, so that enterprises can quickly build up their big data analysis and processing capacity.
Through USDP, users can quickly and conveniently deploy services and components in a big data cluster and operate and maintain them centrally. USDP automates the entire deployment process, which greatly reduces deployment cost.
At the same time, the integrated real-time monitoring views and alert policies in USDP help operation and maintenance personnel receive abnormal alerts promptly and quickly locate and troubleshoot problems.
In addition, USDP tightly integrates the services and components of the Hadoop ecosystem. All of them are adapted from the Apache versions without in-depth modification, so users do not need to worry about API incompatibilities in the service components, or about being locked into service frameworks released under anything other than the Apache open source license.
A lightweight big data "housekeeper" with automated operation and maintenance
As a big data management service developed entirely in-house by UCloud in China, USDP offers a convenient, unified experience both on and off the cloud:
- Comprehensive component support
Built on an open management architecture, USDP integrates more than 30 open source big data components covering every stage of big data processing, such as data integration, data storage, computing engines, task scheduling, and permission management. UCloud describes this coverage as the most comprehensive in the industry. Enterprises can choose the components that match their business characteristics and needs to build their own big data processing platform.
- Comprehensive monitoring and alerting
Drawing on years of big data operation and maintenance experience, USDP ships with preset monitoring and alert templates for each component, along with rich monitoring metrics and flexible alert modes. These help users keep track of each component's operating status and carry out any necessary maintenance and optimization. In addition, intelligent fault diagnosis tools and a professional technical support team help keep the big data cluster running stably.
- Visual workflow with UDS
UDS (UCloud Data Studio) is a lightweight, distributed, and extensible visual DAG workflow scheduling system developed by UCloud. Its drag-and-drop workflow IDE lets users develop an entire big data workflow through simple drag-and-drop operations in the browser. It has a rich set of built-in processors and supports many task types: shell, Python, Hive, Spark, MR, SQL, subprocess, and so on.
UDS provides visual process definition and high-speed, stable data integration across a large number of heterogeneous data sources, and it can perform ETL operations on the data while it is being synchronized.
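As a rough illustration of what DAG workflow scheduling means, the sketch below orders a handful of tasks so that each one runs only after the tasks it depends on. It is not the UDS API: the task names, the `workflow` mapping, and the `execution_order` and `run` helpers are assumptions made for this example, and the ordering comes from Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of tasks
# it depends on. These names are illustrative and are not part of UDS.
workflow = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_to_hive": {"extract_orders", "extract_customers"},
    "spark_aggregate": {"load_to_hive"},
    "export_report": {"spark_aggregate"},
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


def run(dag, runner):
    """Call `runner(task)` for each task in dependency order; return that order."""
    order = execution_order(dag)
    for task in order:
        runner(task)
    return order


if __name__ == "__main__":
    run(workflow, lambda task: print(f"running {task}"))
```

A real scheduler would also dispatch independent tasks in parallel, retry failures, and track state; this sketch only shows the dependency ordering that a DAG definition makes possible.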
- Security and stability
On the public cloud, the underlying resources of USDP are dedicated to each user, and the cluster sits in an independent virtual private network, which provides effective security isolation. In addition, every component integrated by USDP is compiled from a stable Apache community release and has passed strict compatibility and stress testing. Key components support high availability to keep the cluster running stably and reliably.
- Flexible and easy to use
For big data workloads, USDP on the public cloud offers a choice of machine types (big data physical machines, standard virtual machines, Kuaijie virtual machines, and so on) and combines them with the elastic scalability of the public cloud to keep actual usage costs under control. A wizard-style setup process and complete scenario examples help users get started easily.
- Support for private deployment
In addition to integrating with UCloud public cloud IaaS, USDP can be deployed in a private data center as an independent big data component management platform, and it is compatible with both virtual machine and physical server environments. This gives customers with private deployments big data platform services consistent with the public cloud experience.
UCloud also offers an integrated software and hardware delivery option based on USDP, with the USDP service pre-installed, so that users get a plug-and-play big data platform management service.
Typical application scenarios
1. Data warehouse
At present, the most commonly used data warehouse model in China is the dimensional data warehouse, in which the data warehouse and data marts are built from fact tables and dimension tables. In this model, dimensions are the perspectives from which facts are described, such as date, customer, and supplier, while facts are the metrics to be measured, such as the number of customers and sales. Through USDP, users can deploy all the services needed to build a dimensional data warehouse and quickly set up an enterprise data center.
2. Machine learning
Machine learning places heavy demands on computing power. With the distributed computing frameworks in USDP, such as Spark and Flink, combined with official or self-developed algorithms, machine learning development becomes far more efficient. For deep learning, the large volumes of data needed for modeling can also be stored in HDFS, making development genuinely one-stop.
3. Real-time computing
Kafka, Flink, and Spark Streaming in USDP can be used for real-time data processing in scenarios such as real-time risk control, real-time recommendation, real-time log analysis, and real-time click analysis.
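To make the fact and dimension terminology from the data warehouse scenario concrete, here is a minimal sketch in plain Python. The tables, column names, and the `sales_by` helper are assumptions invented for illustration; in a real deployment the same idea would more likely be expressed as SQL joins over warehouse tables, for example in Hive.

```python
from collections import defaultdict

# Dimension tables: descriptive attributes keyed by an identifier.
# All names and values here are made up for illustration.
dim_customer = {
    1: {"name": "Acme", "region": "East"},
    2: {"name": "Globex", "region": "West"},
}
dim_date = {
    20240101: {"year": 2024, "month": 1},
    20240201: {"year": 2024, "month": 2},
}

# Fact table: one row per sale, holding foreign keys plus the measure.
fact_sales = [
    {"customer_id": 1, "date_id": 20240101, "amount": 100.0},
    {"customer_id": 2, "date_id": 20240101, "amount": 250.0},
    {"customer_id": 1, "date_id": 20240201, "amount": 50.0},
]


def sales_by(facts, dimension, key, attribute):
    """Sum the `amount` measure grouped by one attribute of a dimension."""
    totals = defaultdict(float)
    for row in facts:
        group = dimension[row[key]][attribute]
        totals[group] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(sales_by(fact_sales, dim_customer, "customer_id", "region"))
    print(sales_by(fact_sales, dim_date, "date_id", "month"))
```

The same fact rows can be sliced by any dimension simply by changing which dimension table and attribute are passed in, which is the main appeal of the dimensional model.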
In the era of big data, data is a core factor of production for enterprises, and unlocking its hidden business value depends on deep mining with big data technology. USDP was launched to address the high cost and high technical threshold that enterprises face when building big data solutions, to help more enterprises stand up big data services quickly, and to fully release the business value of their data.