What is the relationship between big data and cloud computing, and how does Hadoop get involved? Where does NoSQL fit, and what does it have to do with BI?


What is the relationship between big data and cloud computing? How does Hadoop participate? Where does NoSQL fit, and what does it have to do with BI? The text below lays out these relationships clearly.

Any discussion of big data starts with its 4V characteristics: Volume (massive scale), Variety (complex types), Velocity (speed), and Value. IBM's original definition listed only 3Vs, leaving out Value, but in practice 4V is the better fit: Value is the ultimate goal of solving big data problems, and the other three Vs serve that goal. With the 4V concept in place, the core of big data is easy to grasp. Its overall architecture consists of three layers: data storage, data processing, and data analysis. The storage layer handles Variety and Volume, the processing layer handles Velocity and timeliness requirements, and the analysis layer delivers Value.

Data lands in the storage layer first; then, according to the data requirements and goals, a data model and an indicator system for analysis are built on top of it so that analysis can generate value. Timeliness in between is provided by the powerful parallel and distributed computing capabilities of the data processing layer. The three layers cooperate to make big data finally produce value.

1. Data storage layer

Data comes in many forms: structured, semi-structured, and unstructured; metadata, master data, and business data; and concrete types such as GIS data, video, files, voice, and business transactions. Traditional structured databases can no longer meet the storage needs of such diverse data, so two storage types are added alongside the RDBMS: HDFS, which handles unstructured file storage directly, and NoSQL databases, which handle structured and semi-structured data.

From a construction standpoint, the storage layer therefore needs all three storage methods: relational databases, NoSQL databases, and the HDFS distributed file system. Business applications choose a storage mode according to their actual situation, but to make storing and reading data convenient, we can further encapsulate the storage layer into a unified shared storage service layer that hides this choice. From the user's point of view, the details of the underlying storage do not matter; only the convenience of storing and reading data does. A shared storage service layer fully decouples the applications that use storage from the storage infrastructure.
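The encapsulation described above can be sketched as a thin facade that routes reads and writes by data type. This is a minimal sketch, not a real implementation: the class names, method names, and backend labels are all assumptions for illustration; in practice each backend would wrap an RDBMS driver, a NoSQL client, and an HDFS client.

```python
class KeyValueBackend:
    """In-memory stand-in for any concrete store (illustrative only)."""
    def __init__(self, kind):
        self.kind = kind          # e.g. "RDBMS", "NoSQL", "HDFS"
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SharedStorageService:
    """Routes each request to a backend by data type, so business
    applications never deal with the underlying storage details."""
    def __init__(self):
        self._backends = {
            "structured": KeyValueBackend("RDBMS"),
            "semi-structured": KeyValueBackend("NoSQL"),
            "unstructured": KeyValueBackend("HDFS"),
        }

    def put(self, data_type, key, value):
        self._backends[data_type].put(key, value)

    def get(self, data_type, key):
        return self._backends[data_type].get(key)
```

The point of the facade is the decoupling: callers name only the data type, never the store, so a backend can be swapped without touching application code.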

2. Data processing layer

The core problems the data processing layer solves are the processing complexity introduced by distributed data storage, and the timeliness requirements of processing data once it is stored at massive scale.

In the typical cloud-related technology stack, everything built around Hive, Pig, and the Hadoop MapReduce framework can be classified as data processing capability. I originally placed Hive in the data analysis layer, but that is not appropriate: Hive's focus is on splitting complex queries for distributed processing and re-aggregating the query results, while MapReduce itself provides the actual distributed processing capability.

MapReduce only implements the distributed computing framework and execution logic; splitting a real analysis requirement into jobs, then aggregating and merging the results, still needs the capabilities of the Hive layer. The ultimate purpose is simple: to meet timeliness requirements under a distributed architecture.
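The division of labor can be illustrated with the classic word-count job, here simulated in plain single-process Python. The function names are illustrative; a real cluster runs the map, shuffle, and reduce phases across many nodes.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) for every word in one input split.
    return [(word, 1) for word in document.split()]

def shuffle(mapped):
    # Shuffle: group intermediate pairs by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into the final result.
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    mapped = []
    for doc in documents:   # each document models one split on one node
        mapped.extend(map_phase(doc))
    return reduce_phase(shuffle(mapped))
```

This is the layer MapReduce provides; what Hive adds on top is translating a declarative query into a chain of such jobs and merging their outputs.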

3. Data analysis layer

Finally, back to the analysis layer. Its focus is truly mining the value of big data, and the core of value mining lies in data analysis and mining. The core of the data analysis layer therefore remains the content of traditional BI analysis: dimensional analysis, slicing and dicing of data, roll-up and drill-down, cubes, and so on.
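These operations can be sketched on a tiny fact table with standard-library Python; the table, columns, and values are invented for illustration. Slicing fixes one dimension at a single value, and drilling down simply means aggregating over more dimensions.

```python
from collections import defaultdict

# Illustrative fact table: (region, product, quarter, revenue).
FACTS = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 150),
    ("south", "widget", "Q1", 200),
    ("south", "widget", "Q2", 250),
]

_DIM = {"region": 0, "product": 1, "quarter": 2}

def roll_up(facts, *dims):
    """Aggregate revenue over the chosen dimensions; passing more
    dimensions is a drill-down, passing fewer is a roll-up."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[_DIM[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

def slice_facts(facts, dim, value):
    """Slice: fix one dimension at a single value."""
    return [row for row in facts if row[_DIM[dim]] == value]
```

For example, `roll_up(FACTS, "region")` rolls everything up to regions, while `roll_up(FACTS, "region", "quarter")` drills down one level further.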

In data analysis I focus on only two things. The first is data modeling in the traditional data warehouse sense, which must support the analysis methods and strategies above. The second is the KPI indicator system established from business goals and business needs, together with the analysis models and methods for that indicator system. Solving these two problems essentially solves the problem of data analysis.
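A KPI indicator system of this kind can be sketched as a mapping from indicator names to a metric function plus a business target. All names, data, and targets below are illustrative assumptions, not a real indicator catalog.

```python
def kpi_report(indicators, data):
    """Evaluate each indicator's metric against its target."""
    report = {}
    for name, (metric, target) in indicators.items():
        actual = metric(data)
        report[name] = {"actual": actual, "target": target,
                        "met": actual >= target}
    return report

# Hypothetical order records.
ORDERS = [{"amount": 120, "returned": False},
          {"amount": 80,  "returned": True},
          {"amount": 200, "returned": False}]

# Hypothetical indicator system: metric function + target per KPI.
INDICATORS = {
    "total_revenue": (lambda rows: sum(r["amount"] for r in rows), 350),
    "keep_rate":     (lambda rows: 1 - sum(r["returned"] for r in rows) / len(rows), 0.6),
}
```

The separation matters: the metric functions belong to the data model, while the targets come from business goals, which is exactly the pairing the indicator system formalizes.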

Traditional BI analysis extracts and centralizes large amounts of data through ETL to form a complete data warehouse. BI analysis based on big data may have no centralized data warehouse at all, or the warehouse itself may be distributed. The basic methods and ideas of BI analysis have not changed; what has changed greatly is how data storage and data processing are implemented underneath.
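The ETL step mentioned above can be shown as a minimal standard-library sketch; the sources, field names, and schema are invented for illustration, and a real pipeline would read from databases or files rather than in-memory lists.

```python
def extract():
    """Extract raw records from two hypothetical sources with
    inconsistent field names, normalizing names as we go."""
    crm = [{"cust": "A", "spend": "100"}]
    web = [{"customer": "B", "amount": "50"}]
    yield from ({"customer": r["cust"], "amount": r["spend"]} for r in crm)
    yield from web

def transform(records):
    """Transform: cast fields to the warehouse's types."""
    for r in records:
        yield {"customer": r["customer"], "amount": int(r["amount"])}

def load(records, warehouse):
    """Load: accumulate per-customer totals into the warehouse."""
    for r in records:
        warehouse[r["customer"]] = warehouse.get(r["customer"], 0) + r["amount"]
    return warehouse

warehouse = load(transform(extract()), {})
```

Whether the `warehouse` target is one centralized store or a distributed one is exactly the implementation detail that big data changes, while the extract-transform-load idea stays the same.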

After all this, the core point is that big data rests on two pillars: cloud technology and BI. Without cloud technology, big data has no foundation on which to land. In short: big data's goals are driven by BI, and big data lands through cloud technology.

