First Understanding of Big Data


What is big data? What are its characteristics? How does it relate to traditional data, and what does it have to do with us? Although many books explain the concept and characteristics of big data directly, in my experience it is easier to understand big data if we first understand the concept and characteristics of data itself.

Several Questions on Data

What is data? In the narrow sense, data means numerical values: the results of observation, experiment, or calculation. In the broad sense, data covers much more; it can also be text, images, sound, and so on. Today we generally use the term in its broad sense.

What are the characteristics of data? Data comes in many forms (language, text, numbers, images, audio, video, and so on), its quality is uneven, and its storage media vary (oral tradition, paper books, digital disks). In short, data is abundant, cluttered, and disorderly.

What is data for? The primary function of data is to record things and their development. From these records, people can analyze things, derive laws and results (for example, fitting a formula to experimental data), predict how existing or future things will develop, and make decisions accordingly.


The Origin of Big Data

The concept of big data emerged less than a decade ago. Having covered several basic concepts of data, let us look at the background against which big data appeared.

As for the origin of data, people recorded data on stones and trees as early as ancient times, and later recorded and transmitted it on bamboo slips, silk, and the like; at this stage, the recording and dissemination of data were very limited. With the appearance of paper and the invention of printing, the recording and transmission of data made great progress for the first time, but the amount of data was still quite small, transmission was slow, its reach was narrow, and people's ability to analyze and use data remained very limited. With the arrival of computers and storage media such as disks, people's capacity to record, compute, and analyze data took a qualitative leap. Then, with the emergence of the Internet and continuous improvements in communication technology, the speed and reach of data generation and dissemination grew rapidly and data began to grow explosively; people can now learn of major events around the world almost in real time. Thus we entered the so-called era of big data.

Basic concepts of big data

What are the similarities and differences between big data and traditional data? Is it just an increase in the amount of data? Should we approach big data the same way we do with traditional data? Is big data directly related to our lives?

What are the similarities and differences between big data and traditional data? The most widely recognized characterization of big data is the "4V" description: large volume of data, wide variety of data types, high velocity of data processing, and low value density.


| | Traditional data | Big data |
| --- | --- | --- |
| Data volume | Small, growing slowly | Large, growing exponentially |
| Data types | Single type, mainly numbers and text | Rich variety, with a large proportion of audio and video |
| Data processing | Manual calculation and deduction, single-machine processing, low timeliness | Distributed processing, high timeliness |
| Value density | High; what is stored is basically all useful | Low; value must be mined from large amounts of data |

So how should we deal with data? Given the differences between big data and traditional data, our thinking must change when handling such massive data. In "Big Data: A Revolution That Will Transform How We Live, Work, and Think", the authors point out three shifts in thinking for the big data era: the whole dataset rather than samples, efficiency rather than accuracy, and correlation rather than causation.

The whole dataset rather than samples. Under earlier limits on data storage and computing power, data analysis usually relied on sampling: part of the data was analyzed, conclusions were drawn, and those conclusions were extended to the whole dataset. In the era of big data, storage and computation are no longer the bottleneck, and the entire dataset can be analyzed globally to obtain results quickly.

Efficiency rather than accuracy. In the sampling analysis of the past, the accuracy of the sample analysis had to be guaranteed before its conclusions could be extended to the global data; otherwise, errors in the results would be amplified across the whole dataset, making analysis and verification cumbersome and inefficient. In the era of big data, the global data is analyzed directly, so the error of the results is measured against all the data; within an acceptable error range, the results can be used directly, without worrying about sampling error being amplified.
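As a rough sketch of this difference (the dataset and numbers below are purely illustrative, in plain Python): a sampled estimate carries sampling error that a full-dataset computation does not.

```python
import random

random.seed(42)

# A synthetic "full dataset": 1,000,000 simulated purchase amounts
# drawn from an exponential distribution with mean 50.
population = [random.expovariate(1 / 50.0) for _ in range(1_000_000)]

# Traditional approach: estimate the mean from a small sample,
# then extend that estimate to the whole dataset.
sample = random.sample(population, 1_000)
sample_mean = sum(sample) / len(sample)

# Big-data approach: compute the mean over the entire dataset directly.
full_mean = sum(population) / len(population)

print(f"sample mean: {sample_mean:.2f}")
print(f"full mean:   {full_mean:.2f}")
```

The sample mean deviates from the full-dataset mean by sampling error; the full-dataset figure has no such error to propagate, which is the point of analyzing the whole dataset when storage and computation permit.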

Correlation rather than causation. In past data analysis, the purpose was often to understand the principle behind why things happen. In the era of big data, causality matters less: people often care more about how things will develop than about why they develop that way, so the correlation between things becomes more important.
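A classic illustration is two series that rise and fall together without either causing the other. The monthly figures below are invented for illustration: ice-cream sales and drowning cases both track the seasons, so they correlate strongly even though neither causes the other. A small Pearson-correlation sketch in plain Python:

```python
# Hypothetical monthly figures: both series peak in summer, so they
# correlate strongly even though neither causes the other.
ice_cream_sales = [20, 25, 35, 50, 70, 90, 95, 92, 70, 45, 30, 22]
drowning_cases = [2, 3, 4, 7, 10, 14, 15, 14, 10, 6, 4, 2]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(ice_cream_sales, drowning_cases)
print(f"correlation: {r:.3f}")
```

The high correlation is still useful for prediction (one series can be anticipated from the other without explaining it), but acting on it as a causal claim would be a mistake: the common driver here is summer temperature.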
