Back in my naive days, when I had just started my master’s degree in data science, I was excited and eager to try anything related to big data. I tried to mine every data set that fit the 3Vs. (The 3Vs: in a 2001 report, Doug Laney, an analyst at META Group, put forward the idea of “3D data management”, arguing that big data would develop along three dimensions and naming three characteristics: high speed (“velocity”), diversity (“variety”) and large scale (“volume”), collectively known as the 3Vs.) I wanted to extract analyzable data from the endless data flow, and then model it, visualize it, transform it and so on. But today, when I see the words “big data”, I involuntarily raise an eyebrow and wonder which equally “popular” and “fuzzy” technology buzzword will appear next.
01 Hype and reality
Recently, I have become very wary of phrases that sound impressive on the Internet. For example, “Empowering the digital age with big data” looks really cool! But what does it mean? For the companies and individuals in real life who are stuck in Excel spreadsheets, frustrated but forced to work slowly by hand, what does this “cool” phrase really mean?
Big data is exciting because it represents a huge store of wealth in which you can search for, find and use anything of value to you. My initial view of big data was that “among all this data, there must be some deep meaning that we absolutely want to know”. I may have been right about that, but what is the cost of finding something valuable in such a large amount of data?
02 Without the right infrastructure, big data is rubbish
Before processing big data, we need to build a basic data processing architecture that gives the whole system enough computing power, storage capacity and data transfer capability. This usually costs a lot of money, and all kinds of unexpected bottlenecks still appear. Cloud platforms have made computing power cheaper and easier to use, but as cloud storage grows exponentially, neither day-to-day cloud usage nor the maintenance of local servers is a small expense. Hence an interesting paradox of this century:
Sometimes, data that is gold to some businesses or individuals is, to others, garbage that wastes storage space and computing power.
At present, companies are spending money on data mining. Would it not be more efficient, and less wasteful of energy, to determine whether data is useful before collecting it?
03 Not all data is worth noticing
There is a saying in data science: “garbage in, garbage out”. Indeed, in practice a lot of data is unreliable and takes a great deal of effort to clean before it can be used. Worse, we often spend a lot of energy, time and money only to find a small amount of useful information in a large data set.
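To make the cleaning effort concrete, here is a minimal sketch of a check that could run before any analysis: it reports, for each field, the share of records in which that field is missing or blank. The function name `quality_report`, the field names and the sample records are all hypothetical and invented for illustration; the article does not prescribe any particular check.

```python
from typing import Any


def quality_report(rows: list[dict[str, Any]], required: list[str]) -> dict[str, float]:
    """Return, per required field, the fraction of rows where it is missing or blank."""
    total = len(rows)
    report: dict[str, float] = {}
    for field in required:
        # A value counts as bad if the key is absent, None, or an empty/whitespace string.
        bad = sum(
            1
            for row in rows
            if row.get(field) is None or str(row.get(field)).strip() == ""
        )
        report[field] = bad / total if total else 0.0
    return report


# Hypothetical sample records, invented for illustration only.
records = [
    {"user_id": "u1", "age": 34, "city": "Berlin"},
    {"user_id": "u2", "age": None, "city": ""},
    {"user_id": "u3", "age": 29, "city": "Lyon"},
    {"user_id": "", "age": 41, "city": None},
]

report = quality_report(records, ["user_id", "age", "city"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing or blank")
```

A report like this does not say whether the data is valuable, but it gives an early and cheap signal of how much cleaning a data set would need before it is worth storing and processing.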
As Forrester reports, at least 60% of the data in an enterprise sits idle.
Why not take the money spent on storing idle data and use it to build the right data processing architecture?
We have come to realize that not every feature in the data is useful (some may even be harmful), and that the quality of data often matters more than its quantity. We want data to represent what we care about in a reliable and consistent way. This realization will lead us toward interpretable, responsible and safe AI research.
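As one small illustration of “not every feature is useful”, the sketch below flags features whose values never change: a constant column cannot help distinguish one record from another, so storing and processing it adds cost without adding information. This is only one simple heuristic, not a method taken from the article, and the function name `low_variance_features` and the sample feature table are hypothetical.

```python
from statistics import pvariance


def low_variance_features(
    columns: dict[str, list[float]], threshold: float = 0.0
) -> list[str]:
    """Return the names of features whose population variance is <= threshold."""
    return [
        name for name, values in columns.items() if pvariance(values) <= threshold
    ]


# Hypothetical feature table (column name -> values), invented for illustration only.
features = {
    "clicks": [3.0, 7.0, 1.0, 9.0],
    "schema_version": [2.0, 2.0, 2.0, 2.0],  # constant, so it carries no signal
    "session_minutes": [12.5, 3.0, 8.0, 20.0],
}

to_drop = low_variance_features(features)
kept = {name: values for name, values in features.items() if name not in to_drop}
print("dropped:", to_drop)
print("kept:", list(kept))
```

A variance check only catches the most obvious dead weight. Deciding whether a feature is harmful, for example because it leaks sensitive information or encodes bias, still requires human judgment about what the data means.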
To sum up
We have now recognized the importance of data. What we need to do next is build better infrastructure to use, share and analyze data more safely, and to distinguish useless data from valuable information more accurately. We also need to ensure the quality and reliability of data, and to make sure that it is widely accessible and that we understand what it means (which is particularly important for future AI research). Finally, I would like to say that the fundamental value of data lies not in being huge, but in being reliable and valid.
Bye-bye, “big” data
Valid and reliable data will enjoy a much longer life!
It doesn’t sound so cool, but it’s more lovely and reassuring, isn’t it?
Link to the original text: https://towardsdatascience.com/bye-bye-big-data-fbea187c7739
The above content comes from the Internet and was compiled by the official account of Jingdong Cloud developers; it does not represent the position of Jingdong Cloud.