In the first two articles of this popular-science series, we discussed how blockchains and smart contracts can, as a new generation of infrastructure, transfer and store value safely and reliably. A smart contract on a blockchain is like a computer without an Internet connection: it has intrinsic value, namely the ability to create and trade tokens. But just as connecting computers to the Internet unleashed enormous innovation and value, smart contracts will become extremely powerful once they are connected to the fast-growing off-chain data and API economy. If smart contracts can connect to external data sources of every kind, such as data providers, web APIs, enterprise systems, cloud service providers, Internet of Things devices, payment systems, and other blockchains, they can become the mainstream digital agreement across industries. In this article, we analyze data and APIs from the following angles:
- What is data, and how does it drive the data economy?
- How is data produced?
- How is data exchanged through APIs?
- What is big data analysis?
This article takes a comprehensive look at the economics of off-chain data. In the next article, we will discuss how an infrastructure called an "oracle" connects smart contracts to this off-chain data safely and reliably.
Data and data economy
Data is information obtained through observation, such as measuring the outdoor temperature, calculating a car's geographic position, or recording how users interact with an application. Raw data by itself carries neither special value nor guaranteed reliability; it must be interpreted, and cross-checked against other data, to establish its authenticity and validity.
Metadata is "data about data." It records the basic facts about a piece of data, such as the time a message was sent, the geographic location of a temperature reading, or the length of a phone call. Its purpose is to index data and give it meaning, greatly reducing the difficulty of tracking and processing information.
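As a minimal sketch (the readings and field names below are invented for illustration), here is how metadata lets a program index raw data: each temperature value only becomes useful once its time and location are attached.

```python
from datetime import datetime, timezone

# Each reading pairs raw data (the temperature) with metadata
# that gives it context: when and where it was measured.
readings = [
    {"value": 21.5,
     "meta": {"time": datetime(2020, 7, 1, 9, 0, tzinfo=timezone.utc),
              "location": "Berlin"}},
    {"value": 30.2,
     "meta": {"time": datetime(2020, 7, 1, 9, 0, tzinfo=timezone.utc),
              "location": "Madrid"}},
]

def readings_for(location, readings):
    """Use metadata to index into the raw data."""
    return [r["value"] for r in readings if r["meta"]["location"] == location]

print(readings_for("Berlin", readings))  # [21.5]
```

Without the metadata, the two numbers 21.5 and 30.2 would be indistinguishable; with it, they can be tracked, filtered, and combined with other data sets.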
In addition, important applications need reliable data, so raw data must be processed and cleaned. Cleaning includes removing outliers, finding errors, and eliminating irrelevant information. For example, a current temperature reading can be compared with historical temperatures to identify and discard abnormal values.
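The temperature example above can be sketched as a simple outlier filter. This is one common approach (a three-standard-deviation cutoff against historical readings), not the only way to clean data; the numbers are made up.

```python
from statistics import mean, stdev

def clean(current_readings, historical):
    """Keep only readings within 3 standard deviations of the
    historical mean; anything further out is treated as an outlier."""
    mu, sigma = mean(historical), stdev(historical)
    return [x for x in current_readings if abs(x - mu) <= 3 * sigma]

# Hypothetical historical outdoor temperatures in degrees Celsius.
historical = [18.0, 19.5, 21.0, 20.2, 19.8, 20.5, 21.3, 18.9]

print(clean([20.1, 95.0, 19.7], historical))  # [20.1, 19.7] -- 95.0 is dropped
```

A faulty sensor reporting 95.0 °C is silently discarded, while plausible readings pass through unchanged.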
In the data economy, data of all kinds is collected, refined, and exchanged to generate valuable insights. These insights can serve the public good, such as pooling clinical research data in a shared medical database to better understand the latest medical trends, or help the private sector track internal operational processes to identify and fix inefficiencies.
As the data economy develops, so does its degree of automation: data can trigger economic actions directly, without human intervention. For example, an algorithm might stipulate that payment for goods is released automatically once three conditions are met: 1) the goods have been delivered (GPS data); 2) the goods are in good condition (Internet of Things data); 3) the goods have cleared customs (web API).
Data is the by-product of a process or event. Generating data requires input (i.e., behavior), recording it requires extraction (i.e., measurement), and giving it meaning requires aggregation (i.e., analysis). Because input, extraction, and aggregation all demand specialized technology, not all data is created equal, and data quality varies widely.
The following are common ways to obtain new, raw data:
- Forms (manually entered data): data users enter by hand when filling in public or private forms, such as answering questionnaires, signing documents, or posting on social platforms.
- Applications / websites (user-consented data): data obtained after users accept the terms and agreements of an app or website. By accepting, users typically authorize the site or app to track certain data, such as in-app activity, browsing habits, or even personal information like gender and age.
- Internet of Things (real-time monitoring data): data captured by devices equipped with sensors and actuators and transmitted by smartphones, smart-home systems, wearables, RFID devices, and other connected hardware.
- Proprietary processes / personal experience (internal or personal data): data generated by a business process an enterprise controls through proprietary technology or market leadership, or data arising from an individual's unique experience.
- Research and analysis (gathered and interpreted data): data collected and analyzed from existing data sets, including cross-comparison with historical data, cross-referencing against other data sets, and applying new filtering and calculation methods. There are also data distributors who purchase large amounts of data from aggregators or enterprises and resell it to end users. Although distributors resell at a higher price, they first process the data into a structure or format suited to each user's needs.
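Cross-referencing in the last bullet can be sketched as joining two data sets to answer a question neither answers alone. The tiny data sets below are fabricated for illustration:

```python
# Two hypothetical data sets keyed by city.
temps = {"Berlin": 21.5, "Madrid": 30.2, "Oslo": 15.1}   # current temperature, degrees C
population_m = {"Berlin": 3.6, "Madrid": 3.3, "Oslo": 0.7}  # population, millions

def cross_reference(temps, population_m, min_temp):
    """Combine two data sets: which cities are at or above a
    temperature threshold, ordered by population (largest first)?"""
    hot = [c for c in temps if temps[c] >= min_temp and c in population_m]
    return sorted(hot, key=lambda c: population_m[c], reverse=True)

print(cross_reference(temps, population_m, 20.0))  # ['Berlin', 'Madrid']
```

Neither data set alone can rank warm cities by population; the value is created by the join, which is exactly the kind of refinement data distributors sell.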
If data is to become the core pillar of the next generation of applications, we cannot rely entirely on internally generated data; we must establish mechanisms for trading data, because buying data is far cheaper than producing it. For example, developing autonomous-driving algorithms requires vast amounts of data for object detection, object classification, object localization, and motion prediction. Developers can generate this data internally, at the cost of logging millions of driving miles, or they can simply buy it through an API.
An application programming interface (API) is a set of commands that controls how external applications access a system's internal data sets and services. APIs are the current standard for trading data and services. Uber, the mainstream ride-hailing app, connects to Mapbox's GPS API for vehicle positioning, Twilio's SMS API for messaging, and Braintree's payment API for payments. These capabilities are purchased from existing technical solutions rather than developed from scratch by Uber itself.
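The definition above can be made concrete with a toy API. This is a deliberately simplified in-process sketch (real APIs are served over HTTP), but it shows the essential contract: callers get a fixed set of commands, never direct access to the internal data store.

```python
class WeatherAPI:
    """A toy API: external callers use a defined set of commands
    and never touch the internal data store directly."""

    def __init__(self):
        # Internal store; the leading underscore marks it as private.
        self._store = {"Berlin": 21.5, "Madrid": 30.2}

    def get_temperature(self, city):
        """The one command this API exposes."""
        if city not in self._store:
            raise KeyError(f"unknown city: {city}")
        return {"city": city, "temp_c": self._store[city]}

api = WeatherAPI()
print(api.get_temperature("Berlin"))  # {'city': 'Berlin', 'temp_c': 21.5}
```

Because the provider controls the interface, it can change how the data is stored, add authentication, or meter usage without breaking its consumers, which is what makes APIs a workable unit of commerce.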
(The API economy has trended steadily upward since its emergence, with many new APIs and new API-management solutions appearing along the way. Source: the software company Informatica)
APIs usually charge on a subscription model: end users can pay per call, per month, or on a tiered ("ladder") plan. Data providers thereby receive an economic incentive to produce data, and end users are spared producing it themselves. API providers and paying users also sign legally binding contracts that deter malicious behavior such as data theft or unauthorized resale, and that hold providers accountable for the quality of their data.
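Tiered ("ladder") pricing works like tax brackets: each block of calls is billed at its own rate. The tier boundaries and prices below are invented for illustration.

```python
def monthly_bill(calls):
    """Tiered API pricing: the first 1,000 calls are free, the next
    9,000 cost $0.010 each, and anything beyond costs $0.005 each.
    Boundaries and prices are hypothetical."""
    tiers = [(1_000, 0.0), (10_000, 0.010), (float("inf"), 0.005)]
    total, prev_cap = 0.0, 0
    for cap, price in tiers:
        in_tier = max(0, min(calls, cap) - prev_cap)  # calls billed at this rate
        total += in_tier * price
        prev_cap = cap
    return round(total, 2)

print(monthly_bill(500))     # 0.0   (entirely within the free tier)
print(monthly_bill(12_000))  # 100.0 (9,000 * 0.010 + 2,000 * 0.005)
```

The free tier lets developers prototype at no cost, while heavy users fund the provider; the declining per-call price rewards volume.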
Some APIs are free for anyone to use, including OpenWeatherMap for weather data, Skyscanner Flight Search for flight information, and GDELT for data on global human behavior and beliefs. Governments around the world have also launched open-data initiatives and continue to expand their open APIs. However, free APIs are generally less reliable than paid ones: without economic incentives and legal agreements, data quality and latency risks cannot be controlled. Most high-quality data still comes from paid APIs, which typically have top-notch data sources, full-stack infrastructure, and full-time monitoring teams, and which innovate constantly to outpace competitors.
Big data infrastructure and analysis
The idea that a programmed system can learn and improve itself has attracted enormous enthusiasm. Learning involves taking actions, receiving results, comparing and analyzing them against historical data to generate new insights, and refining methods until a goal is reached. The current trend, therefore, is to build infrastructure that can learn autonomously: absorbing large amounts of data, filtering and classifying it, and generating insights from the analysis.
Facebook, Google, and Amazon in the United States, and Alibaba, Tencent, and Baidu in China, became today's technology giants by building deep Internet applications that generate massive amounts of user data. That data laid a solid foundation for the world's best data analysis tools, especially artificial intelligence and machine learning software, which can produce rich insights into consumer behavior, social trends, and market movements. Business management software likewise helps enterprises better understand their own operations: companies such as SAP, Salesforce, and Oracle have developed enterprise resource planning (ERP), customer relationship management (CRM), and cloud management software that let enterprises consolidate all the data and systems in their internal business processes and extract key insights.
Cloud computing and storage are attracting ever more attention. With cloud computing, users share cloud infrastructure to store and process data rather than tying up their own system resources. Cloud technology improves applications' back-end processes, enhances sharing between systems, and lowers the cost of artificial intelligence and machine learning software. For example, Google Cloud users can use BigQuery, a SaaS product that analyzes massive data sets in batches and has built-in machine learning capabilities.
The fourth industrial revolution is coming
Combining artificial intelligence and machine learning, business management software, and cloud infrastructure lets us extract ever more insight from data. Meanwhile, the rise of edge computing, 5G networks, and biotechnology is fostering an environment of real-time and biologically connected data. Driven by these emerging systems, the economy is moving toward real-time, data-driven decision-making with ever less human intervention, while the barriers to generating and sharing data have all but disappeared and the rate of data creation keeps climbing, further accelerating the trend. Many call this trend the "fourth industrial revolution."
Welcome to the Chainlink developer community
For more information, stay tuned for our future articles. The next article in this series will explore how oracles connect smart contracts to off-chain data. Follow our Twitter account for updates, and join our Telegram group for the latest Chainlink news.