Massive structured data solutions — an interpretation of Tablestore scenarios

Time: 2021-07-15

Introduction: Data is the core asset driving business innovation. Different types of data, such as unstructured data (videos, images) and structured data (orders, trajectories), call for different storage engines depending on the business requirements; only then can the value of the data be fully realized. For massive structured/semi-structured data without strong transaction requirements, Tablestore is a one-stop solution. This article is a detailed interpretation of its usage scenarios.

Data is the core asset driving business innovation. Different types of data, such as unstructured data (videos, images) and structured data (orders, trajectories), call for different storage engines depending on the business requirements; only then can the value of the data be fully realized.
For example, unstructured data such as videos and images is a good fit for Object Storage Service (OSS), while strongly transactional structured data such as trade orders is a good fit for MySQL.

For massive structured/semi-structured data without strong transaction requirements, the scenarios share the following characteristics:

  1. The data scale is large; a conventional relational database struggles to store it.

  2. High read/write throughput and low response latency are required.

  3. The data structure is relatively simple, with no cross-table join queries; reads and writes do not need a complex transaction mechanism.

Tablestore is built to solve exactly this kind of data storage, access, and computation.

Historical order scenario

All businesses involving transactions and agreements, such as e-commerce, finance, food delivery, and new retail, generate large numbers of orders that record every corner of society. A traditional relational database can support online business that requires strongly consistent transactions, but it cannot hold the full volume of massive order data, so the data must be layered.
Architecture core requirements

  • Online data synchronization: real-time data and historical data are layered, and kept in real-time sync to support the online business
  • Historical data storage: historical order data must support low-latency point lookups and searches
  • Cost-effective storage and analysis of massive data: report statistics over the historical database require integration with analytical computing components

Core advantages

  • Compensates for the capacity limits of the online database and relieves its load
  • PB-level historical storage that retains the full data set while providing low-latency, high-concurrency queries
  • Multiple order fields can be indexed, enabling queries on any combination of conditions
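As a concrete illustration of the point lookups and any-combination-of-conditions queries described above, here is a minimal in-memory sketch in Python. The table shape, column names, and index structure are hypothetical simplifications for illustration only; a real deployment would use the Tablestore API and its index features rather than this toy model.

```python
from collections import defaultdict

class OrderHistoryTable:
    """Toy model of a wide-column order table: the primary key is
    (user_id, order_id); a secondary index maps attribute columns
    back to primary keys for combined-condition queries."""

    def __init__(self):
        self.rows = {}                 # (user_id, order_id) -> attribute columns
        self.index = defaultdict(set)  # (column, value) -> set of primary keys

    def put_row(self, user_id, order_id, **attrs):
        pk = (user_id, order_id)
        self.rows[pk] = attrs
        for col, val in attrs.items():
            self.index[(col, val)].add(pk)

    def get_row(self, user_id, order_id):
        # Point lookup by full primary key -> low-latency single-row read.
        return self.rows.get((user_id, order_id))

    def query(self, **conditions):
        # Combined-condition query: intersect the key sets of each condition.
        sets = [self.index[(c, v)] for c, v in conditions.items()]
        hits = set.intersection(*sets) if sets else set()
        return sorted(hits)

t = OrderHistoryTable()
t.put_row("u1", "o100", status="paid", city="hangzhou")
t.put_row("u1", "o101", status="refunded", city="hangzhou")
t.put_row("u2", "o102", status="paid", city="beijing")
print(t.get_row("u1", "o100"))                 # -> {'status': 'paid', 'city': 'hangzhou'}
print(t.query(status="paid", city="hangzhou")) # -> [('u1', 'o100')]
```

The key design choice mirrored here is that a well-chosen primary key serves point lookups directly, while secondary indexes carry the arbitrary-condition queries.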

IM / feed streaming scenarios

IM (instant messaging) has become a basic component of Internet business and is widely needed in social networking, gaming, live streaming, and other scenarios. It must efficiently support the storage, synchronization, and retrieval of massive numbers of messages.
Architecture core components

  • Message history store: stores historical messages by conversation – requires easily scalable storage for massive data volumes
  • Message sync store: stores synchronization messages by receiver – supports highly concurrent writes and real-time pulls (write diffusion)
  • Message index: supports retrieval over the history store – requires synchronized data updates

Core advantages

  • The Tablestore Timeline message model is designed for IM/feed scenarios and simplifies development
  • 100 TB-level storage for the sync table and PB-level storage for the history table
  • Distributed architecture with an LSM storage engine: supports millions of writes per second for diffused messages and millisecond-level pulls from the sync store
  • Hybrid read-diffusion/write-diffusion synchronization model
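The write-diffusion model above can be sketched in a few lines of Python. This is a hypothetical in-memory analogue of the Timeline model, not the Tablestore Timeline API itself: each message is appended once to the conversation's history timeline and fanned out into every receiver's sync timeline, so a client pulls its new messages with one sequential scan from its last checkpoint.

```python
import itertools
from collections import defaultdict

class TimelineStore:
    """Toy sketch of write diffusion (fan-out on write): one append to
    the conversation history, one copy per receiver's sync timeline."""

    def __init__(self):
        self.seq = itertools.count(1)      # monotonically increasing sequence ids
        self.history = defaultdict(list)   # conversation_id -> [(seq, msg)]
        self.sync = defaultdict(list)      # receiver_id -> [(seq, msg)]

    def send(self, conversation_id, receivers, msg):
        s = next(self.seq)
        self.history[conversation_id].append((s, msg))
        for r in receivers:                # the "write diffusion" step
            self.sync[r].append((s, msg))
        return s

    def pull(self, receiver_id, after_seq):
        # Incremental pull: everything newer than the client's checkpoint.
        return [(s, m) for s, m in self.sync[receiver_id] if s > after_seq]

tl = TimelineStore()
tl.send("group1", ["alice", "bob"], "hello")
tl.send("group1", ["alice", "bob"], "world")
print(tl.pull("bob", after_seq=1))   # -> [(2, 'world')]
```

The trade-off this illustrates: write diffusion multiplies the write load by the number of receivers, but makes every client's read a cheap sequential range scan, which is why it pairs well with an LSM engine optimized for writes.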

Time-series scenario – monitoring / IoT

Recording and analyzing real-time data greatly enriches how we use data. Operations monitoring of systems, and the monitoring of environments and people in IoT scenarios, help us understand the facts and make decisions more effectively. Here we face highly concurrent writes from many devices and systems, long-term data storage, and decision analysis.

Scenario core requirements

  • Highly concurrent data writes: supports real-time writes from millions of device and system nodes
  • Real-time data aggregation: pre-aggregates raw monitoring data at reduced precision – requires real-time synchronization into stream computing
  • Data storage: long-term data retention – requires cost-effective, large-scale single-table storage


Core advantages

  • A single core table scales up to 10 PB, with a customizable data life cycle
  • A single core table sustains writes of 50 million data points per second
  • Real-time writes greatly shorten the time until data becomes visible
  • Millisecond-level real-time queries render trend charts and reports, with query performance unconstrained by single-table scale
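The pre-aggregation step mentioned in the requirements ("reduce precision") can be illustrated with a minimal downsampling sketch in Python. The function and bucket size are hypothetical; in practice this stage would run in a stream-computing engine fed by the raw table.

```python
from collections import defaultdict

def downsample(points, bucket_seconds=60):
    """Toy pre-aggregation for monitoring data: reduce raw (timestamp,
    value) points to one average per time bucket, trading precision for
    cheaper long-term storage and faster trend queries."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)  # align to bucket start
    return {b: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}

raw = [(0, 10.0), (30, 20.0), (61, 40.0), (90, 60.0)]
print(downsample(raw))   # -> {0: 15.0, 60: 50.0}
```

Storing only the per-bucket aggregates alongside (or instead of) raw points is what keeps a multi-year, PB-scale monitoring table queryable at millisecond latency.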

Public opinion & risk control analysis

Analyzing and controlling public-opinion information yields effective insight into the market – for example, collecting and analyzing reviews, news, and comments. This requires storage for rich, heterogeneous data, highly concurrent writes, and convenient data flow into computing and analysis.

Scenario core requirements

  • Raw data ingestion: massive crawler output demands highly concurrent writes and PB-level storage
  • Multiple data types: crawled content, like the tags generated from it, must be written schema-free
  • Data analysis: staged processing from raw information to structured labels to result storage must support both real-time and offline computing

Core advantages

  • Distributed LSM-engine storage providing highly concurrent, high-throughput writes and PB-level capacity
  • Change-data capture triggers custom downstream processing logic in real time
  • Real-time synchronization with big-data platforms; analysis results are written back to a result table for real-time queries by the serving layer
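The "schema-free" writes called for above can be sketched with a toy wide-column table in Python. The class and column names are hypothetical illustrations, but the behavior they show – a fixed primary key with a different set of attribute columns per row, updated in place as labels are derived – is the property the scenario relies on.

```python
class SchemaFreeTable:
    """Toy model of schema-free (wide-column) storage: only the primary
    key is fixed; each row carries its own set of attribute columns, so
    crawled content and later-derived tags coexist without migrations."""

    def __init__(self):
        self.rows = {}

    def put(self, pk, **columns):
        # Merge semantics: an update adds or overwrites columns in place,
        # which is how derived labels get attached to raw rows later.
        self.rows.setdefault(pk, {}).update(columns)

    def get(self, pk):
        return self.rows.get(pk, {})

t = SchemaFreeTable()
t.put("doc1", url="https://example.com", text="raw comment text")
t.put("doc1", sentiment="negative", risk_score=0.87)  # tags added later, new columns
print(t.get("doc1"))
```

Because no schema declares the columns up front, the analysis pipeline can keep adding label columns (sentiment, risk score, topic) without touching rows that lack them.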

Recommender system

As the key lever of refined operations across businesses, the recommender system has overturned the traditional mode of content delivery and become the core distribution engine of today's age of massive information. It is widely used in e-commerce, short video, news, and other scenarios, and must support massive message storage together with real-time and offline analysis.
Architecture core components

  • Behavior log: stores real-time data written by clients – requires highly concurrent writes and supports real-time analysis via stream computing
  • Historical data: cold data sinks synchronously into the OSS data lake
  • User tags: stores analysis tags and recommendation information – requires horizontally scalable attribute columns and efficient retrieval


Core advantages

  • Data scale: unlimited storage capacity, with flexibly defined hot and cold data tiers
  • Massive concurrency: single-table writes scale horizontally, supporting writes at the level of 100 million rows per second
  • Data is written in real time and visible in real time
  • Data is delivered to the OSS data lake in real time; Tablestore keeps only hot data and provides rich indexing and high-throughput scans
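The hot/cold tiering described above can be sketched as a small Python model. The class, the TTL policy, and the `cold` list (standing in for delivery to the OSS data lake) are all hypothetical simplifications; the point is only the division of labor: recent events stay in the hot store for low-latency online queries, while expired events move to a cheap archive for offline analysis.

```python
class TieredBehaviorLog:
    """Toy sketch of hot/cold layering for a behavior log: events older
    than `ttl` are evicted from the hot store into a cold archive."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.hot = []    # (timestamp, event) kept for online queries
        self.cold = []   # archived events for offline analysis

    def append(self, ts, event, now):
        self.hot.append((ts, event))
        self.evict(now)

    def evict(self, now):
        keep, expired = [], []
        for ts, ev in self.hot:
            (keep if now - ts < self.ttl else expired).append((ts, ev))
        self.hot = keep
        self.cold.extend(expired)   # stand-in for real-time delivery to OSS

log = TieredBehaviorLog(ttl=100)
log.append(0, "click", now=0)
log.append(50, "view", now=50)
log.append(200, "buy", now=200)     # pushes ts=0 and ts=50 to cold
print([e for _, e in log.hot], [e for _, e in log.cold])
# -> ['buy'] ['click', 'view']
```

Keeping only hot data indexed is what lets the online store stay fast at unlimited total scale: the unbounded history lives in the lake, not in the serving path.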

Original link
This article is original content from Alibaba Cloud and may not be reproduced without permission.