Tencent Cloud CKafka and Cloud Functions are launched on DataHub, making data flow easier

Time: 2022-8-4

With the arrival of the big data era, major Internet companies pay unprecedented attention to data, and their businesses depend on it more and more. Big data is often described by its "3V" characteristics: Volume, Velocity, and Variety. In other words, the data is massive, real-time, and diverse. These three characteristics weigh especially heavily on data collection systems: varied data sources, massive data volumes, and efficient real-time collection are the main problems such systems face.

To create value from data, we must first solve the problem of data acquisition. As the Internet has developed, enterprises have built many business systems, both internally and across companies, and the data these systems generate is often mutually incompatible. Integrating it takes a lot of effort.

01. Tencent Cloud CKafka launched on DataHub

Tencent Cloud message queue CKafka officially launched the data center access service module DataHub. DataHub has powerful data access and analysis and processing functions. It can continuously collect, store and process data from data sources such as App, Web, MongoDB, etc., and obtain various real-time data processing results, which can be used for log analysis, Web activity tracking, IoT analytics applications, etc.

Today's data processing systems can be roughly divided into offline processing systems and online processing systems. The DataHub data center access service module launched by CKafka obtains data directly from business data sources, performs some preprocessing, and distributes the data to offline/online processing platforms. It builds a bridge between data sources and data processing systems, and decouples the data processing systems from the data sources on the business side.

02. DataHub product advantages

Based on the data processing capabilities of CKafka, DataHub offers high stability, real-time performance, high scalability, and high security:

  • High stability

Based on the distributed deployment of message queue CKafka, the stability is well guaranteed.

  • Real-time

Data is collected efficiently in real time and can also be processed in real time.

  • High scalability

It supports horizontal cluster expansion and seamless instance upgrades. The underlying system scales elastically and automatically with the business scale, and this is transparent to upper-layer businesses.

  • High security

Networks are isolated between tenants, and instance network access is isolated between accounts by default. CAM authentication for the management flow and SASL permission control for the data flow are supported, so access rights are strictly controlled.

  • Upstream and downstream ecological integration

It integrates with 13+ cloud products, such as EMR, COS, containers, stream computing, cloud functions, and log service, for fast one-click deployment.

  • Unified operation and maintenance monitoring

It provides the complete set of Tencent Cloud platform operation and maintenance services, including tenant isolation, permission control, message accumulation query, consumer details viewing, and other multi-dimensional monitoring and alarm services.
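To make the SASL permission control on the data flow more concrete, here is a minimal sketch of the settings a client might supply when connecting. It only builds a dictionary and does not open a connection. The keyword names follow the open-source kafka-python client; the endpoint, user name, password, and the choice of SASL/PLAIN over a plaintext listener are placeholder assumptions, and the real values should come from the CKafka console.

```python
def build_sasl_client_config(bootstrap_servers, username, password):
    """Return keyword arguments for a Kafka client that authenticates with SASL/PLAIN."""
    if not username or not password:
        raise ValueError("SASL requires both a username and a password")
    return {
        "bootstrap_servers": bootstrap_servers,
        "security_protocol": "SASL_PLAINTEXT",  # assumed listener type
        "sasl_mechanism": "PLAIN",              # assumed mechanism
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }


# Placeholder endpoint and credentials, for illustration only.
config = build_sasl_client_config(
    "ckafka.example.invalid:9092", "demo-user", "demo-password"
)
```

With kafka-python installed, a dictionary like this could be unpacked into a producer or consumer constructor, for example `KafkaProducer(**config)`.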

03. DataHub application value

CKafka is a distributed, high-throughput, and highly scalable messaging system. It is based on the publish/subscribe model: through message decoupling, producers and consumers interact asynchronously. Its advantages include data compression and support for both offline and real-time data processing. As a functional module of CKafka, DataHub lets users take CKafka as the entry point, connect to commonly used data sources and sinks through simple interface configuration, and use packaged solutions for various scenarios to build real-time data channels and perform real-time data cleaning and analysis.

In practical applications, DataHub ingests the different types of data generated by various data sources in real time. Users can deliver data from multiple data sources to the same topic for unified management, apply simple data processing, and deliver the result to downstream data processing systems. This forms a clear data flow and better releases the value of the data.

DataHub can decouple the big data system from business systems and, at the same time, decouple the components within the big data system from each other.
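The decoupling described above can be illustrated with a small in-memory sketch. This is not CKafka or DataHub itself: `TopicBroker`, the topic name, and the three sources are hypothetical stand-ins that only show several producers writing to one topic without waiting for a consumer, which reads later at its own pace.

```python
import queue
import threading


class TopicBroker:
    """Tiny in-memory stand-in for a message broker (not CKafka itself)."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def _queue_for(self, topic):
        # Create the topic queue on first use; the lock avoids a creation race.
        with self._lock:
            return self._topics.setdefault(topic, queue.Queue())

    def publish(self, topic, message):
        # Producers return immediately; they never wait for a consumer.
        self._queue_for(topic).put(message)

    def consume(self, topic, timeout=1.0):
        # Consumers pull at their own pace.
        return self._queue_for(topic).get(timeout=timeout)


broker = TopicBroker()
sources = {"app": 2, "web": 2, "mysql": 2}  # hypothetical source -> event count


def produce(source, count):
    for seq in range(count):
        broker.publish("unified-topic", {"source": source, "seq": seq})


threads = [threading.Thread(target=produce, args=item) for item in sources.items()]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# The consumer starts only after all producers have finished.
received = [broker.consume("unified-topic") for _ in range(sum(sources.values()))]
seen_sources = sorted({message["source"] for message in received})
```

Because producers and the consumer share only the topic, either side can be replaced or scaled without changing the other.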

1. Real-time data channel

As we all know, data held by different enterprises and different businesses is not interconnected, and data that has not been integrated runs into many problems during acquisition and transmission, such as poor availability and transmission delay. At the business level, problems also arise when old business data systems are migrated to new systems, or when data becomes unavailable while different systems are being integrated, and these affect follow-up business work.

To integrate data more efficiently in real time, DataHub uses its data access capability to bring business data into the big data system in real time and shorten the data analysis cycle. For customers, it is a real-time data channel. So how does DataHub implement data access?

As can be seen from the above figure, the data sources of DataHub can be divided into three classes: active reporting, service, and log.

  • Active reporting class: App, Web, games, etc.;
  • Service class: MongoDB, COS, MySQL, etc.;
  • Log class: containers, network flow logs, CVM, etc.

The console interface of data access is shown in the figure, which displays the data access task list created by the user.

Click the list item to view the details of each data access task and view the monitoring.

The specific operation of data access is mainly divided into the following two parts:

1. Active reporting: an SDK is provided, and the usage process is as follows:

  • Take HTTP reporting as an example:

After the task is successfully created, an access point will be generated, and you can view and copy the access point in the task details later.
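As a rough illustration of HTTP reporting, the sketch below builds a POST request against an access point without sending it. The access point URL, the JSON body shape, and the header are all assumptions made for the example; the real request format and any authentication are defined by the SDK and the task details in the console.

```python
import json
import urllib.request

# Placeholder: copy the real access point from the task details in the console.
ACCESS_POINT = "https://datahub.example.invalid/report"


def build_report_request(access_point, events):
    """Build (but do not send) an HTTP POST carrying a batch of events as JSON."""
    body = json.dumps({"events": events}).encode("utf-8")  # body shape is assumed
    return urllib.request.Request(
        access_point,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_report_request(
    ACCESS_POINT, [{"action": "page_view", "page": "/home"}]
)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```

Keeping request construction separate from sending makes the payload easy to inspect before any data leaves the application.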

2. Asynchronous pull

For the service, log, and interface classes, a complete productized configuration interface is provided, so users do not need to care about the underlying implementation.

  • Take MongoDB as an example:


2. Real-time data cleaning and analysis

After DataHub ingests the various types of data, it uses data outflow and data processing to clean, filter, associate, and transform data from the various data sources in real time. The result is unified, structured data in which the different types of data from each source are fused together.

So how does DataHub perform data cleaning and analysis?
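The four steps named here (clean, filter, associate, transform) can be sketched as a small pipeline. This is only an illustration of the idea, not DataHub's implementation: the field names, the lookup table, and the rules themselves are hypothetical.

```python
def clean(record):
    # Cleaning: trim string values and drop empty fields.
    trimmed = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    return {k: v for k, v in trimmed.items() if v not in ("", None)}


def keep(record):
    # Filtering: discard records that lack a user id.
    return "user_id" in record


def associate(record, users):
    # Association: enrich the record with an attribute from a lookup table.
    return {**record, "region": users.get(record["user_id"], "unknown")}


def transform(record):
    # Transformation: normalise every record into one structured shape.
    return {
        "user_id": record["user_id"],
        "region": record["region"],
        "event": record.get("event", "unknown").lower(),
    }


def process(records, users):
    cleaned = (clean(r) for r in records)
    return [transform(associate(r, users)) for r in cleaned if keep(r)]


users = {"u1": "guangzhou"}  # hypothetical lookup table
raw = [
    {"user_id": " u1 ", "event": "LOGIN "},
    {"user_id": "", "event": "click"},
    {"user_id": "u2", "event": None},
]
unified = process(raw, users)
```

Here the second record is filtered out because its user id is empty, and the remaining two end up with the same set of fields.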

1. Data outflow

Use cloud function SCF or sink connector to distribute data to various downstream cloud products.

  • Create a new data outflow task

After you click Submit, a record is added to the data outflow task list, where you can view the task details and monitoring.
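For the cloud function route, a function triggered by CKafka could look roughly like the sketch below. The event shape (`Records` entries with a `Ckafka` object holding `msgBody`) and the `main_handler(event, context)` entry point are stated here as assumptions about the SCF CKafka trigger and should be checked against the SCF documentation. `deliver` is a hypothetical stub standing in for a write to the downstream product.

```python
def deliver(messages):
    """Hypothetical sink: replace with a write to the downstream product."""
    return len(messages)


def main_handler(event, context):
    # Assumed event shape:
    # {"Records": [{"Ckafka": {"topic": ..., "msgBody": ...}}, ...]}
    bodies = [
        record["Ckafka"]["msgBody"]
        for record in event.get("Records", [])
        if "Ckafka" in record
    ]
    delivered = deliver(bodies)
    return {"received": len(bodies), "delivered": delivered}


# Local dry run with a hand-written sample event.
sample_event = {
    "Records": [
        {"Ckafka": {"topic": "demo", "partition": 0, "offset": 1, "msgBody": "a"}},
        {"Ckafka": {"topic": "demo", "partition": 0, "offset": 2, "msgBody": "b"}},
    ]
}
result = main_handler(sample_event, None)
```

Running the handler locally against a sample event like this is a cheap way to check the extraction logic before deploying the function.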

2. Data processing

DataHub carries forward the Kafka-to-Kafka data processing capability.

Click "New Task", a pop-up window will appear:

The above shows the interface for some simple cleaning rules. Later, more advanced cleaning rules will be configurable by writing functions. The parsing mode supports JSON, delimiters, and regular expressions. Click Test to verify the data processing rules set above.

As shown in the figure below, the sidebar of the message queue CKafka console is divided into two modules, Messaging Platform and DataHub, which makes them easier to find and use. DataHub is now available, and users who need data access, data processing, and analysis functions are welcome to use it!
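The three parsing modes can be sketched in a few lines. This illustrates the idea only and is not how the console implements it: the function name, its parameters, and the sample messages are hypothetical.

```python
import json
import re


def parse_message(raw, mode, *, delimiter=",", fields=(), pattern=None):
    """Parse one raw message into a dict using JSON, delimiter, or regex mode."""
    if mode == "json":
        return json.loads(raw)
    if mode == "delimiter":
        # Pair each split value with the field name at the same position.
        return dict(zip(fields, raw.split(delimiter)))
    if mode == "regex":
        # Named groups in the pattern become the output keys.
        match = re.search(pattern, raw)
        return match.groupdict() if match else {}
    raise ValueError(f"unsupported parsing mode: {mode}")


from_json = parse_message('{"level": "info", "code": 200}', "json")
from_delimiter = parse_message(
    "info|200", "delimiter", delimiter="|", fields=("level", "code")
)
from_regex = parse_message(
    "level=info code=200",
    "regex",
    pattern=r"level=(?P<level>\w+) code=(?P<code>\d+)",
)
```

Note that only the JSON mode preserves value types; the delimiter and regex modes yield strings, so a later transformation step would need to convert fields such as `code` to numbers.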

In the future, the development of Tencent Cloud message queue products will make further explorations in the direction of data acquisition and data processing, and will also combine upstream and downstream products to provide users with more solutions that fit the scenario. DataHub can be developed into a unified data interface on the cloud in the future. It provides a more stable platform for the access, analysis and processing of data from various data sources.

04. DataHub usage consultation

DataHub is now fully released and online. Go to the Tencent Cloud Message Queue CKafka console to try it. To help us provide better products and services, click here to fill out the form, and we will contact you within 1-3 business days to discuss your specific business needs.

One More Thing

Try the Tencent Cloud Serverless Demo now and receive a serverless package for new users: Tencent Cloud Serverless Novice Experience