Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes – cloudcanal actual battle

Time:2022-5-13

sketch

CloudCanal 2.1.0. Version x supports starrocks as a peer-to-peer data migration synchronization capability

This article briefly introduces the source side capabilities through the data migration synchronization case of MySQL – > starrocks. Link characteristics:

  • Structure migration, full migration, incremental synchronization (data) and data verification are complete
  • Full automation of process

Instructions for use

  • Starrocks version supported: 1.18 x、1.19. x、2.0. x
  • The supported source data source types are:Oracle/PostgreSQL/Greenplum/MySQL, this paper mainly takes MySQL source side as an example to illustrate the use method.
  • The verification task only supports the starrocks table of the primary key model. The source side table needs to have a primary key

Technical point

Import method based on streamload

Starrocks provides a variety of import methods. Cloudcanal adopts the method of streamload to import, and the message at the source will be converted into a byte stream and sent to starrocks through HTTP protocol.

Compared with writing directly through SQL, streamload has better performance. The written data is directly forwarded to be through Fe for processing. If SQL writing is directly used, there will be additional SQL parsing overhead on the Fe side.

Tips: cloudcanal uses \ u0001 and \ u0003 as the column separator and row separator imported by streamload by default.

Rich configuration

Cloudcanal provides a wealth of configurable parameters for streamload, including:

  • connectionTimeoutSecparameter

    • Indicates the timeout of the import request in seconds
  • loadExecMemLimitMbparameter

    • Indicates the import memory limit
  • timezoneparameter

    • The default is the East eighth zone, which specifies the time zone used for import
    • This parameter will affect the results of all functions related to time zone involved in import.

Configurable write pause control

Writing too fast will cause starrocks to have no time to make a compaction, resulting in an exception. Cloudcanal provides two task parameters to support automatic pause for a period of time after a batch of data is written to avoid this problem. Parameters are:

  • fullBatchWaitTimeMsparameter

    • Write pause interval between full batches, unit: ms, default: 10ms
  • increBatchWaitTimeMsparameter

    • Write pause interval between incremental batches, unit: ms, default: 100ms

Cloudcanal task details page, clickParameter modification, you can adjust

Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle

Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle

Primary key model

Starrocks, as a real-time data warehouse, adoptsPrimary key modelperhapsAggregation modelMore. Cloudcanal is adopted by defaultPrimary key model, which can synchronize the data at the source in real timeINSERT/UPDATE/DELETE

Tips: the table structure has a great impact on the actual reading and writing performance of the real-time data warehouse. The structure migration capability provided by cloudcanal by default does not provide such aspartitionBarrel DivisionAnd other settings. If users need to use it, they can create a table structure in advance according to their needs, and then conduct data migration and synchronization through cloudcanal.

Processing of update / delete operations

Based on the writing method of streamload, the actual operation of writing to the opposite end is insert. During cloudcanal synchronization, update / delete will be automatically converted into insert statement and modified__opsValue, starrocks will automatically perform compaction.

Escape support

Starrocks does not support\nAnd other special symbols are written, and the cloudcanal task is set through parameters(enableEscape(2) automatic escape of parameters.

Operation example

add data source

  • Log in to cloudcanal platform
  • Data source management – > add data source
  • Select starrocks in self built database
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • Client address: the service port provided by starrocks to MySQL client. Cloudcanal mainly uses it to query the metadata information of the database table
  • HTTP address: http address is mainly used to receive HTTP requests of stream load

Task creation

  • task management->Task creation
  • choicesourceandtargetdatabase
  • Click next
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • choiceIncremental synchronization, and enabledFull data initialization
  • Do not check DDL synchronization (not supported temporarily)
  • Click next
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • Select a table to subscribe to,The table automatically created by structure migration is the table of the primary key model, so the table without primary key is not supported temporarily
  • Click next
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • Configure column mapping
  • Click next
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • Create task
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • View task status. After the task is created, the structure migration, full volume and incremental phases will be completed automatically.
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle

Task performance tuning (option)

For high traffic scenarios, it is recommended to use4GAnd the above task specifications, and the relevant parameters are optimized. The optimization is based on the condition that the task has no GC problem and the peer starrocks has no competition bottleneck.

  • Task details->Function list->Parameter setting
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • Full migration parameter adjustment
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle
  • Incremental synchronization parameter adjustment
    Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle

summary

This article briefly introduces how to use cloudcanal for data migration and synchronization from Mysql to starrocks. Dear readers, if you think it’s good, please praise, comment and forward it.

Common FAQs

Under the default task parameter configuration, if you import data too frequently, the task may be abnormal. At this time, you can adjust the parameters of fullbatchwaittimes and increbatchwaittimes mentioned above, or adjust the consolidation strategy on the server side of starrocks. The following figure shows the FAQ provided by starrocks

Complete the data migration and synchronization from MySQL / PostgreSQL / Oracle to starrocks in 5 minutes - cloudcanal actual battle

More wonderful

Join the cloudcanal fan group to master first-hand news and get more benefits. Please add our little assistant wechat: suhuayue001
Cloudcanal – a free and easy-to-use enterprise data synchronization tool. Welcome to taste it.
To learn more about products, you can viewOfficial websitehttp://www.clougence.com
Cloudcanal communityhttps://www.askcug.com/

Recommended Today

Promoting the construction of computing network makes China have more power in the face of the soaring data increment

Computing network is an original technical concept first put forward in China. It refers to a new information infrastructure that relies on high-speed, mobile, secure and ubiquitous network connection, integrates multi-level computing resources such as network, cloud, digital, intelligence, edge, end and chain, and combines various emerging digital technologies such as AI, blockchain, cloud, big […]