sketch
CloudCanal 2.1.0. Version x supports starrocks as a peer-to-peer data migration synchronization capability
This article briefly introduces the source side capabilities through the data migration synchronization case of MySQL – > starrocks. Link characteristics:
- Structure migration, full migration, incremental synchronization (data) and data verification are complete
- Full automation of process
Instructions for use
- Starrocks version supported: 1.18 x、1.19. x、2.0. x
- The supported source data source types are:Oracle/PostgreSQL/Greenplum/MySQL, this paper mainly takes MySQL source side as an example to illustrate the use method.
- The verification task only supports the starrocks table of the primary key model. The source side table needs to have a primary key
Technical point
Import method based on streamload
Starrocks provides a variety of import methods. Cloudcanal adopts the method of streamload to import, and the message at the source will be converted into a byte stream and sent to starrocks through HTTP protocol.
Compared with writing directly through SQL, streamload has better performance. The written data is directly forwarded to be through Fe for processing. If SQL writing is directly used, there will be additional SQL parsing overhead on the Fe side.
Tips: cloudcanal uses \ u0001 and \ u0003 as the column separator and row separator imported by streamload by default.
Rich configuration
Cloudcanal provides a wealth of configurable parameters for streamload, including:
-
connectionTimeoutSecparameter
- Indicates the timeout of the import request in seconds
-
loadExecMemLimitMbparameter
- Indicates the import memory limit
-
timezoneparameter
- The default is the East eighth zone, which specifies the time zone used for import
- This parameter will affect the results of all functions related to time zone involved in import.
Configurable write pause control
Writing too fast will cause starrocks to have no time to make a compaction, resulting in an exception. Cloudcanal provides two task parameters to support automatic pause for a period of time after a batch of data is written to avoid this problem. Parameters are:
-
fullBatchWaitTimeMsparameter
- Write pause interval between full batches, unit: ms, default: 10ms
-
increBatchWaitTimeMsparameter
- Write pause interval between incremental batches, unit: ms, default: 100ms
Cloudcanal task details page, clickParameter modification, you can adjust
Primary key model
Starrocks, as a real-time data warehouse, adoptsPrimary key modelperhapsAggregation modelMore. Cloudcanal is adopted by defaultPrimary key model, which can synchronize the data at the source in real timeINSERT/UPDATE/DELETE。
Tips: the table structure has a great impact on the actual reading and writing performance of the real-time data warehouse. The structure migration capability provided by cloudcanal by default does not provide such aspartition、Barrel DivisionAnd other settings. If users need to use it, they can create a table structure in advance according to their needs, and then conduct data migration and synchronization through cloudcanal.
Processing of update / delete operations
Based on the writing method of streamload, the actual operation of writing to the opposite end is insert. During cloudcanal synchronization, update / delete will be automatically converted into insert statement and modified__opsValue, starrocks will automatically perform compaction.
Escape support
Starrocks does not support\nAnd other special symbols are written, and the cloudcanal task is set through parameters(enableEscape(2) automatic escape of parameters.
Operation example
- Cloudcanal Community Edition deployment, seeCommunity Edition installation documentation
- Preparation of data source and target
add data source
- Log in to cloudcanal platform
- Data source management – > add data source
- Select starrocks in self built database
- Client address: the service port provided by starrocks to MySQL client. Cloudcanal mainly uses it to query the metadata information of the database table
- HTTP address: http address is mainly used to receive HTTP requests of stream load
Task creation
- task management->Task creation
- choicesourceandtargetdatabase
- Click next
- choiceIncremental synchronization, and enabledFull data initialization
- Do not check DDL synchronization (not supported temporarily)
- Click next
- Select a table to subscribe to,The table automatically created by structure migration is the table of the primary key model, so the table without primary key is not supported temporarily
- Click next
- Configure column mapping
- Click next
- Create task
- View task status. After the task is created, the structure migration, full volume and incremental phases will be completed automatically.
Task performance tuning (option)
For high traffic scenarios, it is recommended to use4GAnd the above task specifications, and the relevant parameters are optimized. The optimization is based on the condition that the task has no GC problem and the peer starrocks has no competition bottleneck.
- Task details->Function list->Parameter setting
- Full migration parameter adjustment
- Incremental synchronization parameter adjustment
summary
This article briefly introduces how to use cloudcanal for data migration and synchronization from Mysql to starrocks. Dear readers, if you think it’s good, please praise, comment and forward it.
Common FAQs
Under the default task parameter configuration, if you import data too frequently, the task may be abnormal. At this time, you can adjust the parameters of fullbatchwaittimes and increbatchwaittimes mentioned above, or adjust the consolidation strategy on the server side of starrocks. The following figure shows the FAQ provided by starrocks
More wonderful
- 5 minutes to complete Mysql to Clickhouse wide table construction and synchronization – cloudcanal actual combat
- 5 minutes to complete Mysql to elasticsearch wide table construction and synchronization – cloudcanal actual combat
- Complete real-time data synchronization from Mysql to Clickhouse in 5 minutes – cloudcanal actual combat
- Complete the migration and synchronization from Mysql to elasticsearch in 5 minutes – cloudcanal actual combat
- 5 minutes to complete Mysql to MySQL heterogeneous online data migration synchronization – cloudcanal actual combat
- Data synchronization from Mysql to tidb in 5 minutes – cloudcanal actual combat
- Complete the data migration and synchronization from SQL server to MySQL in 5 minutes – cloudcanal actual combat – cloudcanal actual combat
Join the cloudcanal fan group to master first-hand news and get more benefits. Please add our little assistant wechat: suhuayue001
Cloudcanal – a free and easy-to-use enterprise data synchronization tool. Welcome to taste it.
To learn more about products, you can viewOfficial website: http://www.clougence.com
Cloudcanal community:https://www.askcug.com/