ClickHouse high availability cluster solution

Time: 2021-12-7

Summary

Create local tables in a ClickHouse cluster with the replicated table engine ReplicatedMergeTree; the inserted data is then automatically replicated between the ClickHouse replicas, providing high availability of the data

Installation configuration

RPM installation

Download the installation packages directly from the official repository https://repo.clickhouse.tech/deb/stable/main/
Install them directly with the rpm -ivh command
It is recommended to create a dedicated clickhouse user and add it to sudoers
Note that all installation packages must be the same version

clickhouse-common-static – the compiled ClickHouse binaries
clickhouse-server – creates symlinks for the ClickHouse server and installs the default server configuration
clickhouse-client – creates symlinks for the clickhouse-client tool and installs the client configuration file
clickhouse-common-static-dbg – ClickHouse binaries with debug information (optional)

Basic configuration

/etc/clickhouse-server is the main configuration directory of ClickHouse

config.xml and users.xml are loaded by default at startup
Any additional XML files in the config.d and users.d directories are also loaded (they must follow the ClickHouse configuration file syntax)

Common settings in config.xml are as follows:

<!-- Log -->
<logger>
  <level>trace</level>
  <log>/var/log/clickhouse-server/clickhouse-server.log</log>
  <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
  <size>1000M</size>
  <count>10</count>
</logger>

<!-- JDBC connection port -->
<http_port>8123</http_port>
<!-- Client connection port -->
<tcp_port>9000</tcp_port>
<!-- Port for data exchange between servers -->
<interserver_http_port>9009</interserver_http_port>
<!-- Domain name of this host -->
<interserver_http_host>a domain name needs to be used here if replication is used later</interserver_http_host>

<!-- Listen address. The default is the IPv6 format; if IPv6 is not enabled, this needs to be changed -->
<listen_host>0.0.0.0</listen_host>

<!-- Maximum number of connections, default 4096 -->
<max_connections>64</max_connections>
<!-- Maximum number of concurrent queries, default 100 -->
<max_concurrent_queries>16</max_concurrent_queries>

<!-- Storage paths -->
<path>/var/lib/clickhouse/</path>
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

<!-- Nodes included from metrika.xml replace the corresponding node configuration in config.xml -->
<remote_servers incl="clickhouse_remote_servers" />
<zookeeper incl="zookeeper_servers" optional="true" />
<macros incl="macros" optional="true" />

<!-- Include an external configuration file; the cluster configuration is referenced here. The default path of metrika.xml is /etc/metrika.xml -->
<include_from>/etc/clickhouse-server/metrika.xml</include_from>

Start & connect

Start command: sudo systemctl start clickhouse-server
Command-line connection: clickhouse-client -u root --password --port 9000 (the --port parameter can be omitted for the default port 9000)
JDBC connection: jdbc:clickhouse://10.10.10.10:8123
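
Once connected, a quick sanity check (for illustration) confirms the server responds:

-- check the server version and uptime in seconds
SELECT version();
SELECT uptime();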

Cluster configuration

ClickHouse cluster information comes from a manually written configuration file, metrika.xml. /etc/metrika.xml is loaded by default; for easier management, we reference /etc/clickhouse-server/metrika.xml from the main configuration file. After the cluster is built, the system table system.clusters can be queried to view the cluster configuration information

The cluster hierarchy of ClickHouse corresponds to the macros node in the metrika.xml configuration:

  • cluster (layer) => shard => replica (each ClickHouse instance can be regarded as a replica)

The specific cluster deployment scheme will be described in detail later

The metrika.xml configuration is as follows:

<yandex>
  <clickhouse_remote_servers>
    <!-- Custom cluster name -->
    <ck_cluster>
      <!-- Shard information -->
      <shard>
        <!-- Whether the distributed table writes data to only one replica; used together with the replicated table engine. Default is false -->
        <internal_replication>true</internal_replication>
        <!-- The user name and password specified here can only be plaintext; if an encrypted password is required, point the configuration to a profile in users.xml -->
        <replica>
          <host>VM_102_21_centos</host>
          <port>9000</port>
          <user>xxx</user>
          <password>xxx</password>
        </replica>
        <replica>
          <host>VM_102_22_centos</host>
          <port>9001</port>
          <user>xxx</user>
          <password>xxx</password>
        </replica>
      </shard>
    </ck_cluster>
  </clickhouse_remote_servers>

  <!-- ZooKeeper used by Replicated*MergeTree -->
  <zookeeper_servers>
    <node index="1">
      <host>vm162centos31</host>
      <port>2181</port>
    </node>
    <node index="2">
      <host>vm162centos32</host>
      <port>2181</port>
    </node>
    <node index="3">
      <host>vm162centos33</host>
      <port>2181</port>
    </node>
  </zookeeper_servers>

  <!-- Macros substituted into Replicated*MergeTree CREATE TABLE statements to specify the storage path in ZooKeeper -->
  <macros>
    <layer>ck_cluster</layer>
    <shard>shard01</shard>
    <replica>replica01</replica>
  </macros>

  ……

</yandex>

Common table engines

Distributed table (Distributed)

The distributed engine itself does not store data, but can perform distributed queries on multiple servers.
Reads are automatically parallel. When reading, the index (if any) of the remote server table is used.
It can be thought of as roughly equivalent to a view in a relational database.

Example: ENGINE = Distributed(<cluster name>, <database name>, <table name> [, sharding_key])

The distributed table maps onto the local tables given by the <table name> parameter above. When a distributed table is queried, ClickHouse automatically queries all shards, then merges the results and returns them
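
As a sketch, assuming an illustrative database default, a table events_local, and example columns, the local and distributed tables for the ck_cluster configuration above might look like this:

-- local table, created with the same statement on every node (names are illustrative)
CREATE TABLE default.events_local
(
    event_date Date,
    user_id    UInt64,
    value      Float64
)
ENGINE = MergeTree()
ORDER BY (event_date, user_id);

-- distributed table over the local tables, using a random sharding key
CREATE TABLE default.events_all AS default.events_local
ENGINE = Distributed(ck_cluster, default, events_local, rand());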

Insert data into distributed tables

ClickHouse inserts the data into each shard according to the shard weight
By default, data is written to all replicas in each shard
Alternatively, the internal_replication parameter can configure each shard to write to only one replica, letting the Replicated*MergeTree engine manage the replication of the data
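
For illustration, using the sketch tables above, an insert can target either the distributed table (routed to a shard) or a local table (written to that replica only):

-- writes through the distributed table are routed to a shard;
-- with internal_replication = true, only one replica per shard is written
INSERT INTO default.events_all VALUES ('2021-12-07', 1001, 3.14);

-- writing the local table targets this replica directly
INSERT INTO default.events_local VALUES ('2021-12-07', 1002, 2.71);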

Replicated*MergeTree

ReplicatedMergeTree
ReplicatedSummingMergeTree
ReplicatedReplacingMergeTree
ReplicatedAggregatingMergeTree
ReplicatedCollapsingMergeTree
ReplicatedVersionedCollapsingMergeTree
ReplicatedGraphiteMergeTree

  • Only the MergeTree family of engines supports the Replicated prefix
  • Replication is per table, not per server; a server can therefore contain both replicated and non-replicated tables
  • Replication does not depend on sharding; each shard has its own independent replicas
  • Replication requires ZooKeeper; the ZooKeeper cluster information must be configured in metrika.xml
  • ZooKeeper is not required for SELECT queries; replication does not affect SELECT performance, and querying replicated tables is as fast as querying non-replicated tables
  • By default, an INSERT returns as soon as the data is written to one replica. If the data was written to only one replica and the server holding that replica is lost, the data is lost. To make an INSERT wait until the data has been written to multiple replicas, use the insert_quorum option
  • Data blocks are deduplicated: if the same block (same size, with the same rows in the same order) is written multiple times, it is only written once

Example: ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica}')

The parameters in braces come from the macros section of metrika.xml; each node substitutes its own values, so the same CREATE TABLE statement can be used on every node
The first parameter is the path in ZooKeeper, organized here by layer and shard name
The second parameter is the replica name, which identifies the different replicas of the same shard; replica names within the same shard must be unique
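
Putting this together, a replicated version of the earlier local-table sketch (table and column names are still illustrative) reads its ZooKeeper path and replica name from the macros:

-- each node expands {layer}, {shard} and {replica} from its own macros section
CREATE TABLE default.events_local
(
    event_date Date,
    user_id    UInt64,
    value      Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/events_local', '{replica}')
ORDER BY (event_date, user_id);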

Distributed cluster scheme

Scheme 1.0: MergeTree + Distributed

Each shard has only one replica. Data is stored in local tables (MergeTree) and queried through the distributed table; the engine automatically queries all shards, combines the results, and returns them

[Figure: Scheme 1.0]
Advantages

The architecture is simple and works for both stand-alone and distributed deployments

Disadvantages

Single point of failure; high risk of data loss

Scheme 2.0: MergeTree + Distributed + multiple replicas

Adds a replica for each node on top of Scheme 1.0

[Figure: Scheme 2.0]
Advantages

Building on 1.0, data safety is improved: if any instance or server goes down, the cluster query service is not affected

Disadvantages

If a node goes down, the incremental data missed during the outage can be backfilled after recovery, but if the disk is completely destroyed the existing data cannot be recovered. In addition, the two nodes cannot act as primary and standby for each other in this scheme, which can lead to inconsistent data

Scheme 3.0: ReplicatedMergeTree + Distributed + multiple replicas

Replace the table engine of Scheme 2.0 with ReplicatedMergeTree and configure distributed writes to go to only one replica per shard: set internal_replication to true
Within a shard, data written to one node is automatically synchronized to the other replicas
The figure below shows a deployment where a single node runs multiple ClickHouse instances

[Figure: Scheme 3.0]
Advantages

The ReplicatedMergeTree table engine manages the data replicas (relying on ZooKeeper), so there is no need to worry about data synchronization or data loss when a node goes down

Disadvantages

The cluster configuration is more complex; the macros for shards and replicas must be configured carefully

metrika.xml configuration

[Figure: metrika.xml configuration for 2 shards with 2 replicas]
Node expansion
[Figure: Scheme 3.0 node expansion]

Single-node multi-instance deployment

Multiple sets of configuration files

Copy config.xml, users.xml, and metrika.xml from the /etc/clickhouse-server/ directory to /etc/clickhouse-server/replica02/
Then modify the directories and ports configured in config.xml as follows:

<!-- Log directory -->
<logger>
    <log>/var/log/clickhouse-server/replica02/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/replica02/clickhouse-server.err.log</errorlog>
</logger>

<!-- Ports -->
<http_port>8124</http_port>
<tcp_port>9001</tcp_port>
<mysql_port>9005</mysql_port>
<interserver_http_port>9010</interserver_http_port>

<!-- Data directories -->
<path>/var/lib/clickhouse/replica02/</path>
<tmp_path>/var/lib/clickhouse/replica02/tmp/</tmp_path>
<user_files_path>/var/lib/clickhouse/replica02/user_files/</user_files_path>

<!-- User configuration -->
<user_directories>
    <local_directory>
        <!-- Path to folder where users created by SQL commands are stored. -->
        <path>/var/lib/clickhouse/replica02/access/</path>
    </local_directory>
</user_directories>

<include_from>/etc/clickhouse-server/replica02/metrika.xml</include_from>

<format_schema_path>/var/lib/clickhouse/replica02/format_schemas/</format_schema_path>

Multiple sets of service startup files

Copy /etc/systemd/system/clickhouse-server.service and rename the copy to clickhouse-server-replica02.service
Modify the configuration loaded at startup to point to the new file. The PID file name must also match the service name, otherwise the service will not start

ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/replica02/config.xml --pid-file=/run/clickhouse-server/clickhouse-server-replica02.pid

After modification, reload systemd: sudo systemctl daemon-reload
Then start the second instance with: sudo systemctl start clickhouse-server-replica02

Cluster validation

Log in to any node through the client to query the cluster configuration information

select * from system.clusters; 
[Figure: cluster information from system.clusters]
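
To check replication health as well (a sketch using standard columns of the system.replicas table), run the following on any node:

-- replication status of every replicated table on this node
SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas;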

Cluster data write

Write to distributed tables

A distributed table distributes inserted data across servers, but this is just simple request forwarding. Writing to multiple replicas at the same time cannot guarantee replica consistency, and the replicas may diverge for a long time
Therefore, writing directly to distributed tables is not recommended

Write to local tables using the ReplicatedMergeTree engine

The synchronization of data replicas is managed by the ClickHouse replicated table engine
You decide which data is written to which servers and write directly on each shard, so any sharding scheme can be used; this can be important for complex business requirements
This is the officially recommended scheme
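
As a final sketch, reusing the illustrative tables from above: write to the replicated local table on the target shard, and read through the distributed table:

-- write directly to the replicated local table on the chosen shard;
-- ReplicatedMergeTree propagates the rows to the other replicas of that shard
INSERT INTO default.events_local VALUES ('2021-12-07', 1003, 1.41);

-- read through the distributed table, which fans out to all shards
SELECT count() FROM default.events_all;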