Getting started with ClickHouse

Date: 2021-09-16

ClickHouse introduction
ClickHouse (abbreviated CK below) is a column-oriented database designed for OLAP workloads. OLAP workloads have the following characteristics:

  • Data is written infrequently, and when it is written it arrives in large batches; unlike OLTP, rows are not written one at a time
  • Most requests are reads
  • Query concurrency is low, so CK is not suitable for high-concurrency serving scenarios; ClickHouse itself recommends at most about 100 concurrent queries per second
  • Transactions are not required

Advantages of ClickHouse

To improve the compression ratio, CK stores each column with fixed-length values, so per-value length information does not have to be stored.

CK uses a vectorized execution engine. What is a vectorized engine? See:
https://www.infoq.cn/article/…

Disadvantages of ClickHouse

  • Transactions are not fully supported
  • Data cannot be modified or deleted with high throughput
  • Because the index is sparse, CK is not well suited to point lookups of single rows by key

Performance optimization

To improve insertion performance, insert in batches, with at least 1,000 rows per batch; concurrent insertion can also significantly improve insertion speed.
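For example, rather than issuing one INSERT per row, group many rows into a single statement. The table and columns below are hypothetical; in practice each batch should hold at least about 1,000 rows:

-- hypothetical table; send many rows in one INSERT instead of one row at a time
INSERT INTO events (event_time, user_id, value) VALUES
    ('2021-09-16 10:00:00', 1, 10),
    ('2021-09-16 10:00:01', 2, 7),
    ('2021-09-16 10:00:02', 3, 42);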

Access interface

Like Elasticsearch, CK exposes two ports, one TCP and one HTTP. The default TCP port is 9000 and the default HTTP port is 8123. Usually we do not talk to these ports directly but go through a client, which can be:

  • The command-line client, which connects to CK over the TCP port and can perform basic CRUD operations and import data
  • The HTTP interface, which, like Elasticsearch, accepts CRUD requests over REST using CK's own SQL syntax
  • The JDBC driver
  • The ODBC driver

Input / output format

CK can read multiple formats on input (i.e. INSERT) and emit a specified format on output (i.e. SELECT).

For example, when inserting data, specify JSONEachRow as the format of the source data:

INSERT INTO UserActivity FORMAT JSONEachRow {"PageViews":5, "UserID":"4324182021466249494", "Duration":146,"Sign":-1} {"UserID":"4324182021466249494","PageViews":6,"Duration":185,"Sign":1}

When reading data, specify JSONEachRow as the output format:

SELECT * FROM UserActivity FORMAT JSONEachRow

Note that these formats are only what CK parses on input or generates on output, not its on-disk storage format; CK still stores data in its own columnar format. CK supports many formats; see the documentation for details:
https://clickhouse.yandex/doc…

Database engine

CK supports creating a database inside CK whose actual storage is MySQL. You can then CRUD the data of the tables in that database through CK, somewhat like external tables in Hive, except that here the whole database is pluggable.

Suppose MySQL has the following data

mysql> USE test;
Database changed

mysql> CREATE TABLE `mysql_table` (
    ->   `int_id` INT NOT NULL AUTO_INCREMENT,
    ->   `float` FLOAT NOT NULL,
    ->   PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)

mysql> insert into mysql_table (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)

mysql> select * from mysql_table;
+--------+-------+
| int_id | value |
+--------+-------+
|      1 |     2 |
+--------+-------+
1 row in set (0,00 sec)

Create a database in CK that links to the MySQL database above:

CREATE DATABASE mysql_db ENGINE = MySQL('localhost:3306', 'test', 'my_user', 'user_password')

Then you can operate on the MySQL database from within CK, for example:
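These statements run in CK but read and write the underlying MySQL table:

SELECT * FROM mysql_db.mysql_table;

INSERT INTO mysql_db.mysql_table VALUES (3, 4);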

Table engine – the MergeTree family

The table engine, chosen when a table is created, determines how the table is stored. The table engine controls:

  • How data is read and written, and where it is stored
  • Which query capabilities are supported
  • How concurrent data access is handled
  • Data replication characteristics

MergeTree engine

When creating a table, specify the table engine configuration:

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

  • The engine stores data in partitions
  • During insertion, data for different partitions is split into separate data parts. These parts are merged in the background, and parts belonging to different partitions are never merged together
  • A data part is composed of many indivisible minimum units called granules

An example engine configuration:

ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192
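Putting the DDL skeleton and the engine clause above together, a complete table definition might look like the following sketch (the table and columns are illustrative):

-- illustrative MergeTree table combining partitioning, sort key, sampling and granularity
CREATE TABLE hits
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;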

Granule

[Figure: sparse primary index example from the ClickHouse documentation – (CounterID, Date) values grouped into granules, with their marks and mark numbers]
A granule is a set of rows that are adjacent to each other in primary-key order. The primary key of the first row of each granule is used as the mark of that granule. In the figure, the primary key is (CounterID, Date), and the mark of the first granule in sorted order is (a, 1). A granule can contain multiple rows, and their primary keys may even be equal.

At the same time, to make indexing easier, CK assigns each granule a mark number, which is more convenient to use in practice than the actual primary-key value.

This index structure is much like a skip list. It is also called a sparse index, because it does not index every row but instead indexes ranges of the sorted data.

For example, to answer the query CounterID IN ('a', 'h'), the CK server uses the structure above and only reads the data in mark ranges [0, 3) and [6, 8).

When creating a table, you can use index_granularity to specify how many rows are stored between two marks, i.e. the size of a granule (there is one granule between two adjacent marks).

TTL

You can set an expiration time (TTL) on whole tables and on individual columns.
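A minimal sketch with a hypothetical table: the note column expires after a week, and whole rows expire after a month:

-- column-level TTL on "note", table-level TTL on whole rows
CREATE TABLE ttl_demo
(
    created DateTime,
    note String TTL created + INTERVAL 7 DAY
)
ENGINE = MergeTree()
ORDER BY created
TTL created + INTERVAL 1 MONTH;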

MergeTree summary

MergeTree is effectively the superclass of the MergeTree family of table engines. It defines how data files are stored across the whole family, namely:

  • Data parts are merged in the background
  • A sparse index, with a skip-list-like structure, is used to index the sorted data
  • Data can be partitioned

On top of this, several specialized MergeTree variants are derived for different application scenarios (see the SummingMergeTree sketch after this list). They are:

  • ReplacingMergeTree automatically removes rows that have the same primary key
  • SummingMergeTree sums the numeric fields of rows with the same primary key and keeps them as a single row, a form of pre-aggregation that reduces storage and improves query performance
  • AggregatingMergeTree aggregates rows with the same primary key according to configurable rules, also reducing storage and improving the performance of aggregation queries; it is likewise a form of pre-aggregation
  • CollapsingMergeTree collapses pairs of rows in which most columns are identical but a sign column differs, for example with values 1 and -1
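For instance, a minimal SummingMergeTree sketch with a hypothetical table: rows sharing the same ORDER BY key are summed when background merges run, so queries should still aggregate explicitly:

CREATE TABLE daily_hits
(
    event_date Date,
    page String,
    hits UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (event_date, page);

INSERT INTO daily_hits VALUES ('2021-09-16', '/home', 3);
INSERT INTO daily_hits VALUES ('2021-09-16', '/home', 5);

-- separately inserted rows may not be merged yet, so aggregate in the query
SELECT event_date, page, sum(hits) AS total_hits
FROM daily_hits
GROUP BY event_date, page;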

ReplicatedMergeTree engine

Tables created in CK are not replicated by default. To improve availability, replication has to be introduced; CK does this by integrating with ZooKeeper to maintain replica copies of the data.

Each of the engines mentioned above has a corresponding replicated variant (an example table definition follows the list):

  • ReplicatedMergeTree
  • ReplicatedSummingMergeTree
  • ReplicatedReplacingMergeTree
  • ReplicatedAggregatingMergeTree
  • ReplicatedCollapsingMergeTree
  • ReplicatedVersionedCollapsingMergeTree
  • ReplicatedGraphiteMergeTree
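A sketch of a replicated table definition; the ZooKeeper path and replica name are placeholders that are normally filled in from configured macros:

-- '{shard}' and '{replica}' are assumed to be defined as macros in the server config
CREATE TABLE hits_replicated
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits_replicated', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID));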

Table engine – log engine family

This family of table engines targets log-like scenarios in which small tables are continuously created and each table holds only a small amount of data. These engines are characterized by the following (a minimal example follows the list):

  • Data is stored on disk
  • Data is written in append-only mode
  • Writes take a lock and reads must wait, so query performance is not high
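A minimal example with a hypothetical table (TinyLog and StripeLog are declared the same way):

CREATE TABLE app_log
(
    ts DateTime,
    message String
)
ENGINE = Log;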

Table engine – external data source

When creating tables, CK also supports many external data source engines. They appear to work like Hive external tables: only a table link is created, and the actual storage remains in the source system. (This is to be confirmed.)

These external data source table engines include the following (a MySQL example follows the list):

  • Kafka
  • MySQL
  • JDBC
  • ODBC
  • HDFS
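For example, a sketch of a table backed by the MySQL table engine; the host, database, table, and credentials below are placeholders:

-- reads and writes go to the remote MySQL table 'shop.orders' (hypothetical)
CREATE TABLE mysql_orders
(
    order_id UInt32,
    amount Float32
)
ENGINE = MySQL('localhost:3306', 'shop', 'orders', 'my_user', 'user_password');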

SQL syntax

Sample statement

When creating a table, you can declare sampling based on the hash of a column (hashing is used so that the sample is uniform and random). When querying, you can then use SAMPLE to read only part of the data rather than the whole table. For example, if the table holds 100 people and you want an estimate of their total score, you can sample ten people, sum their scores, and multiply by 10. SAMPLE suits business scenarios that do not require exact results but are very sensitive to query time.
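A sketch of the example above with a hypothetical table: it is created with SAMPLE BY, and the query reads roughly a tenth of the rows and scales the sum back up:

CREATE TABLE scores
(
    user_id UInt64,
    score UInt32
)
ENGINE = MergeTree()
ORDER BY (user_id, intHash32(user_id))
SAMPLE BY intHash32(user_id);

-- approximate total: sum over ~10% of the rows, multiplied by 10
SELECT sum(score) * 10 FROM scores SAMPLE 1 / 10;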

Installation matters

Some tips

Turn off swap file in production environment

Disable the swap file for production environments.

Some system tables that record the cluster's operation

The system.metrics, system.events, and system.asynchronous_metrics tables.
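For example, to take a quick look at a few current metrics:

SELECT metric, value FROM system.metrics LIMIT 5;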

Installation environment configuration

CPU frequency control

On Linux, the CPU frequency is scaled down or up according to load. This up-and-down scaling hurts CK's performance, so use the following setting to keep the CPUs at maximum frequency:

echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

The possible scaling governor settings are shown in the figure below.
[Figure: available Linux CPU frequency scaling governors; see the CPU frequency scaling link in the reference material]

Allow memory overcommit

Using the swap mechanism, Linux can let applications request more memory than is physically available. The basic principle is to swap currently unused pages out to disk to make room for pages in use, so that, from the application's point of view, a large amount of memory appears to be available. Allowing such excess allocation is called memory overcommit.

The overcommit behavior is controlled by vm.overcommit_memory, which takes three values:

  • 0: The Linux kernel is free to overcommit memory (this is the default), a heuristic algorithm is applied to figure out if enough memory is available.
  • 1: The Linux kernel will always overcommit memory, and never check if enough memory is available. This increases the risk of out-of-memory situations, but also improves memory-intensive workloads.
  • 2: The Linux kernel will not overcommit memory, and only allocate as much memory as defined in overcommit_ratio.

CK needs as much memory as it can get, so memory overcommit should be allowed. Set it as follows:

 echo 0 | sudo tee /proc/sys/vm/overcommit_memory

Turn off transparent huge pages

To speed up memory handling, the operating system can back application memory with huge pages. To make this transparent to applications, Linux runs the khugepaged kernel thread to manage it automatically; this automatic mechanism is called transparent huge pages. However, the automatic mechanism brings a risk of memory leaks; see the reference link for the specific reasons.

Therefore, CK installation expects this option to be turned off:

echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

Try to use large network bandwidth

If you use IPv6, you need to increase the route cache.

Don’t put ZK and CK together

CK consumes as many resources as it can to maximize performance, so if it is installed together with ZK, CK will starve ZK, reducing ZK's throughput and increasing its latency.

Enable ZK log cleanup

By default, ZK does not delete expired snapshot and log files, which becomes a time bomb over time. You therefore need to modify the ZK configuration to enable the autopurge function. Yandex uses the following configuration:

ZK configuration (zoo.cfg):

# http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=30000
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10

maxClientCnxns=2000

maxSessionTimeout=60000000
# the directory where the snapshot is stored.
dataDir=/opt/zookeeper/{{ cluster['name'] }}/data
# Place the dataLogDir to a separate physical disc for better performance
dataLogDir=/opt/zookeeper/{{ cluster['name'] }}/logs

autopurge.snapRetainCount=10
autopurge.purgeInterval=1


# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).
preAllocSize=131072

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000.ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.
snapCount=3000000

# If this option is defined, requests will be will logged to a trace file named
# traceFile.year.month.day.
#traceFile=

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at thes slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.
leaderServes=yes

standaloneEnabled=false
dynamicConfigFile=/etc/zookeeper-{{ cluster['name'] }}/conf/zoo.cfg.dynamic

Corresponding JVM parameters

NAME=zookeeper-{{ cluster['name'] }}
ZOOCFGDIR=/etc/$NAME/conf

# TODO this is really ugly
# How to find out, which jars are needed?
# seems, that log4j requires the log4j.properties file to be in the classpath
CLASSPATH="$ZOOCFGDIR:/usr/build/classes:/usr/build/lib/*.jar:/usr/share/zookeeper/zookeeper-3.5.1-metrika.jar:/usr/share/zookeeper/slf4j-log4j12-1.7.5.jar:/usr/share/zookeeper/slf4j-api-1.7.5.jar:/usr/share/zookeeper/servlet-api-2.5-20081211.jar:/usr/share/zookeeper/netty-3.7.0.Final.jar:/usr/share/zookeeper/log4j-1.2.16.jar:/usr/share/zookeeper/jline-2.11.jar:/usr/share/zookeeper/jetty-util-6.1.26.jar:/usr/share/zookeeper/jetty-6.1.26.jar:/usr/share/zookeeper/javacc.jar:/usr/share/zookeeper/jackson-mapper-asl-1.9.11.jar:/usr/share/zookeeper/jackson-core-asl-1.9.11.jar:/usr/share/zookeeper/commons-cli-1.2.jar:/usr/src/java/lib/*.jar:/usr/etc/zookeeper"

ZOOCFG="$ZOOCFGDIR/zoo.cfg"
ZOO_LOG_DIR=/var/log/$NAME
USER=zookeeper
GROUP=zookeeper
PIDDIR=/var/run/$NAME
PIDFILE=$PIDDIR/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
JAVA=/usr/bin/java
ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
JMXLOCALONLY=false
JAVA_OPTS="-Xms{{ cluster.get('xms','128M') }} \
    -Xmx{{ cluster.get('xmx','1G') }} \
    -Xloggc:/var/log/$NAME/zookeeper-gc.log \
    -XX:+UseGCLogFileRotation \
    -XX:NumberOfGCLogFiles=16 \
    -XX:GCLogFileSize=16M \
    -verbose:gc \
    -XX:+PrintGCTimeStamps \
    -XX:+PrintGCDateStamps \
    -XX:+PrintGCDetails
    -XX:+PrintTenuringDistribution \
    -XX:+PrintGCApplicationStoppedTime \
    -XX:+PrintGCApplicationConcurrentTime \
    -XX:+PrintSafepointStatistics \
    -XX:+UseParNewGC \
    -XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled"

Data backup

In addition to the data stored in CK, a copy can be kept in HDFS so that the data can still be recovered if CK's data is lost.

Configuration file

The default configuration file for CK is /etc/clickhouse-server/config.xml; all server configuration can be specified in it.

You can also split out particular configurations, such as user configuration and quota configuration, into separate files. These files are placed under:

 /etc/clickhouse-server/config.d

CK eventually combines all of these configurations into a complete, preprocessed configuration file (*-preprocessed.xml).

Each separate configuration file can override or delete matching elements of the main configuration by using the replace or remove attributes, for example:

<query_masking_rules>
    <rule>
        <name>hide SSN</name>
        <regexp>\b\d{3}-\d{2}-\d{4}\b</regexp>
        <replace>000-00-0000</replace>
    </rule>
</query_masking_rules>

CK can also use ZK as a configuration source, i.e. configuration stored in ZK is used when generating the final configuration file.

By default, users, access rights, settings profiles, and quotas are configured in users.xml.

Some best practices

Some recommended configuration practices (a sketch of the related settings follows the list):

1. When writing, do not write through a Distributed table, to avoid data inconsistency
2. Increase background_pool_size to speed up merges, since merge threads come from this thread pool
3. Set max_memory_usage and max_memory_usage_for_all_queries to limit the physical memory CK can use; if too much memory is used, the operating system will kill the CK process
4. Set max_bytes_before_external_sort and max_bytes_before_external_group_by so that large sorts and GROUP BYs that exceed the memory limits above do not fail but spill to the hard disk instead
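A sketch of setting the memory-related limits per session; the values are purely illustrative, and the same settings can also be placed in a profile in users.xml:

SET max_memory_usage = 10000000000;                  -- ~10 GB per query (illustrative)
SET max_bytes_before_external_sort = 8000000000;     -- spill sorts to disk beyond ~8 GB
SET max_bytes_before_external_group_by = 8000000000; -- spill GROUP BY to disk beyond ~8 GB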

Some pitfalls and how to handle them:

1. "Too many parts (304). Merges are processing significantly slower than inserts." The insertion rate is too high and exceeds the speed of the background merges. The fix is to increase background_pool_size and reduce the insertion rate. The official recommendation is no more than one insert request per second; in practice, each second of writes should touch no more than one part, so if a single write spans many partition files this problem is very likely to occur. The partitioning scheme therefore has to be chosen sensibly.
2. "DB::NetException: Connection reset by peer, while reading from socket xxx". Most likely max_memory_usage and max_memory_usage_for_all_queries are not configured, memory usage ran over the limit, and the CK server was killed by the operating system.
3. "Memory limit (for query) exceeded: would use 9.37 GiB (attempt to allocate chunk of 301989888 bytes), maximum: 9.31 GiB". This happens because we limit the CK server's memory usage; requests that exceed the limit are killed by CK, but CK itself does not go down. In this case increase max_bytes_before_external_sort and max_bytes_before_external_group_by so that the query can spill to the hard disk.
4. CK's replication and sharding depend on ZK, so ZK is a major performance bottleneck. You need to understand and configure ZK well, and may even need multiple ZK clusters to support the CK clusters.
5. SSDs are recommended for both ZK and CK to improve performance.

Corresponding article: https://mp.weixin.qq.com/s/eg…

Reference material

https://clickhouse.yandex/docs/en/operations/tips/
http://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

https://blog.nelhage.com/post…

https://wiki.archlinux.org/index.php/CPU_frequency_scaling

Welcome to follow my personal official account, "northwest northwest UP", where I record code life, industry thinking, and technology reviews.
