ES data write tuning 1

Time:2022-1-3

Disable swapping

Most operating systems use as much memory as possible for the file system cache and eagerly swap out unused application memory. This may cause part of the JVM heap to be swapped to disk.

Swapping is very bad for performance and node stability and should be avoided at all costs. It can cause garbage collection to last for minutes rather than milliseconds, which in turn can make nodes respond slowly or even drop out of the cluster.

On Linux / Unix systems, mlockall is used to lock the address space of the process in RAM. This prevents Elasticsearch memory from being swapped out and so effectively disables swapping for the process.

Follow these steps to enable the “bootstrap.memory_lock” parameter.
1. Log in to the FusionInsight Manager interface with the administrator account and select “cluster > name of the cluster to be operated > Service > elasticsearch > configuration > all configurations > Custom”.
2. Add a new parameter “bootstrap.memory_lock”, set the value to “true”, click “save” to save the configuration, and restart the Elasticsearch service.
3. Log in to any Elasticsearch data node as the root user and execute the following command to verify the change. If the result shows “true”, the modification was successful.

curl -XGET "http://ip:httpport/_nodes?filter_path=**.mlockall"
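
On a cluster with many nodes it is easy to overlook a single “false” in the output. The following is a minimal sketch of a check, assuming the response has the usual shape {"nodes":{"<id>":{"process":{"mlockall":true}}}}. The function name all_nodes_locked is made up for this example and is not part of Elasticsearch.

```shell
# Sketch: succeed only if every node in the response reports mlockall=true.
# Reads the JSON response of the _nodes request on stdin.
all_nodes_locked() {
    response=$(cat)
    # Fail as soon as any node reports false.
    if printf '%s' "$response" | grep -Eq '"mlockall"[[:space:]]*:[[:space:]]*false'; then
        return 1
    fi
    # Succeed only if at least one node reports true (an empty response fails).
    printf '%s' "$response" | grep -Eq '"mlockall"[[:space:]]*:[[:space:]]*true'
}

# Usage against a live cluster (ip and httpport are the placeholders used above):
# curl -s -XGET "http://ip:httpport/_nodes?filter_path=**.mlockall" | all_nodes_locked && echo "memory lock enabled"
```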

1. Distribute the shards evenly

For clusters with more than 5 nodes, to distribute shards evenly across instances, add the following parameter to limit the number of shards of each index on a single instance. In the example below, each index may have at most 2 shards on each instance.
curl -XPUT "http://ip:httpport/myindex/_settings?pretty" -H 'Content-Type: application/json' -d '{ "index.routing.allocation.total_shards_per_node": "2" }'
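
How to pick the value is not spelled out above. One common rule of thumb (an assumption here, not something stated in the original notes) is to divide the total number of shards of the index, primaries plus replicas, by the number of data instances and round up. The helper name shards_per_node is hypothetical.

```shell
# Sketch: smallest total_shards_per_node value that still lets every shard be allocated.
# Arguments: primary shards, replicas per primary, number of data instances.
shards_per_node() {
    primaries=$1
    replicas=$2
    nodes=$3
    total=$(( primaries * (1 + replicas) ))
    # Integer ceiling of total / nodes.
    echo $(( (total + nodes - 1) / nodes ))
}

# 6 primaries with 1 replica each on 6 instances: 12 shards in total.
shards_per_node 6 1 6
```

Keep in mind that a limit this tight leaves no headroom: if an instance goes down, some shards may stay unassigned until it returns, so a slightly larger value can be the safer choice.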

2. Modify the index refresh interval and the number of replicas

The default value of “index.refresh_interval” is “1s”, that is, a new segment is forced to be generated every second. Increasing the refresh interval produces larger segments, which effectively reduces IO and relieves the pressure of segment merging. This setting can be specified when creating an index (or configured in a template).

If you only import data and do not need real-time queries, you can disable refresh (that is, set “index.refresh_interval” to “-1”) and set “index.number_of_replicas” to “0”. Note that running without replicas carries a risk of data loss. After the import is complete, set both parameters back to appropriate values.

The command below operates on a single index. It also supports multiple indexes (index names separated by commas) and all indexes (with the * wildcard).

curl -XPUT "http://ip:httpport/myindex/_settings" -H 'Content-Type: application/json' -d'
{
    "number_of_replicas": 0,
    "refresh_interval": "180s"
}'
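
Since the settings have to be changed twice, once before the import and once after, it can help to generate the request body from a small helper. This is only a sketch: the function name settings_body and the "after import" values (1 replica, 30s) are assumptions chosen for illustration, so substitute whatever suits your index.

```shell
# Sketch: build the settings body for a given replica count and refresh interval.
settings_body() {
    printf '{"number_of_replicas": %d, "refresh_interval": "%s"}\n' "$1" "$2"
}

# During the import: no replicas, refresh disabled.
settings_body 0 "-1"
# After the import: assumed target values, adjust to your own requirements.
settings_body 1 "30s"

# Applying a body (ip, httpport and myindex are the placeholders used above):
# curl -XPUT "http://ip:httpport/myindex/_settings" -H 'Content-Type: application/json' -d "$(settings_body 1 "30s")"
```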

3. Modify the merge parameter and the number of threads

When Elasticsearch writes data, each refresh generates a new segment, and segments are merged in the background according to a merge policy. The merge frequency affects both write and query speed. Frequent merging consumes more IO and slows down writes, but it keeps the number of segments small, which improves query speed. The merge frequency therefore has to be weighed against the specific business needs so that both writes and queries stay reasonably fast. Elasticsearch uses TieredMergePolicy by default. You can control how often index segments are merged through the following parameters:
1. “index.merge.policy.floor_segment”: Elasticsearch avoids generating very small segments. Segments smaller than this threshold are merged until they reach the floor size. The default is 2MB.
2. “index.merge.policy.max_merge_at_once”: the maximum number of segments merged at a time. The default is 10.
3. “index.merge.policy.max_merged_segment”: segments larger than this size are no longer merged. The default is 5GB.
4. “index.merge.policy.segments_per_tier”: the number of segments allowed per tier. The default is 10. This value should be greater than or equal to “index.merge.policy.max_merge_at_once”; otherwise the tier limit is reached before enough segments have accumulated for a full merge, merges are triggered immediately, and merging becomes too frequent.
5. “index.merge.scheduler.max_thread_count”: the maximum number of threads that can merge simultaneously on a single shard. By default, Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)) threads perform merges, which suits SSDs. On mechanical hard disks, concurrent merges easily cause IO blocking, so set the number of threads to 1.
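
To see what the default thread count works out to on a given machine, the formula from item 5 can be reproduced with shell integer arithmetic. This is a small sketch; the function name default_merge_threads is made up for the example.

```shell
# Sketch of the default formula: max(1, min(4, cores / 2)) with integer division.
default_merge_threads() {
    cores=$1
    half=$(( cores / 2 ))
    if [ "$half" -gt 4 ]; then half=4; fi
    if [ "$half" -lt 1 ]; then half=1; fi
    echo "$half"
}

default_merge_threads 16   # prints 4
default_merge_threads 6    # prints 3
default_merge_threads 1    # prints 1
```

On Linux you could pass the output of nproc as the argument to evaluate the formula for the current machine.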


In general, the merge frequency is controlled by adjusting “index.merge.policy.max_merge_at_once” and “index.merge.policy.segments_per_tier”.
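
Because these two values must be kept consistent with each other, a quick sanity check before sending a settings request can catch a mistake early. A minimal sketch, with the hypothetical helper name merge_policy_ok:

```shell
# Sketch: check that segments_per_tier >= max_merge_at_once before applying.
# Arguments: segments_per_tier, max_merge_at_once.
merge_policy_ok() {
    segments_per_tier=$1
    max_merge_at_once=$2
    [ "$segments_per_tier" -ge "$max_merge_at_once" ]
}

if merge_policy_ok 20 20; then
    echo "ok"
else
    echo "segments_per_tier is smaller than max_merge_at_once"
fi
```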

The command to modify parameters is as follows:

curl -XPUT "http://ip:httpport/myindex-001/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "merge": {
        "scheduler": {
            "max_thread_count": "1"
        },
        "policy": {
            "segments_per_tier": "20",
            "max_merge_at_once": "20",
            "floor_segment": "2m",
            "max_merged_segment": "5g"
        }
    }
}'