Elasticsearch series – production cluster deployment (Part 2)



This article continues Part 1 with more details of Elasticsearch cluster deployment.

Cluster restart problem

When an Elasticsearch cluster undergoes offline maintenance, such as expanding disks or upgrading the version, the whole cluster needs to be restarted. With a large number of nodes, it can take a long time from starting the first node to the last, and some nodes may fail to start because of faults; only after those faults are diagnosed and repaired can the nodes rejoin the cluster. What does the cluster do in the meantime?

Suppose a 10-node cluster holds one shard per node. After an upgrade the nodes are restarted, but three of them fail to start and troubleshooting takes time, as shown in the figure below:

[Figure: a 10-node cluster restarting after an upgrade, with node1, node2 and node3 failing to start]

The whole process steps are as follows:

  1. The cluster completes the master election (node6 wins). When the master notices that the shards held by node1, node2 and node3, which have not rejoined the cluster, are missing, it immediately issues shard recovery instructions.
  2. For each missing primary, one replica shard on the seven surviving nodes is promoted to primary, and enough new replicas are copied from these primaries.
  3. A shard rebalance operation is performed.
  4. The three failed nodes are repaired and rejoin the cluster after starting successfully.
  5. The three nodes find that their shards already exist on other nodes in the cluster, so they delete their local shard data.
  6. The master sees that the three rejoined nodes hold no shard data and runs the shard rebalance again.

Throughout this process the cluster performs four rounds of unnecessary IO: copying shards, moving shards a first time, deleting local shards, and moving shards again. This creates a lot of IO pressure out of thin air; with TB-level data volumes it is both time-consuming and wasteful.

The root cause is that the interval between node startups cannot be predicted, and the more nodes there are, the more likely this problem occurs. If the cluster could be told how many nodes to wait for before deciding whether to move shards, the IO pressure would be much lower.

To solve this problem, Elasticsearch provides the following parameters:

  • gateway.recover_after_nodes: how many nodes must join the cluster before shard recovery may start.
  • gateway.expected_nodes: how many nodes the cluster is expected to have.
  • gateway.recover_after_time: once the recover_after_nodes condition is met, how long to wait for the remaining expected nodes before starting shard recovery anyway.

As in the case above, we can set it as follows:

gateway.recover_after_nodes: 8
gateway.expected_nodes: 10
gateway.recover_after_time: 5m

The meaning of these three parameters: the cluster has 10 nodes; shard recovery may begin only after at least 8 nodes have joined; and if not all 10 nodes come up, the cluster waits at most 5 minutes before starting recovery anyway.

These values can be tuned to the actual cluster size. They can only be set in the elasticsearch.yml file and cannot be modified dynamically.

With these parameters set reasonably, no shards are moved while the cluster starts up, which can cut a cluster restart from a few hours down to a few seconds.
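The interaction of the three settings can be sketched roughly as follows. This is a simplified illustration, not Elasticsearch's actual implementation; the function name and parameters are made up for this example:

```python
def should_start_recovery(joined_nodes, recover_after_nodes=8,
                          expected_nodes=10, recover_after_time_min=5,
                          elapsed_min=0.0):
    """Simplified sketch of the gateway.recover_after_* semantics."""
    if joined_nodes < recover_after_nodes:
        return False          # not enough nodes yet: keep waiting
    if joined_nodes >= expected_nodes:
        return True           # everyone is back: recover immediately
    # enough nodes, but not all of them: wait up to recover_after_time
    return elapsed_min >= recover_after_time_min

# 7 nodes joined: wait, no matter how much time has passed
print(should_start_recovery(7, elapsed_min=10))
# 8 nodes joined, 5 minutes elapsed: start recovery
print(should_start_recovery(8, elapsed_min=5))
```

With these rules, the fully restarted case (all 10 nodes back) recovers immediately, and the degraded case (8 or 9 nodes) only starts moving shards after the 5-minute grace period.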

JVM and thread pool settings

When it comes to JVM tuning, everyone's hands start to itch: surely among the hundreds of JVM parameters there is a magic switch that will put ES on the road to high performance and high throughput. The reality is that we tend to overthink it. Most of the ES defaults have been validated repeatedly, and there is usually no need to fiddle with them.


GC configuration

The default garbage collector used by Elasticsearch is CMS.

CMS is a mostly concurrent collector: it runs concurrently with the application worker threads to minimize service pauses during garbage collection.

CMS still has two stop-the-world pause phases, and it does not cope well with very large heaps. Despite these drawbacks, it remains the best choice for software that needs low-latency request handling, so the official recommendation is to use the CMS garbage collector.

There is a newer garbage collector called G1. G1 can deliver shorter pauses than CMS and performs better on large heaps. It divides the heap into many regions, predicts which regions contain the most reclaimable space, and collects those first, minimizing pause times even when collecting a large heap.

It sounds attractive, but G1 is still a relatively young collector, and new bugs are found in it from time to time, some of which can crash the JVM. For stability's sake, do not use G1 for now; wait until it matures and ES officially recommends it.

Thread pool

When we tune a Java application, adjusting thread pools is a common approach. In ES, however, the default thread pool settings are already very reasonable. For every thread pool except the one used for search, the number of threads equals the number of CPU cores: with eight cores, eight threads can run in parallel, and eight threads is the most reasonable size for most pools.

The search thread pool is larger; its size is generally configured as: number of CPU cores * 3 / 2 + 1.
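Assuming integer division, the sizing rule above works out like this (the function name is made up for illustration):

```python
def search_pool_size(cpu_cores: int) -> int:
    """Search thread pool sizing rule: cores * 3 / 2 + 1 (integer division)."""
    return cpu_cores * 3 // 2 + 1

print(search_pool_size(8))   # an 8-core machine gets a 13-thread search pool
print(search_pool_size(4))   # a 4-core machine gets a 7-thread search pool
```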

Elasticsearch's threads fall into two types: threads that receive requests and threads that perform disk IO. The former are managed by ES and the latter by Lucene, and the two cooperate so that ES threads do not block on IO operations. This is why the ES thread count is set equal to, or only slightly larger than, the number of CPU cores.

A server's computing power is limited, and too many threads lead to frequent context switches, which waste resources. If a thread pool is sized at 50, 100 or even 500, CPU utilization drops and performance degrades.

Just remember: use the default thread pools. If you really must change them, size them by the number of CPU cores.

Heap memory setup best practices

Elasticsearch's default JVM heap size is 2GB. In a development environment I change it to 512MB, but in production 2GB is usually too little.

In the config/jvm.options file you can see the heap settings:

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
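For example, to raise the heap on a production node you would edit both values in jvm.options; the 16GB figure here is purely illustrative, and Xms and Xmx should always match:

```
# config/jvm.options
-Xms16g
-Xmx16g
```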


Allocation rules

Elasticsearch's memory consumers fall into two main camps: the JVM heap and Lucene. ES uses the heap to hold the many data structures that make its operations fast, while Lucene relies on the OS cache to cache index files, including the inverted index and the forward index (doc values). Whether the OS cache has enough memory directly affects query and retrieval performance.

The general allocation rule: give the JVM heap no more than half of the machine's memory, and leave the rest to Lucene.

If a machine has 64GB of memory in total, the heap should be capped at 32GB. Below 32GB the JVM can use compressed oops to keep object pointers small; above 32GB compressed oops are disabled and the JVM falls back to 64-bit object pointers, which consume more space and burn more bandwidth whenever data moves between main memory and the CPU's multi-level caches (LLC, L1 and so on). The end result may be that a 50GB heap performs no better than a 32GB one, wasting more than ten gigabytes of memory.

This touches on the JVM's object pointer compression technique; if you are interested, it is worth studying separately.
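The 32GB boundary falls out of the pointer arithmetic: a compressed oop stores a 32-bit offset that is scaled by the JVM's default 8-byte object alignment, so the largest heap a compressed pointer can address is 2^32 x 8 bytes = 32GB. A quick check:

```python
offset_bits = 32          # a compressed oop is a 32-bit offset
object_alignment = 8      # default JVM object alignment, in bytes

max_heap_bytes = (2 ** offset_bits) * object_alignment
print(max_heap_bytes == 32 * 1024 ** 3)   # exactly 32 GB
```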

If a machine has less than 64GB in total, the heap can generally be set to half of the total memory, depending on the estimated data volume.

If you use a monster machine with 1TB of memory, note that the official documentation does not actually recommend such powerful hardware. The advice is to allocate 4GB to 32GB to the heap and leave all the rest to the OS cache, so that the data is fully cached in memory, queries rarely touch the disk, and performance improves dramatically.

Best practice recommendations

  1. Set the heap's minimum and maximum to the same size.
  2. The larger the Elasticsearch JVM heap, the more memory is available for caching, but an overly large heap can cause long GC pauses.
  3. The JVM heap should not exceed 50% of physical memory, in order to leave enough memory for Lucene's file system cache.
  4. The JVM heap should not exceed 32GB, otherwise the JVM cannot enable compressed oops. Confirm that a line like [node-1] heap size [1007.3mb], compressed ordinary object pointers [true] appears in the log.
  5. Best-practice figure: keep the heap small enough for zero-based compressed oops, which means about 26GB, though sometimes up to 30GB works. Enable the diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode and confirm the output reads like heap address: 0x00000000e0000000, size: 27648 MB, Compressed Oops mode: 32-bit, not like heap address: 0x00000000f4000000, size: 28672 MB, Compressed Oops with base: 0x00000000f3ff0000.

Swap problem

The machine running Elasticsearch should have swap disabled as far as possible. If memory pages are swapped out to disk, query latency degrades from the microsecond level to the millisecond level, a hidden source of drastic performance drops.

How to disable it:

  1. On Linux, run swapoff -a to disable swap, or configure it permanently in the /etc/fstab file.
  2. In elasticsearch.yml, set bootstrap.memory_lock: true (called bootstrap.mlockall in versions before 5.0) to lock the process memory so it cannot be swapped out.
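Putting the two steps together, the relevant fragments look roughly like this; the fstab swap entry shown is only an example, and your device name will differ:

```
# /etc/fstab -- comment out the swap line so swap is not mounted at boot
# /dev/mapper/centos-swap swap swap defaults 0 0

# config/elasticsearch.yml -- lock ES memory in RAM
# (use bootstrap.mlockall: true on versions before 5.0)
bootstrap.memory_lock: true
```

Note that memory locking also requires the memlock ulimit to be unlimited for the ES user, as shown in the limits.conf settings later in this article.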

You can check whether mlockall is enabled with the command GET _nodes?filter_path=**.mlockall
Response:

  "nodes": {
    "A1s1uus7TpuDSiT4xFLOoQ": {
      "process": {
        "mlockall": true
      }
    }
  }

Common problems when starting Elasticsearch

  1. Starting the instance as root

If you start the Elasticsearch instance as the root user, you will see the following error:

org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.3.1.jar:6.3.1]
    at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.3.1.jar:6.3.1]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.3.1.jar:6.3.1]
Caused by: java.lang.RuntimeException: can not run elasticsearch as root
    at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:104) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:171) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.1.jar:6.3.1]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.1.jar:6.3.1]
    ... 6 more

The fix is to create a dedicated user for running Elasticsearch, such as esuser, and give that account ownership of the corresponding installation and data directories.

  2. On startup, the message says the limits for the elasticsearch process are too low, and the instance fails to start

Complete message:

max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
memory locking requested for elasticsearch process but memory is not locked

Solution: raise the system limits; esuser below is the Linux user created earlier.

vi /etc/security/limits.conf

# Add at the end of the file
esuser hard nofile 65536
esuser soft nofile 65536
esuser soft memlock unlimited
esuser hard memlock unlimited

After setting them, you can verify the result with this request:

# Request command
GET _nodes/stats/process?filter_path=**.max_file_descriptors

# Response result
  "nodes": {
    "A1s1uus7TpuDSiT4xFLOoQ": {
      "process": {
        "max_file_descriptors": 65536
      }
    }
  }

  3. The error vm.max_map_count [65530] is too low prevents the instance from starting

Complete message:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Solution: add the vm.max_map_count configuration item.

Temporary setting: sysctl -w vm.max_map_count=262144

Permanent setting: edit the /etc/sysctl.conf file and add the vm.max_map_count entry:

vim /etc/sysctl.conf

# Add at the end of the file
vm.max_map_count=262144

# Then execute the command to apply it
sysctl -p

Starting and stopping an Elasticsearch instance

The instance is usually started in the background by running the command in the ES bin directory:

$ nohup ./elasticsearch &
[1] 15544
nohup: ignoring input and appending output to 'nohup.out'

Elasticsearch has no stop command; to stop it, use the kill command with the process ID.

$ jps | grep Elasticsearch
15544 Elasticsearch
$ kill -SIGTERM 15544

Sending a SIGTERM signal to the Elasticsearch process closes the instance gracefully.


Continuing from the previous part, this article covered the problems to watch for when restarting a cluster, best practices for JVM heap settings, and solutions to common problems when starting an Elasticsearch instance, ending with the command to shut Elasticsearch down gracefully.
