MongoDB splitChunk caused routing table refresh, resulting in slow responses

Time: 2020-11-27


After a MongoDB sharding instance was upgraded from version 3.4 to version 4.0, insert performance dropped significantly, and the logs showed a large number of slow insert requests:

2020-08-19T16:40:46.563+0800 I COMMAND [conn1528] command sdb.sColl command: insert { insert: "sColl", xxx} ... locks: {Global: { acquireCount: { r: 6, w: 2 } }, Database: { acquireCount: { r: 2, w: 2 } }, Collection: { acquireCount: { r: 2, w: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 2709634 } } } protocol:op_msg 2756ms

From the log we can see that the insert request acquired the IS lock on the collection twice, and one of the acquisitions waited about 2.7s, which matches the insert request's total execution time. The performance degradation is clearly correlated with lock waiting.

In the 2.7s before that, the system was refreshing the collection's metadata (how long the refresh takes depends on the number of chunks in the collection):

2020-08-19T16:40:43.853+0800 I SH_REFR [ConfigServerCatalogCacheLoader-20] Refresh for collection sdb.sColl from version 25550573|83||5f59e113f7f9b49e704c227f to version 25550574|264||5f59e113f7f9b49e704c227f took 8676ms
2020-08-19T16:40:43.853+0800 I SHARDING [conn1527] Updating collection metadata for collection sdb.sColl from collection version: 25550573|83||5f59e113f7f9b49e704c227f, shard version: 25550573|72||5f59e113f7f9b49e704c227f to collection version: 25550574|264||5f59e113f7f9b49e704c227f, shard version: 25550574|248||5f59e113f7f9b49e704c227f

Chunk version information

First, let's understand the version information above. In the log, both the shard version and the collection version have the form "25550573|83||5f59e113f7f9b49e704c227f", which is a chunk version. The version string is divided into three parts by "|" and "||":

  • The first part is the major version: an integer used to identify whether routing targets have changed, so that each node can update its route in time. For example, it is incremented when a chunk migrates between shards.
  • The second part is the minor version: an integer that mainly records changes that do not affect routing targets. For example, it is incremented when a chunk splits.
  • The third part is the epoch: an ObjectId that identifies a unique incarnation of the collection and is used to detect whether the collection has changed. It is regenerated only when the collection is dropped or its shard key is refined.

shard version: the highest chunk version on the target shard for a sharded collection.

collection version: the highest chunk version across all shards for a sharded collection.
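
As a concrete illustration, the shard versions and the collection version can be derived from config.chunks with queries like the following (a sketch for the versions discussed here, where config.chunks is keyed by ns; run against mongos or the configserver):

// Illustrative only: per-shard shard versions of "sdb.sColl", i.e. the highest
// lastmod (chunk version) held by each shard.
db.getSiblingDB("config").chunks.aggregate([
    { $match: { ns: "sdb.sColl" } },
    { $group: { _id: "$shard", shardVersion: { $max: "$lastmod" } } }
])

// The collection version is the highest chunk version across all shards:
db.getSiblingDB("config").chunks.find({ ns: "sdb.sColl" }).sort({ lastmod: -1 }).limit(1)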

The section "Route update trigger scenarios – Scenario 1: request triggered" below describes the typical way a shard version is used to trigger a route update.

Routing information storage

The routing information of a sharded collection is recorded in the config.chunks collection on the config server, and mongos & shardserver load it from the configserver into their CatalogCache on demand.

// config.chunks
{
        "_id" : "sdb.sColl-name_106.0",
        "lastmod" : Timestamp(4, 2),
        "lastmodEpoch" : ObjectId("5f3ce659e6957ccdd6a56364"),
        "ns" : "sdb.sColl",
        "min" : {
                "name" : 106
        },
        "max" : {
                "name" : 107
        },
        "shard" : "mongod8320",
        "history" : [
                {
                        "validAfter" : Timestamp(1598001590, 84),
                        "shard" : "mongod8320"
                }
        ]
}

The document in the example above describes a chunk:

  • It belongs to the namespace "sdb.sColl", whose epoch is "5f3ce659e6957ccdd6a56364"
  • Its range is {name: 106} ~ {name: 107}, its chunk version is {major = 4, minor = 2}, and it resides on shard mongod8320
  • Some history information is also recorded

Route update trigger scenarios

Route updates are "lazy": routing information is not refreshed unless necessary. There are two main scenarios that trigger a route refresh:

Scenario 1: request triggered

After receiving a client request, mongos attaches a 「shardVersion」 field to the request based on the routing information in its current CatalogCache, and then forwards the request to the target shard according to that routing information.

{ 
  insert: "sCollName", 
  documents: [ { _id: ObjectId('5f685824c800cd1689ca3be8'), name: xxxx } ], 
  shardVersion: [ Timestamp(5, 1), ObjectId('5f3ce659e6957ccdd6a56364') ], 
  $db: "sdb"
}

After receiving the request from mongos, the shardserver extracts the 「shardVersion」 field and compares it with its local 「shardVersion」, checking whether the epoch & majorVersion of the two are the same. If they match, the write can proceed; if they do not, a StaleConfigInfo exception is raised. Both shardserver & mongos handle this exception with essentially the same logic: whichever side holds the lower-version routing information refreshes its route.
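
The version check itself is simple. A minimal sketch in plain JavaScript (not the server's actual code) of the decision the shard makes:

// Minimal sketch, not server code: decide whether the shardVersion sent by mongos
// is still acceptable. "received" comes from the request, "wanted" is the shard's
// local metadata; epoch and major version must match, the minor version may differ.
function checkShardVersion(received, wanted) {
    if (received.epoch !== wanted.epoch) return "StaleConfigInfo"; // collection was dropped/recreated
    if (received.major !== wanted.major) return "StaleConfigInfo"; // routing targets changed
    return "OK";
}

// Example: mongos still holds major version 25550573 while the shard is already at 25550574.
checkShardVersion(
    { major: 25550573, minor: 83,  epoch: "5f59e113f7f9b49e704c227f" },
    { major: 25550574, minor: 248, epoch: "5f59e113f7f9b49e704c227f" }
)   // => "StaleConfigInfo"; whichever side is older then refreshes its route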

Scenario 2: special requests

  • Some commands change routing information when they execute, such as moveChunk
  • A node may be affected by the behavior of other nodes: receiving a forceRoutingTableRefresh command forces a refresh
  • Some operations must obtain the latest routing information, such as cleanupOrphaned

Route refresh behavior


The refresh itself is divided into two steps.

Step 1: pull the authoritative routing information from the config server and refresh the routing information in the CatalogCache. This is ultimately done by the ConfigServerCatalogCacheLoader thread, which constructs a query of the form

{
    "ns": namespace,
    "lastmod": { $gte: sinceVersion }
}

to fetch the routing information. If the collection's epoch has not changed and local routing information for the collection exists, only incremental routing information is needed: sinceVersion = the largest version number in the local routing information, i.e. the shard version. Otherwise sinceVersion = (0, 0), and the routing information is fetched in full.
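
For example, an incremental refresh of sdb.sColl when the local shard version is 25550573|83 corresponds roughly to the following query (an illustrative mongo shell form of the loader's request, with values taken from the log above):

// Illustrative only: fetch the chunks of sdb.sColl that changed since the locally
// known shard version (major = 25550573, minor = 83).
db.getSiblingDB("config").chunks.find({
    ns: "sdb.sColl",
    lastmod: { $gte: Timestamp(25550573, 83) }
}).sort({ lastmod: 1 })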

After the ConfigServerCatalogCacheLoader obtains the routing information, it refreshes the routing information in the CatalogCache. At this point the system log prints the first message we saw above:

2020-08-19T16:40:43.853+0800 I SH_REFR [ConfigServerCatalogCacheLoader-20] Refresh for collection sdb.sColl from version 25550573|83||5f59e113f7f9b49e704c227f to version 25550574|264||5f59e113f7f9b49e704c227f took 8676ms

Step 2: update the MetadataManager (which maintains the collection's metadata and can provide a consistent view of the routing information for certain scenarios). To keep the MetadataManager update consistent, a collection X lock is taken. During the update, the system log prints the second message we saw above:

2020-08-19T16:40:43.853+0800 I SHARDING [conn1527] Updating collection metadata for collection sdb.sColl from collection version: 25550573|83||5f59e113f7f9b49e704c227f, shard version: 25550573|72||5f59e113f7f9b49e704c227f to collection version: 25550574|264||5f59e113f7f9b49e704c227f, shard version: 25550574|248||5f59e113f7f9b49e704c227f

This X lock taken while updating the metadata is precisely what caused the lock wait behind the slow inserts at the beginning of the article.

Changes to chunk version management in 3.6+

So why is 3.4 fine while 4.0 degrades? The direct answer: in recent 3.6 & 4.0 releases, when a shard performs a splitChunk and shardVersion == collectionVersion, the major version is incremented, which triggers a route refresh; in 3.4, only the minor version is incremented. Let's first look at the basic splitChunk process, and then elaborate on why this change was made.

Splitchunk process


  • 「Auto splitting trigger」: in 4.0 and earlier, auto splitting of a sharding instance is triggered by mongos. For every write request, mongos records the amount of data written to the corresponding chunk and decides whether to send a splitChunk request to the shardserver. The criterion is: dataWrittenBytes >= maxChunkSize / 5 (a fixed ratio).
  • 「splitVector + splitChunk」: mongos sends a splitVector request to the shard to obtain the split points for the chunk; in this step the shard scans and computes over the data according to the index. If splitVector returns split points, mongos then sends a splitChunk request to the shard to perform the actual split.
  • 「_configsvrCommitChunkSplit」: after receiving the splitChunk request, the shardserver first acquires a distributed lock and then sends a _configsvrCommitChunkSplit to the configserver. On receiving it, the configserver updates the data and completes the split; during this step the chunk version information changes.
  • 「route refresh」: after the above steps complete normally, mongos refreshes its route.
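
For reference, the same split path can also be driven manually from a mongos shell; a small sketch (the split key here is made up):

// Illustrative manual split from a mongos shell: ask the shard owning the chunk
// that contains { name: 150 } to split it at that key. Under the hood this goes
// through the splitChunk / _configsvrCommitChunkSplit steps described above
// (the exact internals depend on the server version).
sh.splitAt("sdb.sColl", { name: 150 })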

How the chunk version changes on splitChunk

SERVER-41480 adjusted how the chunk version is managed when a chunk is split.

In 3.4, and in earlier releases of 3.6 and 4.0, 「_configsvrCommitChunkSplit」 only increments the chunk's minor version.

The original reasoning for this was to prevent unnecessary routing table refreshes on the routers, which don’t ordinarily need to know about chunk splits (since they don’t change targeting information).

In other words, the original intent was to spare mongos unnecessary route refreshes: a splitChunk does not change routing targets, so mongos does not need to be aware of it.

However, incrementing only the minor version can cause significant performance overhead when users write with a monotonically increasing shard key.


Suppose there are two mongos, mongosA and mongosB, and two shards, shardA (chunk range: minKey ~ 0) and shardB (chunk range: 0 ~ maxKey). The user writes with a monotonically increasing shard key.

  • At time T1, mongosB is the first to determine that the chunk satisfies the "auto splitting trigger" condition, and sends 「splitVector + splitChunk」 to shardB. After the request completes normally, mongosB refreshes its route; shardB's chunk ranges are now 0 ~ 100 and 100 ~ maxKey.
  • Then, for some period afterward (e.g. at time T2), mongosA also determines that the chunk satisfies the "auto splitting trigger" condition and sends 「splitVector + splitChunk」, but the final 「_configsvrCommitChunkSplit」 step fails: mongosA's routing table is stale, so the 0 ~ maxKey chunk it asks to split no longer exists. Since the whole flow never completes, mongosA does not update its routing table, and such invalid requests keep recurring throughout this period.

As described above, splitVector scans and computes over the data according to the index, and splitChunk acquires a distributed lock; both are time-consuming requests, so the performance impact of this scenario cannot be ignored.

SERVER-41480 therefore changed the behavior: if shardVersion == collectionVersion (i.e. the last chunk split of the collection also happened on this shard), the major version is incremented, triggering a route refresh on every node. The fix shipped in 3.6.15, 4.0.13, 4.2.2 and 4.3.1.

And this fix is what led to the problem at the beginning of the article. More specifically, any split on a shard where shardVersion == collectionVersion now causes a global route refresh.

Official fix

SERVER-49233 elaborates on this problem in detail:

we chose a solution, which erred on the side of correctness, with the reasoning that on most systems auto-splits are happening rarely and are not happening at the same time across all shards.

That is, the solution chosen in SERVER-41480 erred on the side of correctness, on the reasoning that in most systems auto splits happen rarely and do not happen on all shards at the same time.

However, this logic seems to be causing more harm than good in the case of almost uniform writes across all chunks. If it is the case that all shards are doing splits almost in unison, under this fix there will constantly be a bump in the collection version, which means constant stalls due to StaleShardVersion.

However, when writes are spread almost uniformly across all chunks, this logic does more harm than good: if all shards split almost in unison, then under the SERVER-41480 fix the collection version is constantly bumped, which means constant stalls caused by StaleShardVersion.

(figure: chunk splits on shardA ~ shardD at T1 ~ T4, with mongosA refreshing its routing table at T5)

To illustrate this in detail: suppose a sharding instance has four shards, each holding two chunks, with major version = N at the current time. The client writes evenly to all chunks. At some point, mongosA determines that all chunks meet the split condition and triggers chunk splits for each shard in turn. For the sake of illustration, assume that, as shown in the figure, the chunk splits are performed on shardA, shardB, shardC and shardD at T1, T2, T3 and T4 respectively. Then:

  • At T1.1, chunk1 splits, making shardVersion == collectionVersion; at T1.2, chunk2 splits, so the configserver increments the major version and the latest major version becomes N + 1; at T1.3, shardA notices this and refreshes its local major version to N + 1.
  • The same process then repeats at T2, T3 and T4.
  • Finally, at T5, mongosA actively refreshes its routing table after triggering the splits, and observes major version = N + 4.

When another mongos in the system, say mongosX (not yet updated, with major version = N in its routing table), sends a request to a shard (such as shardB):

  • On the first request, mongosX learns that its major version is behind, talks to the configserver, updates its local routing table, and issues a second request.
  • On the second request, shardB learns that its own major version is behind, and pulls and updates its routing table from the configserver.
  • On the third request, both sides hold the latest routing table and the request completes.
  • Whenever the routing table on mongos or a shard lags behind during these interactions, a StaleShardVersion is raised, and while the routing table is being updated, every request that depends on the collection's routing table must wait for the update to complete before it can proceed. This is what the JIRA issue describes as constant stalls due to StaleShardVersion.

Meanwhile, SERVER-49233 provides a concrete remedy: 3.6.19, 4.0.20, 4.2.9 and subsequent versions add the incrementChunkMajorVersionOnChunkSplit parameter, whose default value is false (i.e. splitChunk does not bump the major version). It can be set to true in the configuration file or via --setParameter at startup.
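
A sketch of how it can be set; it is assumed here to be applied to the config server processes (where the version bump happens), so check the documentation for your exact version:

# Assumed example: enable the major-version bump at startup
mongod --setParameter incrementChunkMajorVersionOnChunkSplit=true ...

# Or in the YAML configuration file:
setParameter:
  incrementChunkMajorVersionOnChunkSplit: true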

In version 4.2 the auto splitting logic was changed to trigger on the shardserver (SERVER-34448), so the scenario in which mongos keeps sending invalid splitChunk requests no longer exists. Therefore, for version 4.4, SERVER-49433 simply rolls back the major-version-bump logic, and only the minor version is incremented. (In 4.2, because intermediate releases had already shipped the major-version logic, the incrementChunkMajorVersionOnChunkSplit parameter is kept so users can choose.)

The behavior of each version is summarized as follows:

  • Only the minor version is incremented: all 3.4 versions; 3.6 versions before 3.6.15; 4.0 versions before 4.0.13; 4.2 versions before 4.2.2; 4.4 (not yet released at the time of writing).
  • Major version is incremented when shardVersion == collectionVersion, otherwise the minor version: 3.6.15 ~ 3.6.18 (inclusive); 4.0.13 ~ 4.0.19 (inclusive); 4.2.2 ~ 4.2.8 (inclusive).
  • The incrementChunkMajorVersionOnChunkSplit parameter is provided and, by default, only the minor version is incremented: 3.6.19 and later; 4.0.20 and later; 4.2.9 and later.

Usage scenarios and solutions

MongoDB version | Usage scenario | Solution
Below 4.2 | Writes concentrated on certain shards | Use a version (or setting: incrementChunkMajorVersionOnChunkSplit = true) that increments the major version
Below 4.2 | Writes balanced across shards | Use a version that only increments the minor version (or set incrementChunkMajorVersionOnChunkSplit = false)
4.2 | All scenarios | Use a version that only increments the minor version (or set incrementChunkMajorVersionOnChunkSplit = false)
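
To confirm which behavior a running deployment uses, the parameter can be inspected on versions that ship it (a sketch, run against the node where the parameter applies):

// Check the current value of the parameter:
db.adminCommand({ getParameter: 1, incrementChunkMajorVersionOnChunkSplit: 1 })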

Alibaba Cloud MongoDB has incorporated the official fix in its 4.2 release. Users who hit this problem can upgrade their instance to the latest 4.2 version and configure incrementChunkMajorVersionOnChunkSplit as needed.