MongoDB splitChunk caused routing table refresh, resulting in slow responses
After a MongoDB sharded cluster was upgraded from version 3.4 to version 4.0, insert performance dropped significantly and a large number of slow insert request logs appeared:
2020-08-19T16:40:46.563+0800 I COMMAND [conn1528] command sdb.sColl command: insert { insert: "sColl", xxx} ... locks: {Global: { acquireCount: { r: 6, w: 2 } }, Database: { acquireCount: { r: 2, w: 2 } }, Collection: { acquireCount: { r: 2, w: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 2709634 } } } protocol:op_msg 2756ms
From the log we can see that the insert request acquired the collection IS lock twice, and one of the acquisitions waited 2.7s, which matches the overall execution time of the insert. The performance degradation is clearly correlated with this lock wait.
2.7 seconds before that, the system was refreshing the collection's metadata (the 2.7s duration is related to the number of chunks in the collection):
2020-08-19T16:40:43.853+0800 I SH_REFR [ConfigServerCatalogCacheLoader-20] Refresh for collection sdb.sColl from version 25550573|83||5f59e113f7f9b49e704c227f to version 25550574|264||5f59e113f7f9b49e704c227f took 8676ms
2020-08-19T16:40:43.853+0800 I SHARDING [conn1527] Updating collection metadata for collection sdb.sColl from collection version: 25550573|83||5f59e113f7f9b49e704c227f, shard version: 25550573|72||5f59e113f7f9b49e704c227f to collection version: 25550574|264||5f59e113f7f9b49e704c227f, shard version: 25550574|248||5f59e113f7f9b49e704c227f
Chunk version information
First, let's understand the version information in the log above. Both the shard version and the collection version, e.g. "25550573|83||5f59e113f7f9b49e704c227f", are chunk versions. A chunk version is divided into three parts by "|" and "||":
- The first part is the major version: an integer that indicates whether routing targets have changed, so that each node can refresh its routing table in time. For example, it is incremented when a chunk migrates between shards.
- The second part is the minor version: an integer that mainly records changes that do not affect routing targets. For example, it is incremented when a chunk splits.
- The third part is the epoch: an ObjectId that identifies a unique incarnation of the collection and is used to detect whether the collection has changed. It is regenerated only when the collection is dropped or its shard key is refined.
The shard version is the highest chunk version on a given shard for a sharded collection.
The collection version is the highest chunk version across all shards for a sharded collection.
Scenario 1 ("request triggered") under "Route update trigger scenarios" below describes the typical scenario in which the shard version is used to trigger a route update.
Routing information storage
The routing information of a sharded collection is recorded in config.chunks on the config server, while mongos & shardserver load it from the config server into their CatalogCache on demand.
// config.chunks
{
    "_id" : "sdb.sColl-name_106.0",
    "lastmod" : Timestamp(4, 2),
    "lastmodEpoch" : ObjectId("5f3ce659e6957ccdd6a56364"),
    "ns" : "sdb.sColl",
    "min" : {
        "name" : 106
    },
    "max" : {
        "name" : 107
    },
    "shard" : "mongod8320",
    "history" : [
        {
            "validAfter" : Timestamp(1598001590, 84),
            "shard" : "mongod8320"
        }
    ]
}
The document in the example above describes a chunk:
- It belongs to the namespace sdb.sColl, whose epoch is "5f3ce659e6957ccdd6a56364"
- Its range is {name: 106} ~ {name: 107}, its chunk version is {major = 4, minor = 2}, and it lives on shard mongod8320
- Some history information is also recorded
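Since a chunk's version is stored in its lastmod field, the shard version and collection version defined earlier can be read directly from config.chunks. The queries below are a sketch in the mongo shell, using the example namespace and shard name from the document above:
// Collection version: the highest chunk version across all shards of the collection.
db.getSiblingDB("config").chunks.find({ ns: "sdb.sColl" }).sort({ lastmod: -1 }).limit(1)
// Shard version of shard "mongod8320": the highest chunk version on that shard.
db.getSiblingDB("config").chunks.find({ ns: "sdb.sColl", shard: "mongod8320" }).sort({ lastmod: -1 }).limit(1)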
Route update trigger scenarios
Route updates are "lazy": the routing table is not refreshed unless necessary. There are two main scenarios that trigger a route refresh:
Scenario 1: request triggered
After receiving a client request, mongos attaches "shardVersion" meta information to the request based on the routing information currently in its CatalogCache, and then dispatches the request to the target shard according to that routing information.
{
    insert: "sCollName",
    documents: [ { _id: ObjectId('5f685824c800cd1689ca3be8'), name: xxxx } ],
    shardVersion: [ Timestamp(5, 1), ObjectId('5f3ce659e6957ccdd6a56364') ],
    $db: "sdb"
}
After receiving the request from mongos, the shardserver extracts the "shardVersion" field and compares it with its local shard version, checking whether the epoch & majorVersion of the two are the same. If they are, the write can proceed. If the versions do not match, a StaleConfigInfo exception is raised. shardserver & mongos handle this exception with essentially the same logic: whichever side holds the lower-version routing information refreshes its route.
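The check itself lives in the server's C++ code; the following JavaScript function is only a sketch of the comparison described above (the field names are illustrative, not the server's internal ones):
// Sketch of the shard-side version check: only the epoch and the major version matter.
function checkShardVersion(received, local) {
    // received / local: { major: <int>, minor: <int>, epoch: <string> }
    if (received.epoch !== local.epoch || received.major !== local.major) {
        // Mismatch: a StaleConfigInfo error is returned, and the side holding the
        // lower version refreshes its routing table before retrying.
        throw new Error("StaleConfigInfo: shard version mismatch");
    }
    // epoch & major version match: the request can be executed.
}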
Scenario 2: special requests
- Some commands change the routing information as they execute, for example moveChunk (see the example after this list)
- A node may be affected by the behavior of other nodes and receive a forceRoutingTableRefresh command, forcing a refresh
- Some operations must obtain the latest routing information, for example cleanupOrphaned
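For instance, a manually issued chunk migration through mongos falls into the first category; the target shard name below is hypothetical:
// Migrate the chunk containing { name: 106 } to another shard (hypothetical name "mongod8321").
sh.moveChunk("sdb.sColl", { name: 106 }, "mongod8321")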
Route refresh behavior
The refresh itself consists of two steps.
Step 1: pull authoritative routing information from the config server and refresh the routing information in the CatalogCache. This is ultimately done by the ConfigServerCatalogCacheLoader thread, which constructs a request of the following form to fetch routing information from config.chunks:
{
    "ns": namespace,
    "lastmod": { $gte: sinceVersion }
}
If the collection's routing information already exists in the local cache and its epoch has not changed, only incremental routing information is needed: sinceVersion = the largest chunk version in the local routing information, i.e. the locally known shard version. Otherwise (the epoch has changed, or there is no local routing information for the collection), sinceVersion = (0, 0) and the routing information is fetched in full.
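For the collection in the log at the beginning of the article, an incremental refresh corresponds roughly to the following query against the config server (sinceVersion taken from the "from version" in the SH_REFR log line; the real internal request includes additional options not shown here):
// Incremental refresh: fetch only the chunks whose version is >= the locally known version.
db.getSiblingDB("config").chunks.find({
    ns: "sdb.sColl",
    lastmod: { $gte: Timestamp(25550573, 83) }
})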
After the ConfigServerCatalogCacheLoader obtains the routing information, it refreshes the routing information in the CatalogCache. At this point the system log prints the first message we saw above:
2020-08-19T16:40:43.853+0800 I SH_REFR [ConfigServerCatalogCacheLoader-20] Refresh for collection sdb.sColl from version 25550573|83||5f59e113f7f9b49e704c227f to version 25550574|264||5f59e113f7f9b49e704c227f took 8676ms
Step 2: update the MetadataManager (which maintains the collection's meta information and provides a consistent view of the routing information for certain scenarios). To guarantee consistency, updating the MetadataManager requires an X lock on the collection. During the update, the system log prints the second message we saw above:
2020-08-19T16:40:43.853+0800 I SHARDING [conn1527] Updating collection metadata for collection sdb.sColl from collection version: 25550573|83||5f59e113f7f9b49e704c227f, shard version: 25550573|72||5f59e113f7f9b49e704c227f to collection version: 25550574|264||5f59e113f7f9b49e704c227f, shard version: 25550574|248||5f59e113f7f9b49e704c227f
The root cause of the performance problem described at the beginning of the article is precisely this X lock taken while updating the meta information.
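When this happens, the operations blocked behind the metadata update can usually be spotted with currentOp; a rough sketch (the filter fields may need adjusting for your version):
// List operations on the collection that are currently waiting for a lock.
db.currentOp({ ns: "sdb.sColl", waitingForLock: true }).inprog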
Changes to chunk version management in 3.6+
So why is 3.4 fine while 4.0 degrades? The direct answer: in recent 3.6 & 4.0 releases, when a shard performs a splitChunk and shardVersion == collectionVersion, the major version is increased, which triggers a route refresh; in 3.4, only the minor version is increased. Let's first look at the basic split chunk process, and then explain why this change was made.
The splitChunk process
- "Auto splitting trigger": in 4.0 and earlier, auto splitting of a sharded cluster is triggered by mongos. For every write request, mongos records the amount of data written to the corresponding chunk and decides whether to send a splitChunk request to the shardserver. The criterion is dataWrittenBytes >= maxChunkSize / 5 (a fixed factor).
- "splitVector + splitChunk": mongos first issues a splitVector request to the shard to compute the split points of the chunk; this involves scanning and computation over the data based on the index. If splitVector indicates the chunk should be split, mongos then sends a splitChunk request to the shard to perform the actual split (a manual equivalent is shown in the example after this list).
- "_configsvrCommitChunkSplit": after receiving the splitChunk request, the shardserver first takes a distributed lock and then issues a _configsvrCommitChunkSplit to the config server. Upon receiving it, the config server updates the chunk metadata and completes the split; during this process the chunk version changes.
- "route refresh": after the above steps complete successfully, mongos refreshes its route.
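The same splitVector + splitChunk path can be exercised manually from a mongos with the shell split helpers; a sketch against the example collection (the split point is chosen arbitrarily):
// Split the chunk containing { name: 106 } at its median point
// (mongos computes the split point via splitVector, then asks the shard to splitChunk).
sh.splitFind("sdb.sColl", { name: 106 })
// Or split at an explicit point:
sh.splitAt("sdb.sColl", { name: 106.5 })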
Chunk version changes during a split
SERVER-41480 adjusted how chunk versions are managed when a chunk is split.
In 3.4, and in early releases of 3.6 and 4.0, _configsvrCommitChunkSplit only increased the chunk's minor version:
The original reasoning for this was to prevent unnecessary routing table refreshes on the routers, which don’t ordinarily need to know about chunk splits (since they don’t change targeting information).
However, bumping only the minor version can cause significant performance overhead when the user's writes are monotonically increasing.
Suppose there are two mongos, mongosA and mongosB, and two shards, shardA (chunk range: MinKey ~ 0) and shardB (chunk range: 0 ~ MaxKey), and the user's writes are monotonically increasing.
- At time T1, mongosB is the first to determine that the chunk meets the "auto splitting trigger" condition and sends "splitVector + splitChunk" to shardB. After the request completes normally, mongosB triggers a route refresh; shardB's chunk ranges are now 0 ~ 100 and 100 ~ MaxKey.
- Then, for a period of time afterwards (e.g. at T2), mongosA may also determine that the "auto splitting trigger" condition is met and issue "splitVector + splitChunk", but the final "_configsvrCommitChunkSplit" step cannot succeed, because mongosA's routing table is not up to date and the 0 ~ MaxKey chunk it asks to split no longer exists. Since the whole process never completes, mongosA never updates its routing table, and such invalid requests keep occurring during this period.
As described above, splitVector scans and computes over the data based on the index, and splitChunk acquires a distributed lock; these are time-consuming requests, so the performance impact of this scenario cannot be ignored.
SERVER-41480 therefore changed this behavior: if shardVersion == collectionVersion (i.e. the last chunk split of the collection also happened on this shard), the major version is increased so that every node refreshes its routing table. The fix landed in 3.6.15, 4.0.13, 4.2.2 and 4.3.1.
And this fix is exactly what caused the problem at the beginning of the article. More specifically, any split on a shard where shardVersion == collectionVersion triggers a global route refresh.
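Expressed as a condition, the behavior introduced by SERVER-41480 amounts to roughly the following check when the config server commits a split (a sketch, not the actual server code):
// Does committing this split bump the major version (SERVER-41480 behavior)?
// shardVersion / collectionVersion: { major: <int>, minor: <int> }
function splitBumpsMajorVersion(shardVersion, collectionVersion) {
    // True only when this shard already holds the highest chunk version in the
    // collection, i.e. the last chunk version change also happened on this shard.
    return shardVersion.major === collectionVersion.major &&
           shardVersion.minor === collectionVersion.minor;
}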
The official fix
SERVER-49233 discusses this problem in detail:
we chose a solution, which erred on the side of correctness, with the reasoning that on most systems auto-splits are happening rarely and are not happening at the same time across all shards.
However, this logic seems to be causing more harm than good in the case of almost uniform writes across all chunks. If it is the case that all shards are doing splits almost in unison, under this fix there will constantly be a bump in the collection version, which means constant stalls due to StaleShardVersion.
To illustrate the problem in detail: suppose a sharded cluster has four shards, each holding two chunks, and the current major version = n. The client writes evenly to all chunks. At some point, mongosA determines that all chunks meet the split condition and triggers chunk splits for each shard in succession. For the sake of illustration, suppose the splits are triggered on shardA, shardB, shardC and shardD in turn at T1, T2, T3 and T4. Then:
- At T1.1, chunk1 splits, after which shardVersion == collectionVersion; at T1.2, chunk2 splits, so the config server bumps the major version and the latest major version = n + 1; at T1.3, shardA notices this and refreshes its local routing table to major version = n + 1
- The same process then happens at T2, T3 and T4
- Finally, at T5, mongosA refreshes its routing table after triggering the splits and sees major version = n + 4
Now suppose another mongos in the cluster (call it mongosX), which has not refreshed and whose routing table still has major version = n, sends a request to a shard (say shardB):
- On the first request, mongosX learns that its major version is behind, talks to the config server, updates its local routing table, and issues a second request
- On the second request, shardB finds that its own major version is behind and pulls the routing table update from the config server
- On the third request, both sides hold the latest routing table and the request completes
This is the request interaction that takes place, driven by StaleShardVersion, whenever the routing table known to a mongos or a shard lags behind.
During a routing table update, every request that depends on the collection's routing table must wait for the update to complete before it can proceed. This is exactly what the JIRA ticket above describes as constant stalls due to StaleShardVersion.
SERVER-49233 also provides a concrete solution: 3.6.19, 4.0.20, 4.2.9 and subsequent versions add the incrementChunkMajorVersionOnChunkSplit parameter. Its default value is false (i.e. splitChunk does not bump the major version), and it can be set to true in the configuration file or via setParameter at startup.
Since version 4.2, auto splitting is triggered on the shardserver (SERVER-34448), so the scenario where mongos frequently sends invalid splitChunk requests no longer exists. For version 4.4, SERVER-49433 therefore simply rolls back the major-version logic, and only the minor version is increased. (Version 4.2 keeps the incrementChunkMajorVersionOnChunkSplit parameter so that users can choose, since intermediate 4.2 releases had already shipped the major-version logic.)
The behavior of each version is summarized as follows:
- Only the minor version is increased:
  - all 3.4 versions
  - 3.6 versions before 3.6.15
  - 4.0 versions before 4.0.13
  - 4.2 versions before 4.2.2
  - 4.4 (not yet released)
- The major version is increased when shardVersion == collectionVersion, otherwise the minor version is increased:
  - 3.6.15 ~ 3.6.18 (inclusive)
  - 4.0.13 ~ 4.0.19 (inclusive)
  - 4.2.2 ~ 4.2.8 (inclusive)
- The incrementChunkMajorVersionOnChunkSplit parameter is provided, and by default only the minor version is increased:
  - 3.6.19 and later
  - 4.0.20 and later
  - 4.2.9 and later
Usage scenarios and solutions
MongoDB version | Usage scenario | Solution |
---|---|---|
Below 4.2 | Data writes concentrated on a few shards | Use a version that bumps the major version (or set incrementChunkMajorVersionOnChunkSplit = true) |
Below 4.2 | Data writes balanced across shards | Use a version that only bumps the minor version (or set incrementChunkMajorVersionOnChunkSplit = false) |
4.2 | All scenarios | Use a version that only bumps the minor version (or set incrementChunkMajorVersionOnChunkSplit = false) |
Alibaba Cloud MongoDB has followed up the official fix in version 4.2. Users who hit this problem can upgrade their instance to the latest 4.2 version and configure incrementChunkMajorVersionOnChunkSplit as needed.