Sharding, unique indexes, and upsert



Sharding, unique indexes, and upsert: on the surface these seem to have no direct connection. So how are they related?


For horizontal scaling to stay effective, sharding must keep shards independent of one another, so that each shard can make decisions on its own without interacting with other shards. If that cannot be guaranteed, then as the number of shards grows, more and more shards need to interact, the system gets slower and slower, and the original purpose of sharding is defeated. JOIN is a typical feature that breaks shard independence. In a cluster of n shards, to obtain the Cartesian product each shard must interact with the other n-1 shards to get its results. The latency does not necessarily grow linearly (the n-1 requests can run in parallel), but the operation clearly consumes a lot of resources, and the impact becomes more significant as the number of shards grows. Eventually it reaches the point where "adding a shard may not help performance at all", or even "adding a shard reduces performance".
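To make the growth concrete, here is a minimal counting sketch. It assumes a deliberately simplified model in which every one of the n shards sends one request to each of the other n-1 shards; the function name `cross_shard_requests` is hypothetical, and this is not a description of how MongoDB actually executes any query.

```python
def cross_shard_requests(n: int) -> int:
    """Requests needed when each of n shards must consult the other n - 1."""
    if n < 1:
        raise ValueError("a cluster needs at least one shard")
    return n * (n - 1)


if __name__ == "__main__":
    # Doubling the shard count roughly quadruples the traffic in this model.
    for n in (2, 4, 8, 16):
        print(f"{n} shards: {cross_shard_requests(n)} requests")
```

Even if the requests of a single shard run in parallel, the total work in this model grows quadratically, which is why each added shard eventually costs more than it contributes.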

Unique index

A unique index is another feature that seriously undermines shard independence. The preceding analysis of JOIN applies fully to unique indexes. Worse still, a unique index has a further cost: every write must hold a global lock across shards, otherwise uniqueness cannot be guaranteed, and the effect on performance is easy to imagine. That is why MongoDB does not plan to implement globally unique indexes.

There is one special case that removes this disadvantage: the unique index key is exactly the shard key. Once the shard key value is known, the shard that the document belongs to is determined. As long as the key is unique on that shard, there is no need to coordinate with other shards.
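The following toy model illustrates why this special case needs no cross-shard coordination. It is a sketch under stated assumptions: the class `ToyShardedCollection`, its hash-based routing, and the use of plain dictionaries as shards are all hypothetical simplifications, not MongoDB internals.

```python
import hashlib


class ToyShardedCollection:
    """Toy model: each shard is a dict keyed by the shard key value."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def _route(self, key) -> int:
        # The shard key alone decides the target shard (hashed routing assumed).
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def insert(self, key, doc: dict) -> int:
        shard_id = self._route(key)
        shard = self.shards[shard_id]
        # Uniqueness is checked on one shard only; no other shard is consulted.
        if key in shard:
            raise KeyError(f"duplicate key: {key!r}")
        shard[key] = doc
        return shard_id


if __name__ == "__main__":
    coll = ToyShardedCollection(num_shards=4)
    coll.insert("user-1", {"name": "a"})
    try:
        coll.insert("user-1", {"name": "b"})
    except KeyError as exc:
        print("rejected locally:", exc)
```

Because the same key always routes to the same shard, a duplicate can only ever appear on that shard, so a local check is sufficient.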


Semantically, we generally use upsert because we want a key to appear only once (otherwise a plain insert every time would do). This is exactly what a unique index enforces, and unique indexes have the problems described above, so the only meaningful case is when the upsert condition is exactly the shard key and the shard key is unique.
Is everything fine once these conditions are met? Not quite. There is a gap between deciding whether the key exists and executing the update or insert. That is, detection and execution are not a single atomic operation, and they cannot be, because that would require a coarse-grained lock. Moreover, MongoDB does not control document-level concurrency through locks but through "optimistic concurrency control".
So, for the sake of efficiency, making the operation atomic is not the right choice, and the problem is not especially hard to work around. It is enough to retry the operation when a duplicate key exception occurs, because in theory the retried operation should become an update rather than an insert, which naturally avoids the problem. Alternatively, version 4.2 implements automatic retry for such errors directly (SERVER-37124).
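The retry idea can be sketched without a database. Everything in the example below is a hypothetical stand-in: `DuplicateKeyError` mimics the kind of error a driver would raise, `RacyStore` is an in-memory store that deliberately simulates another writer inserting the same key between detection and execution, and `upsert_with_retry` is an illustrative helper, not a MongoDB or driver API.

```python
class DuplicateKeyError(Exception):
    """Stand-in for a driver's duplicate key error (hypothetical)."""


class RacyStore:
    """In-memory store that loses the detect/execute race exactly once."""

    def __init__(self) -> None:
        self.docs = {}
        self.lose_race_once = True

    def upsert(self, key, fields: dict) -> None:
        exists = key in self.docs  # step 1: detect whether the key exists
        if not exists and self.lose_race_once:
            # Simulate a concurrent writer inserting the key inside the gap.
            self.lose_race_once = False
            self.docs[key] = {"by": "other writer"}
        if exists:
            self.docs[key].update(fields)  # step 2a: update
        elif key in self.docs:
            raise DuplicateKeyError(key)  # step 2b: insert hits the unique index
        else:
            self.docs[key] = dict(fields)  # step 2b: insert succeeds


def upsert_with_retry(store: RacyStore, key, fields: dict, max_attempts: int = 3) -> int:
    """Retry the upsert on duplicate key; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            store.upsert(key, fields)
            return attempt
        except DuplicateKeyError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    store = RacyStore()
    attempts = upsert_with_retry(store, "user-1", {"count": 1})
    print("succeeded on attempt", attempts, "->", store.docs["user-1"])
```

In this simulation the first attempt fails because the key appeared inside the gap, and the second attempt finds the key and takes the update path, which is the behavior the retry relies on.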

References

  • Unique Indexes:…
  • Retry full upsert path when duplicate key exception matches exact query predicate:…

About the author

Zhang Yaoxing is chief technical consultant for MongoDB in the Asia Pacific region. He has many years of practical experience in MongoDB development, application, and consulting. As a MongoDB certified expert, he has provided training, performance optimization, architecture design, and related technical services to large customers across different industries.