MongoDB technology from 0 to 1+


This is Chen Shi's award-winning article from the MongoDB Chinese community essay competition. Let's enjoy it together.


By chance, I discovered that the Mongo Chinese community was running an essay competition. Although I am still on the road to becoming an expert, it seemed worth taking part, and so this article came about.

The activity specified a topic framework. After some thought, I decided topic 1, "From 0 to 1+", suited me: how to learn MongoDB's core technology, from shallow to deep. Why 1+ instead of 1? Because 0 represents the starting point and origin, 1 represents a milestone on the journey, and 1+ represents continuous progress in that direction. After all, there is no end to learning~

So how should we go about it? I work in the database field, but I had not touched MongoDB until I was lucky enough to join a project dealing with its kernel; after studying it for a while, I gained a little insight. This article will therefore talk about how to start learning MongoDB. Since we start from 0, we must cover Mongo's basic concepts and principles, and since it is a distributed database, we should also cover the common principles of distributed technology. That is enough for one trip. Of course, a single article cannot give the whole picture; instead, let us try to understand the fundamental abstractions and principles from a high level. That way, no matter which database we study later, I believe we will benefit greatly.

Getting started

Without doubt, the most useful material is the official documentation, which is rich enough to keep you learning for quite some time. But just staring at the docs for a few weeks is not very effective; it is easy to forget. Documentation like this is best absorbed by reading and doing at the same time.

Official document link:

There is a website, the DB-Engines Ranking, that ranks databases of all kinds; MongoDB is always near the top, which shows its appeal. In short, Mongo is a document-oriented NoSQL database system. Note that "document" here does not mean a Word or Excel document but a JSON document, something like {k1: v1, k2: v2}. As we all know, databases can be classified at different levels to make them easier to understand and compare. Besides the document type, there are also key-value, column-family, graph, and others. Are there databases in the same category as Mongo? Of course; I have previously used Couchbase and CouchDB (a document DB written in Erlang). If you want to make performance comparisons, I think they are the more suitable peers, although people love to compare MySQL and Mongo.

Here I recommend the book "MongoDB in Action", which helps clarify the concepts systematically in a way the official documentation cannot. Although the book targets the older 3.0 release, I think usage does not differ much from the current 4.2; I will revise this view once I am more familiar with the usage and basic principles.

I will not analyze the various commands here; you can get that from the documentation. Instead, let me share some experience.


1. Schema free

Schema free means there is no schema, or that the schema is loose, relative to the relational schema. For database novices, the concept of a schema may be vague. (I do not even know a good Chinese term for it; many of these words feel more natural in English.) Here is a definition I excerpted [1]:

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data.

It is a skeleton, the skeleton of a database: it defines the logical view, that is, what the database looks like from the outside. It covers how the data is organized, how it is related, and what constraints apply. It is therefore a kind of descriptive detail that should be produced at the DB design stage, helping developers build a mental model.

Why focus on schema here? Because Mongo's schema really does look different, which can feel awkward to people used to the relational world. You could even call it schema-less: there appears to be no schema at all, and you are free to add fields or attributes whenever you want. This capability obviously suits a business that is still evolving dynamically: at the beginning you do not know which fields you will need, and you simply add them later as required. At that level, Mongo fits very well. A relational schema can offer similar flexibility, but at a noticeably higher cost than Mongo.
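A minimal illustration in Python, treating documents as plain dicts with no real MongoDB connection: two documents in the same collection need not share the same fields.

```python
# Schema-free: documents in one collection may carry different fields.
# (Plain dicts stand in for BSON documents here; no server involved.)
users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob", "age": 30, "tags": ["admin"]},  # fields added later
]

def field_names(docs):
    """Collect the union of field names across all documents."""
    names = set()
    for doc in docs:
        names.update(doc)
    return names

print(sorted(field_names(users)))
```

In a relational schema, adding `age` and `tags` would require an ALTER TABLE; here the second insert simply carries them.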

2. Applicable scenarios

On this point, database operators often have to field questions. A user comes and asks whether his business can run on MongoDB. How do you answer him? The question is simple, but your answer cannot be, at least not before asking a few more questions, because he has not explained the situation clearly. What should he tell you?

What kind of business is it? How much data?

What is the read/write ratio? How large are the read and write QPS?

What are the temporal characteristics of the reads and writes, e.g. a low peak at night?

What is the access pattern?

And so on.

Generally speaking, the business should provide this information before we can judge whether it is a good fit. After all, there are so many DBs on the market; if every DB were suitable for every occasion, why would there be so many? One size does not fit all.

But sometimes, for example in a public cloud environment, the business context may be confidential and the user does not want to tell you, or he simply cannot estimate it. Can we accept the workload anyway? It is hard to say; it is better to run the workload in a test environment first. A few days ago I saw the Key Visualizer shared by TiDB, and it gave me some ideas. If any readers are interested in this kind of observability tool, let's discuss it together.

According to "MongoDB in Action", the following scenarios are a good fit for Mongo:

  • web app

This is too broad a category. Still, MongoDB really is widely used in web applications, which demand high scalability, flexible and rich queries, dynamically added fields, and so on.

  • agile development

The emphasis here is that having no fixed schema makes Mongo well suited to agile development methodology.

  • Analytical and logging

A capped collection is well suited to storing log data. I have not seen many analytical uses; can it really beat a dedicated OLAP DB?

  • caching
  • Variable schema

3. Script building

I suggest writing a script yourself that builds a cluster with one command, rather than typing the commands one at a time. See my script for reference [7].
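As a sketch of what such a script automates (the ports, dbpath, and flags below are illustrative defaults, not my actual script [7]):

```python
def replset_commands(name="rs0", base_port=27017, n=3, datadir="/tmp/rs"):
    """Generate the mongod launch commands for a local n-node replica set."""
    cmds = []
    for i in range(n):
        cmds.append(
            f"mongod --replSet {name} --port {base_port + i} "
            f"--dbpath {datadir}/node{i} --fork --logpath {datadir}/node{i}.log"
        )
    return cmds

for cmd in replset_commands():
    print(cmd)
# After the processes start, connect to one member and run rs.initiate() once.
```

Wrapping this in a script means a fresh, customized cluster is one key away, which matters when you rebuild clusters many times a day while studying the kernel.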

Later I will talk about debugging the kernel with GDB, building on this ability to quickly create customized clusters.

Distributed concept and principle

This area is huge!

MongoDB is a distributed database. Compared with a standalone database, there is network distance between nodes, so all kinds of unreliable things can happen (google "the 8 fallacies of distributed computing"). I will touch on how Mongo handles some of them ^^ For more interesting material, be sure to read DDIA [6], deservedly the most popular book on the topic so far.

For each mechanism, I will briefly describe, in my own words, the background, why it is needed, and how Mongo implements it. I suggest readers google the vocabulary for more detail.


Consensus protocol


Simply put, consensus means getting many parties to reach agreement. Anyone who has touched this topic knows Raft, created by Stanford professor John Ousterhout and his doctoral student Diego Ongaro. It has been applied in a variety of distributed databases, such as TiDB and PolarDB.

Of course, there are other protocols in the industry, such as Lamport's Paxos (applied in Chubby), ZooKeeper's Zab, and MongoDB's PV1.

Why it is needed

In short, when multiple nodes make a decision together, if you say yours and I say mine, how is anything decided? It is like a group of people in a meeting room all talking at once with nothing agreed; in the end the meeting is held in vain. Likewise, in a distributed system we need a set of rules that lets every node agree on events and their outcomes, so the system can work normally. This actually matches the real-world model quite well.

How Mongo does it

Mongo uses MongoDB PV1, a Raft-like protocol with rich extensions: in rs.conf() you can configure each node's priority, hidden, votes, and other attributes, which is very flexible; actions such as pre-vote and dry-run have also been added. Readers can refer to the relevant documentation for details.
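The core majority rule behind such elections can be sketched in a few lines of Python (a simplification of PV1/Raft that ignores terms, priorities, and pre-vote):

```python
def election_succeeds(votes_received, voting_members):
    """A candidate becomes primary only with votes from a strict majority."""
    return votes_received > voting_members // 2

# In a 3-member replica set, 2 votes win; in a 5-member set, 3 are needed.
assert election_succeeds(2, 3)
assert not election_succeeds(1, 3)   # e.g. a partition split the votes
assert election_succeeds(3, 5)
# Two disjoint majorities are impossible, so two primaries cannot
# coexist in the same term.
```

The "strict majority" requirement is exactly why replica sets are usually deployed with an odd number of voting members.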

Isolation level / consistency / CAP


These concepts are related, so I put them together. Note that ACID is a term from standalone relational databases and is rarely used as-is in distributed systems; in particular, the C in ACID is not the same thing as consistency in distributed systems!

CAP was put forward by Brewer in 2000. Many papers discourage using the term because it is quite ambiguous.

Papers use many different notions of consistency, for example:

- Causal consistency: supported in Mongo.

- Linearizability: for a single object, a read always returns the latest data.

- Serializability: concerns multiple transactions operating on multiple objects; the strongest isolation level in relational DBs.

- Strict serializability: linearizability + serializability, as mentioned in Google's Spanner.

- Sequential consistency: weaker than linearizability; in the C++ memory model we often see `std::memory_order_seq_cst`, and x86's hardware model (TSO) comes close to it.

On the data-safety side, durability must be guaranteed. The common technique is to take checkpoints periodically and keep a write-ahead log, both of which are supported natively in the WiredTiger engine layer.

Why it is needed

Once there are replicas plus reads and writes, there is the question of whether a read sees the latest data; this is the consistency problem. Some businesses require reads to return the latest written data, which is called strong consistency. Others do not, and the database can relax that constraint, giving eventual consistency: after some period of time, the data in every replica becomes the same. This is far less complex to implement than strong consistency.

How Mongo does it

Speaking of consistency, I have to mention a long-standing misunderstanding of mine: it turns out that the quorum ("majority") in Mongo is not the kind of quorum we usually talk about!

I had previously studied Cassandra and its C++ counterpart ScyllaDB in some depth; their prototype is Amazon's Dynamo. The Dynamo paper describes the quorum model: with N nodes, if you write to a majority (W > N/2) and read from a majority (R > N/2), a read is guaranteed to see the latest write. Mongo also has the notion of "majority", but its meaning is totally different.
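The Dynamo-style guarantee is easy to check exhaustively in Python: when W + R > N, every write quorum intersects every read quorum, so at least one replica in the read set holds the latest write.

```python
from itertools import combinations

def quorums_always_intersect(n, w, r):
    """Brute-force check: do all size-w and size-r node subsets overlap?"""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

assert quorums_always_intersect(3, 2, 2)      # W + R = 4 > N = 3: safe
assert not quorums_always_intersect(3, 1, 2)  # W + R = 3 = N: a read may miss the write
```

This overlap property is what Dynamo-style systems rely on; as described below, Mongo's "majority" works differently.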

When writing to Mongo, the client can only write to the primary, never to a secondary; this differs from leaderless systems (no master, all nodes equivalent). Secondaries pull data from the primary. The replica set maintains a majority-committed time point; once a write has been replicated to a majority of members, this point moves forward.

When the client specifies readConcern: "majority", whether the read succeeds depends on whether the point it reads at is at or behind the majority-committed time point; if so, the majority read succeeds.
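A toy model of the majority-committed point (real Mongo tracks per-member oplog optimes; the list of integers below is a stand-in):

```python
def majority_commit_point(applied_ts):
    """Highest timestamp already applied by a majority of members.

    applied_ts: the latest oplog timestamp each replica-set member has applied.
    """
    ts = sorted(applied_ts, reverse=True)
    majority = len(ts) // 2 + 1
    return ts[majority - 1]

# 5 members: three of them have applied ts >= 7, so writes up to ts 7
# are majority-committed and visible to readConcern: "majority".
assert majority_commit_point([10, 9, 7, 4, 2]) == 7
```

Note the contrast with the Dynamo model: the client reads from one node against this commit point, rather than contacting R nodes per read.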

Mongo transactions support snapshot isolation: a transaction reads from the latest stable point. The data it sees may be slightly old, but it is mutually consistent, avoiding read-write conflicts.

Replication and fault tolerance


In distributed systems, replication is the essential, conventional means of improving availability. In a complex distributed environment, some component will crash, hang, or stop responding. To keep serving user requests, they must be redirected to healthy nodes, and that requires multiple copies of the data; otherwise, how would previously written data remain accessible?

Fault tolerance is a classic concept. A distributed system suffers many kinds of faults: software, hardware, and human. In a typical single-master system, if the master becomes unavailable, user reads and writes are affected, so when the old master is gone, a new master must take its place. Done well, the user does not even notice the switch.

Why it is needed

As mentioned above: to ensure system availability and data safety.

How Mongo does it

Mongo is a single-master system: only the primary accepts writes. It therefore has an election mechanism, relying on the Raft-like protocol mentioned above, which guarantees fault tolerance.

For replication, secondaries pull the oplog from the primary. The oplog can be understood as the log in Raft: it records the primary's mutations, and by applying it locally a secondary reaches the same state as the primary.
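A drastically simplified model of oplog replay (the real oplog carries namespaces, optimes, and idempotent update operators; here each entry is just a dict):

```python
def apply_oplog(state, oplog):
    """Replay the primary's operation log against a secondary's local state."""
    for entry in oplog:
        if entry["op"] in ("i", "u"):        # insert / full-document update
            state[entry["_id"]] = entry["doc"]
        elif entry["op"] == "d":             # delete
            state.pop(entry["_id"], None)
    return state

oplog = [
    {"op": "i", "_id": 1, "doc": {"x": 1}},
    {"op": "u", "_id": 1, "doc": {"x": 2}},
    {"op": "i", "_id": 2, "doc": {"y": 9}},
]
# Any secondary that replays the same log reaches the same state as the primary.
assert apply_oplog({}, oplog) == {1: {"x": 2}, 2: {"y": 9}}
```

The key property is determinism: identical logs applied in order produce identical states, which is exactly what state-machine replication in Raft-style systems depends on.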

For a very detailed description, see the official source code [12].


Kernel

My own contact with the kernel has not been long; consider these notes a modest starting point to draw out better ones.

The kernel divides into the server layer and the storage engine layer. Since my exposure to the server layer is incomplete, I only discuss the engine layer.

Storage engine

Here is a document generated by Doxygen [11]; it is worth reading.

Engine-layer technology is the core of a database system; it is where the database's core principles are realized. First, understand that there are many ways to organize data, and before the code is actually implemented, it is hard to say which way is better.

Obviously, we want pluggability: the database layer (the part doing SQL/CQL, query optimization, execution plans, and so on) should flexibly attach to a variety of storage engines. That way we learn by comparison which engine is better or worse. The engine layer must therefore be highly independent, exposing primitive interfaces for the upper layer to call; this is a perfect embodiment of computing's layering idea in the database field.

Since 3.x, MongoDB's default engine has been WiredTiger. The official project has apparently never considered merging RocksDB-compatible code, so MongoRocks exists as a third party; there is also an in-memory engine.


WiredTiger

WiredTiger is abbreviated wt below [8]. It was originally founded by Michael Cahill, was acquired by MongoDB in 2014, and has been Mongo's default storage engine ever since. There is a great deal of material about wt in [2].

First of all, wt is a KV storage engine, the same category as RocksDB, but with a much smaller reputation; it seems to be used only by Mongo, and the code is not easy to read.

The engine's index structure is a B-tree, not a B+tree. There is much discussion of this online. As far as I know, the reasons for choosing a B-tree are:

1. Mongo focuses on point-query performance rather than range queries. Unlike a B+tree, a B-tree does not have to descend to a leaf node for every read, so the average path is shorter;

2. It optimizes for read-heavy, write-light workloads;

3. Others.

Use of the WT API

When Mongo uses wt, the basic call sequence is as follows:

1. Create the connection conn

wiredtiger_open(home, NULL, "create,cache_size=**,transaction_sync=**,checkpoint_sync=**,...", &conn)

This is called at startup to produce a wt conn handle pointing to the DB; the conn is held as a private member of WiredTigerKVEngine.

2. Create a session

Every operation in Mongo runs within a session context; a session at the document level actually corresponds to a WT_SESSION at the engine layer. In the code, to use sessions efficiently, there is a session cache, so a new session does not have to be opened every time:

conn->open_session(conn, NULL, "isolation=**", &session)

3. Create a table / index

When the Mongo layer executes createCollection / createIndex, there is:
session->create(session, "table:access", "key_format=S,value_format=S")

4. Open a cursor on the session

session->open_cursor(session, "table:mytable", NULL, NULL, &cursor)

5. If transactions are used, begin the transaction on the session

session->begin_transaction(session, "isolation=**,read_timestamp=**,sync=**,...")

6. Use the cursor to set/get key and value

The JSON the user sees and the BSON the Mongo server layer sees are both turned into a (key, value) pair at the bottom, roughly:

cursor->set_key(cursor, key)
cursor->set_value(cursor, value)
cursor->insert(cursor)
7. Commit / rollback the transaction

session->commit_transaction(session, "commit_timestamp=**,durable_timestamp=**,sync=**,...")


A few clarifications about the steps above:

·In particular, wt API calls have a distinctive style of configuring parameters through a char* string of `a=b` pairs; primitive, but it works;

·The timestamp-related parameters are complex and require a deep reading of the documentation;

·For the meaning of each parameter, refer to [2].
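The wt-style configuration strings of comma-separated `key=value` pairs are easy to picture with a toy parser in Python (illustrative only; wt's real C parser also handles nesting, quoting, and commas inside parentheses):

```python
def parse_config(cfg):
    """Parse a flat wt-style config string like 'create,cache_size=2GB'."""
    out = {}
    for item in cfg.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        out[key.strip()] = value.strip() or True  # bare flags become True
    return out

print(parse_config("create,cache_size=2GB,transaction_sync=(enabled=false)"))
```

The upside of this style is a single stable C signature (`const char *config`) for every call; the downside is that typos in keys are only caught at runtime.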

Timestamp mechanism

From the official documents and talks [14], the introduction of logical sessions in 3.6 and the addition of a timestamp field to wt's update structure gradually paved the way for supporting transactions and distributed transactions.

While working on transaction support for MongoRocks, I was exposed to some of wt's timestamp concepts, but at present I cannot systematically explain how the timestamps cooperate. For that, refer to [2]; I will not cover it here.


Mongorocks

From the name you can guess it is related to RocksDB, and the idea is natural: since the bottom layer speaks a KV interface and RocksDB is a KV engine, it can be plugged in, just like MyRocks. Looking at the source [3], it has 300+ stars. Originally the developer Igor Canadi and others implemented the 3.2 and 3.4 MongoRocks versions; then the project stalled for some time. A few months ago, Igor Canadi accepted wolfkdy's MR for MongoRocks 4.0 [16], and I participated in related PR submissions, such as [4].

The 4.0 MongoRocks driver-layer work focuses mainly on the transaction part. As Igor said, Mongo's internal transaction machinery changed a great deal after 3.6.x, and implementing 4.0 correctly takes considerable effort [5].

MongoRocks 4.0 has only just come out, so it needs more time to stabilize. For example, I found a pitfall in oplog reading [13], which the author has since fixed [15]. I still look forward to RocksDB backing Mongo; I believe it will shine in ways brighter than wt! I should invest more time here myself, and I hope more domestic developers join in!

Kernel GDB debugging

Stepping through a large codebase with GDB line by line gets you nowhere; single-stepping is only suitable for chasing bugs. So what do I use GDB for? Getting the runtime path!

I have always thought that when picking up a large C++ project, besides staring at the code for half a day to understand the flow, GDB's bt is a great weapon: add a breakpoint in the server, then send a command from the client!

Key point: please use GDB version >= 8.x. Its bt output is colorized, which is much more comfortable than before.

Let me outline how to use it.

First, start a replica set or sharded cluster (whichever you care about), and apply the following settings on the primary:

cfg = rs.conf(); cfg.settings.heartbeatTimeoutSecs = 3600; cfg.settings.electionTimeoutMillis = 3600000; rs.reconfig(cfg)

Say we are going to debug the primary. To prevent a stepdown while we sit at a breakpoint, increase the heartbeat and election timeouts so the primary stays primary. (Of course, do not do this if you want to debug the election or failover code itself.)

Suppose we want to trace the request path of the insert command.

Just search the code for the insert keyword; it is not hard to find names like CmdInsert. A closer look shows it inherits a base class and has a run method. An experienced developer can already guess: when the server receives an insert request, run is likely to be called!

So we can set a breakpoint at run; or, grepping turns up the word insertRecords, which is even more likely to be where documents get inserted.


From there you can keep going. The path from start_thread through run down to insertRecords is very long; this stretch alone gives us plenty to analyze.

Similarly, find, update, and delete can be traced the same way.

For transaction operations, grep for the word "transaction" and you will find breakpoint-worthy functions: begin_transaction, commit_transaction, and rollback_transaction are familiar names, well suited to breakpoints.


Summary

The amount of MongoDB knowledge in these areas is enormous and truly cannot be covered by one article. To me, the subject itself is fascinating: it is a database, it is a distributed system, and it is full of interesting problems. Although the official license has been tightened and some cloud vendors cannot offer the newer versions, I think that as long as it is open source and the code is real, engineers can take comfort. From the shallow to the deep, starting now!

Author: Chen Shi

Keen on and dedicated to database, distributed-systems, and storage technology; also interested in the Linux kernel and microprocessor architecture. Currently developing the Mongo cloud database at Tencent. In his spare time he enjoys climbing mountains, reading papers, and studying the humanities.

=== Not an advertisement

The Tencent Cloud CMongo team is committed to building a polished MongoDB cloud service. You are welcome to join us, or just get in touch, haha.

Email: [email protected]






[5] -partners/mongo-rocks/issues/145

[6] Martin Kleppmann, Designing Data-Intensive Applications (DDIA)




[10] 4.0 transaction analysis:…

[11] Storage engine API: https://mongodbsource.github….

[12] Detailed explanation of source code for replication:…





Thanks to China's leading database (MongoDB) and CDN service provider, Shanghai Jinmu Information Technology Co., Ltd., for their strong support of this essay competition!



The Mongoing Chinese community, established in 2014, is the officially recognized Chinese community for Greater China. Thanks to the continuous efforts of community volunteers, it has more than 20,000 members online and offline. The community consists of blogs, offline activities, technical Q&A, forums, official documentation translation, and more. By 2020 it had successfully held dozens of offline events with more than 100 attendees each, published more than 100 high-quality articles on MongoDB applications, and gathered more than 20 cooperating organizations.

The Chinese community's vision: create an active mutual-aid platform for Chinese MongoDB enthusiasts; promote MongoDB as the preferred solution for enterprise database applications; and gather MongoDB development, database, and operations experts to build the most authoritative technical community.

Mongoing Chinese community official account: mongoing – mongoing

Mongoing Chinese community


Shanghai Jinmu Information Technology Co., Ltd. is a leading MongoDB database service provider and an official MongoDB partner.

Jinmu Information has remained committed to the data-technology field, making solid, steady progress, and has become an emerging technical force in China's MongoDB space. Its customers span finance, telecommunications, retail, aviation, and other industries, and it helps users complete a smooth transition from traditional IT architecture to Internet architecture.

Since 2018, Jinmu Information has maintained a good cooperative relationship with the MongoDB Chinese community, committed to jointly building a prosperous MongoDB ecosystem.

Shanghai Jinmu Information Technology Co., Ltd