MongoDB guide - 3. MongoDB basics: data types

Time:2019-11-21

Previous article: mongodb guide — 2. Mongodb Basics – documents, collections, databases, clients
Next article: mongodb guide — 4. Mongodb basics — using mongodb shell

The beginning of this chapter introduced the basic concept of a document. By now you can start and run MongoDB and perform some operations in the shell. This section goes deeper: MongoDB supports many data types as values in documents, and each is described in turn.

2.6.1 Basic data types

Conceptually, MongoDB documents resemble JavaScript objects, so they can be thought of as JSON-like. JSON (http://www.json.org) is a simple data representation: its specification fits in about a paragraph of text (as its website shows) and it defines only six data types. This has many advantages: JSON is easy to understand, easy to parse, and easy to remember. On the other hand, because the only types are null, Boolean, number, string, array, and object, JSON’s expressive power is limited.
Although these types are quite expressive, most applications (especially those that work with a database) need a few other important types. For example, JSON has no date type, which makes otherwise easy date handling annoying. JSON also has a single number type, so it cannot distinguish floating-point numbers from integers, let alone 32-bit from 64-bit numbers. Nor can JSON represent other commonly used types, such as regular expressions or functions.
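
The limits described above can be seen in any JavaScript runtime. The following is a minimal sketch in plain JavaScript (Node.js is assumed; it is not MongoDB code, and the key names are made up for illustration) showing what happens to a date, a regular expression, and a function when an object is serialized to JSON and parsed back:

```javascript
// A document holding values that plain JSON has no types for.
const doc = {
  created: new Date(0),      // a date (the Unix epoch)
  pattern: /foobar/i,        // a regular expression
  callback: function () {},  // a function
  count: 3,                  // an integer...
  ratio: 3.0                 // ...and a floating-point number
};

const json = JSON.stringify(doc);
console.log(json);

// Parsing the text back does not restore the original types.
const roundTripped = JSON.parse(json);
console.log(typeof roundTripped.created);        // the date came back as a string
console.log(Object.keys(roundTripped.pattern));  // the regex became an empty object
console.log("callback" in roundTripped);         // false: the function was dropped
```

The two numbers also serialize identically, so the integer and the floating-point value can no longer be told apart.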
MongoDB adds support for several other data types while keeping JSON’s basic key/value pair nature. Exactly how values of each type are represented varies by programming language. The following lists the common types MongoDB supports and shows how each appears in a document.

null

Null can represent either a null value or a nonexistent field:

{"x" : null}

Boolean type

The Boolean type has two values, true and false:

{"x" : true}

Number

The shell uses 64-bit floating-point numbers by default, so the following values look “normal” in the shell:

{"x" : 3.14}

Or:

{"x" : 3}

For integer values, you can use the NumberInt class (a 4-byte signed integer) or the NumberLong class (an 8-byte signed integer), for example:

{"x" : NumberInt("3")}
{"x" : NumberLong("3")}
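
To see why separate integer classes are useful, here is a small sketch in plain JavaScript (Node.js is assumed; NumberInt and NumberLong exist only in the MongoDB shell, so they are not used here). It shows that a 64-bit floating-point number cannot hold every 8-byte integer exactly, which is one likely reason the examples above pass the value as a string:

```javascript
// JavaScript numbers are 64-bit floating-point values, so 3 and 3.0 are the same value.
console.log(3 === 3.0);              // true
console.log(Number.isInteger(3.0));  // true: there is no separate integer type

// Integers are exact only up to 2^53 - 1.
const limit = Number.MAX_SAFE_INTEGER;
console.log(limit + 1 === limit + 2);  // true: precision has been lost

// A signed 8-byte integer goes up to 2^63 - 1, far beyond that limit.
// BigInt is used here only to hold the exact value for comparison.
const maxInt64 = 2n ** 63n - 1n;
const asDouble = Number(maxInt64);
console.log(BigInt(asDouble) === maxInt64);  // false: the double rounded the value
```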

String

Any string of UTF-8 characters can be represented with the string type:

{"x" : "foobar"}

Date

Dates are stored as milliseconds since the Unix epoch. The time zone is not stored:

{"x" : new Date()} 

Regular expression

Queries can use regular expressions as match conditions, with the same syntax as JavaScript regular expressions:

{"x" : /foobar/i}

Array

Sets or lists of values can be represented as arrays:

{"x" : ["a", "b", "c"]} 

Embedded document

Documents can contain entire documents embedded as values in a parent document:

{"x" : {"foo" : "bar"}}

Object ID

An object ID is a 12-byte ID that uniquely identifies a document. See section 2.6.5 for details.

{"x" : ObjectId()}

There are also a few less common types that you may need, including the following.

Binary data

Binary data is a string of arbitrary bytes. It cannot be manipulated from the shell. Binary data is the only way to save non-UTF-8 strings to the database.

Code

Queries and documents can also contain arbitrary JavaScript code:

{"x" : function() { /* ... */ }} 

In addition, a few types are mostly used internally (or have been superseded by other types). This book points them out specifically where they come up.
For more information on MongoDB data formats, see Appendix B.

2.6.2 Dates

In JavaScript, the Date class is used for MongoDB’s date type. When creating a date object, always call new Date(…), not Date(…). Calling the constructor as a function (that is, without new) returns a string representation of the date rather than a Date object. This is not MongoDB’s doing; it is how JavaScript works. If you are not careful to always use the Date constructor, you can end up with a mix of Date objects and date strings. Strings and dates do not match each other, so this causes problems for nearly every operation, including removing, updating, and querying.
For a complete explanation of the JavaScript Date class and the argument formats its constructor accepts, see section 15.9 of the ECMAScript specification (http://www.ecmascript.org).
The shell displays dates using the local time zone setting. However, a date in the database is stored only as milliseconds since the epoch, with no associated time zone. (Time zone information can, of course, be stored as the value of another key.)
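
The difference between the two ways of calling Date is easy to check. A minimal sketch in plain JavaScript (it runs in Node.js as well as in the shell; the variable names are made up for illustration):

```javascript
// Calling Date as a function returns a string, not a Date object.
const asString = Date();
const asObject = new Date();

console.log(typeof asString);           // "string"
console.log(asObject instanceof Date);  // true

// A Date wraps a count of milliseconds since the Unix epoch (UTC).
// No time zone is kept; the zone only matters when the date is displayed.
const epochStart = new Date(0);
console.log(epochStart.getTime());      // 0
console.log(epochStart.toISOString());  // 1970-01-01T00:00:00.000Z
```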

2.6.3 Arrays

An array is a list of values that can be used for ordered operations (as with a list, stack, or queue) as well as unordered ones (as with a set).
In the following document, the value of the “things” key is an array:

{"things" : ["pie", 3.14]}

This example shows that an array can contain elements of different data types (here, a string and a floating-point number). In fact, any value allowed in a normal key/value pair can be an array element, including another array.
One useful property of arrays in documents is that MongoDB “understands” their structure and knows how to “drill down” into an array to operate on its contents. This lets you query and build indexes on the contents of an array. In the previous example, for instance, MongoDB can query for all documents whose “things” array contains the element 3.14. If this is a frequent query, you can index the “things” key to improve its performance.
MongoDB can also modify the contents of an array with atomic updates, such as reaching into the array and changing the value pie to pi. More examples of these operations appear later in this book.
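
The following plain JavaScript sketch imitates these two behaviors on in-memory objects. It is not the MongoDB API: the helpers fieldMatches and replaceFirst are hypothetical names, the second document is made up, and the real operations run inside the database (where the update is atomic).

```javascript
const docs = [
  { _id: 1, things: ["pie", 3.14] },
  { _id: 2, things: ["cake", 2.72] }
];

// Hypothetical helper: a field matches when it equals the value or,
// if the field holds an array, when any element equals the value.
function fieldMatches(doc, field, value) {
  const fieldValue = doc[field];
  return Array.isArray(fieldValue)
    ? fieldValue.includes(value)
    : fieldValue === value;
}

// Hypothetical helper: replace the first matching element of an array field.
function replaceFirst(doc, field, oldValue, newValue) {
  const index = doc[field].indexOf(oldValue);
  if (index !== -1) {
    doc[field][index] = newValue;
  }
  return doc;
}

// "Drill down": find the documents whose "things" array contains 3.14.
const matchingIds = docs
  .filter(doc => fieldMatches(doc, "things", 3.14))
  .map(doc => doc._id);
console.log(matchingIds);  // only the document with _id 1 matches

// Reach into the array and change "pie" to "pi".
replaceFirst(docs[0], "things", "pie", "pi");
console.log(docs[0].things);
```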

2.6.4 Embedded documents

A document can be used as the value of a key; such a document is called an embedded document. Embedded documents let data be organized more naturally than a flat structure of key/value pairs allows.
For example, to represent a person in a document and also store that person’s address, you can nest the address information in an embedded “address” document:

{
    "name" : "John Doe",
    "address" : {
        "street" : "123 Park Street",
        "city" : "Anytown",
        "state" : "NY"
    }
}  

In the example above, the value of the “address” key is an embedded document with its own “street”, “city”, and “state” keys and their values.
As with arrays, MongoDB “understands” the structure of embedded documents and can “drill down” into them to build indexes, perform queries, or make updates.
Schema design is covered in more detail later, but even this simple example shows that embedded documents can change the way you work with data. In a relational database, the document in this example would usually be modeled as two rows in two tables (one for people and one for addresses). In MongoDB, the address document can be embedded directly in the person document. Used properly, embedded documents give a more natural (and often more efficient) representation of information.
The flip side is that embedding can lead to more data duplication in MongoDB. Suppose “address” were a separate table in a relational database and a spelling error in an address needed fixing. After a join between people and addresses, the correction would appear for everyone who shares that address. In MongoDB, the spelling error has to be fixed in each person’s document separately.
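
The duplication cost can be illustrated with in-memory objects. This is a plain JavaScript sketch, not MongoDB code: the second person, the misspelled street, and the helpers getPath and setPath are all made up for illustration. The dotted path "address.street" mirrors the way embedded keys are usually addressed in MongoDB, which this section does not cover.

```javascript
// Two people share an address, so the same typo is stored twice.
const people = [
  { name: "John Doe", address: { street: "123 Prak Street", city: "Anytown", state: "NY" } },
  { name: "Jane Doe", address: { street: "123 Prak Street", city: "Anytown", state: "NY" } }
];

// Hypothetical helper: follow a dotted path such as "address.street".
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Hypothetical helper: set the value at a dotted path (parent objects must exist).
function setPath(doc, path, newValue) {
  const keys = path.split(".");
  const last = keys.pop();
  const parent = keys.reduce((value, key) => value[key], doc);
  parent[last] = newValue;
}

// Every embedded copy has to be corrected separately.
let fixedCount = 0;
for (const person of people) {
  if (getPath(person, "address.street") === "123 Prak Street") {
    setPath(person, "address.street", "123 Park Street");
    fixedCount += 1;
  }
}
console.log(fixedCount);  // 2: one fix per person document
```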

2.6.5 “_id” and ObjectId

Every document stored in MongoDB must have an “_id” key. Its value can be of any type; by default it is an ObjectId. Within a collection, every document must have a unique “_id”, which ensures that each document in the collection can be uniquely identified. If there are two collections, each can contain a document whose “_id” is 123, but neither collection can contain more than one document with an “_id” of 123.

1. ObjectId

ObjectId is the default type for “_id”. It is designed to be lightweight while still being easy to generate in a globally unique way across different machines. This is the main reason MongoDB uses ObjectIds rather than something more traditional, such as an auto-incrementing primary key: synchronizing auto-incrementing primary keys across multiple servers is difficult and time-consuming. Because MongoDB was designed to be a distributed database, being able to generate unique identifiers in a sharded environment is very important.
An ObjectId uses 12 bytes of storage and is usually shown as a string of 24 hexadecimal digits (two digits for each byte). This makes it look larger than it is, which bothers some people. The key point is that the 24-digit string is twice as long as the data actually stored.
If you create several ObjectIds in quick succession, you will see that only the last few digits change each time. A few digits in the middle also change if you wait a couple of seconds between creations. This follows from the way ObjectIds are built. The 12 bytes of an ObjectId are generated as follows:

The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This gives ObjectIds a few useful properties.

  • The timestamp, combined with the next five bytes (described below), provides uniqueness at the granularity of a second.
  • Because the timestamp comes first, ObjectIds sort in roughly the order in which they were inserted. This is useful for some purposes, such as efficient indexing, but it is not a strong guarantee; the order is only “rough”.
  • These four bytes also carry an implicit record of when the document was created. Most drivers provide a way to extract this information from an ObjectId.

Because the current time is used, many users worry that their servers need synchronized clocks. Although synchronizing clocks between servers is a good idea in some cases (see section 23.6.1), it is not needed here: the actual value of the timestamp does not matter, only that it keeps increasing (once per second).
The next three bytes are a unique identifier for the host, usually a hash of the machine’s hostname. This ensures that different hosts generate different ObjectIds that do not collide.
To ensure that ObjectIds generated by concurrent processes on the same machine are unique, the next two bytes come from the process identifier (PID) of the process that generates the ObjectId.
These first nine bytes ensure that ObjectIds generated by different machines and processes in the same second are unique. The last three bytes are an auto-incrementing counter, which ensures that ObjectIds generated by the same process in the same second are also different. This allows each process to generate up to 16,777,216 (256^3) unique ObjectIds in a single second.
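
The layout described above can be sketched as a small parser in plain JavaScript. This is an illustration only: parseObjectId is a hypothetical helper, not a driver function, and the example ID is made up. Note also that newer MongoDB versions replace the machine and process bytes with a random value, so on a current server only the timestamp and counter fields of this sketch apply.

```javascript
// Split a 24-digit hexadecimal ObjectId string into the four fields
// described above (4-byte timestamp, 3-byte machine, 2-byte PID, 3-byte counter).
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal digits");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);  // bytes 0-3: seconds since the epoch
  return {
    timestamp: new Date(seconds * 1000),          // Date expects milliseconds
    machine: hex.slice(8, 14),                    // bytes 4-6: host identifier
    pid: parseInt(hex.slice(14, 18), 16),         // bytes 7-8: process identifier
    counter: parseInt(hex.slice(18, 24), 16)      // bytes 9-11: counter
  };
}

// A made-up ID, not taken from a real database.
const parts = parseObjectId("5dd5d3800a1b2c3d4e000001");
console.log(parts.timestamp.toISOString());  // 2019-11-21T00:00:00.000Z
console.log(parts.machine, parts.pid, parts.counter);

// A 3-byte counter can take 256^3 distinct values.
const counterValues = 256 ** 3;
console.log(counterValues);  // 16777216
```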

2. Automatic generation of “_id”

As mentioned earlier, if you insert a document without an “_id” key, one is created for you automatically. The MongoDB server can do this, but it is usually done by the driver on the client side. This reflects MongoDB’s philosophy: work should be pushed out of the server and onto the client drivers whenever possible. The reasoning is that even for a database as scalable as MongoDB, scaling the application layer is much easier than scaling the database layer. Moving work to the client reduces the burden on the database as it scales.
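
As a rough illustration of the client-side step, here is a plain JavaScript sketch (Node.js is assumed). Both generateId and prepareForInsert are hypothetical helpers, not functions of any real driver, and generateId only imitates the shape of an ObjectId (a 4-byte timestamp followed by random bytes) rather than reproducing the real algorithm.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical stand-in for a driver's ID generator: 12 bytes shown as
// 24 hex digits, a 4-byte timestamp followed by 8 random bytes.
function generateId() {
  const timestamp = Buffer.alloc(4);
  timestamp.writeUInt32BE(Math.floor(Date.now() / 1000));
  return Buffer.concat([timestamp, randomBytes(8)]).toString("hex");
}

// Hypothetical client-side step: add "_id" only when the caller did not supply one.
function prepareForInsert(doc) {
  return "_id" in doc ? doc : { _id: generateId(), ...doc };
}

const generated = prepareForInsert({ x: 1 });
const supplied = prepareForInsert({ _id: 123, x: 1 });

console.log(generated._id.length);  // 24 hex digits
console.log(supplied._id);          // 123: an existing "_id" is left alone
```

Because the ID is attached before the document leaves the application, the server has nothing to compute for it, which is the division of labor the paragraph above describes.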
