MongoDB study notes-1, NoSQL foundation


1.1. NoSQL concept

NoSQL does not literally mean “no SQL” or “non-SQL”. It stands for Not Only SQL, that is, “not just SQL”, and is a general term for database management systems that differ from traditional relational databases.

NoSQL: a non-relational, distributed database design approach that generally does not guarantee full ACID properties.

NoSQL is used to store ultra-large-scale data. These data stores do not require a fixed schema and can be scaled out (horizontally) without extra operations.

[Extension] Introduction to ACID

Relational databases follow the ACID rules:

(1) A (Atomicity)

Atomicity means that all operations in a transaction are either completed or not performed at all. A transaction succeeds only if every operation in it succeeds. If any one operation fails, the entire transaction fails and must be rolled back. Take a bank transfer of 100 yuan from account A to account B, which has two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. Both steps must complete together or not at all. If only the first step completes and the second fails, 100 yuan simply disappears.
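The transfer example can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration under assumptions: the accounts table, the transfer and balances helpers, and the starting balances are all made up for the sketch and are not taken from any real banking system.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` atomically; return True if committed."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            # Step 1: withdraw from the source account.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            # Step 2: deposit into the destination account.
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                # Step 2 failed, so step 1 must be undone as well.
                raise ValueError("unknown destination account")
        return True
    except (sqlite3.Error, ValueError):
        return False


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 200), ("B", 0)])
conn.commit()

ok = transfer(conn, "A", "B", 100)           # both steps succeed
failed = transfer(conn, "A", "nobody", 100)  # step 2 fails, step 1 is rolled back
```

After the failed transfer, account A still holds its 100 yuan, because the withdrawal was rolled back together with the failed deposit.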

(2) C (Consistency)

Consistency means that the database must always be in a consistent state: a transaction takes the database from one valid state to another and never violates its existing integrity constraints.

(3) I (Isolation)

Isolation means that concurrent transactions do not affect each other. If the data one transaction wants to access is being modified by another transaction, then as long as that other transaction has not committed, the data it reads is not affected by the uncommitted changes. For example, suppose a transaction is transferring 100 yuan from account A to account B. If B checks his account before the transaction completes, he will not see the newly added 100 yuan.

(4) D (Durability)

Durability means that once a transaction is committed, its modifications are permanently saved in the database and are not lost even if the system goes down.

[Extension] Introduction to CAP

  • C: Consistency (strong consistency)
  • A: Availability
  • P: Partition tolerance (partition fault tolerance)

CAP theory states that a distributed storage system can satisfy at most two of the three properties above at the same time. Because real networks will always suffer from problems such as delay and packet loss, a distributed system must tolerate partitions. In practice, then, the trade-off is between consistency and availability, and no NoSQL system can guarantee all three at once.

CA: Traditional databases such as Oracle. (Single-site clusters that satisfy consistency and availability but usually do not scale well.)

AP: The choice of most website architectures. (A system that satisfies availability and partition tolerance generally has lower requirements for consistency.)

CP: Redis, MongoDB. (A system that satisfies consistency and partition tolerance usually does not have especially high performance.)
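The consistency/availability trade-off during a partition can be illustrated with a toy two-replica model. This is a deliberately simplified sketch and not how Redis, MongoDB, or any real system is implemented. The Replica class, the make_pair and partition helpers, and the "CP"/"AP" mode strings are all hypothetical names introduced for illustration.

```python
class Replica:
    """One node of a two-node toy cluster."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.value = None
        self.peer = None
        self.peer_reachable = True  # False while the network is partitioned

    def write(self, value):
        if self.peer_reachable:
            self.value = value
            self.peer.value = value  # replicate synchronously to the peer
        elif self.mode == "AP":
            self.value = value       # stay available; the replicas now diverge
        else:
            raise RuntimeError("CP replica refuses writes during a partition")

    def read(self):
        if not self.peer_reachable and self.mode == "CP":
            raise RuntimeError("CP replica refuses reads during a partition")
        return self.value


def make_pair(mode):
    """Create two replicas in the same mode that know about each other."""
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    """Simulate a network partition between the two replicas."""
    a.peer_reachable = b.peer_reachable = False


# AP: both sides keep answering, but they no longer agree.
ap_a, ap_b = make_pair("AP")
ap_a.write("v1")
partition(ap_a, ap_b)
ap_a.write("v2")
ap_views = (ap_a.read(), ap_b.read())

# CP: the write is rejected, so the replicas never disagree.
cp_a, cp_b = make_pair("CP")
cp_a.write("v1")
partition(cp_a, cp_b)
try:
    cp_a.write("v2")
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"
```

In AP mode one replica serves the new value while the other still serves the old one. In CP mode both replicas keep the same value, at the cost of refusing requests while the partition lasts.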

1.2. Why use NoSQL?

Today we can easily access and capture data through third-party platforms (such as Google and Facebook). Users’ personal information, social networks, geographic locations, user-generated content, and operation logs have grown many times over. SQL databases are no longer well suited to mining this user data, whereas NoSQL databases have developed to handle such large volumes of data well.

Because of the normal-form constraints of relational databases, the properties of transactions, and the characteristics of disk I/O, a server backed by a traditional relational database can no longer deliver fast queries and inserts once the data volume becomes very large. NoSQL emerged to solve this problem: it gives up some data safety, transaction support, and complex-query support in exchange for performance. However, NoSQL is still not the best choice in some scenarios, such as those that absolutely require transactions and strict data safety.

  • With a flexible data model, it can handle unstructured/semi-structured big data;
  • Easy scalability (scaling up vs. scaling out);
  • High read and write performance (non-relational data, simple database structure).

Most Internet companies now combine relational and non-relational databases. Relational databases handle durable data storage, while non-relational databases serve as in-memory stores and caches. Scenarios that do not need transactions but have frequent reads also use non-relational databases.

[Note] NoSQL has no unified standard. Each non-relational database has its own query interface, so every new one must be learned separately. This high learning cost is a significant disadvantage of NoSQL.

1.3. Four families of NoSQL databases

1.3.1. Key-value storage

  • Features: A key-value database works like the hash table (dictionary) found in programming languages: data is added, queried, or deleted by key.
  • Advantages: Queries are fast.
  • Disadvantages: The data is unstructured and is usually stored only as strings or binary data.
  • Application scenarios: Content caching and user data such as sessions, configuration information, and shopping carts; mainly used to handle high access loads on large amounts of data.
  • NoSQL representatives: Redis (temporary/permanent key-value storage), Memcached (temporary key-value storage), DynamoDB, etc.
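The add/query/delete-by-key model, including the “temporary” storage mentioned for Redis and Memcached, can be sketched as a small in-memory class. This is only an illustration of the idea and not the API of any real product. KeyValueStore, its ttl parameter, and the injectable clock are assumptions made for the sketch.

```python
import time


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so that expiry is easy to test

    def set(self, key, value, ttl=None):
        """Store a value; with `ttl` (seconds) the entry is temporary."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Look a value up by key; expired or missing keys yield `default`."""
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return default
        return value

    def delete(self, key):
        """Remove a key; return True if it was present."""
        return self._data.pop(key, None) is not None


now = [0.0]  # fake clock, in seconds
store = KeyValueStore(clock=lambda: now[0])
store.set("session:42", b"user=alice", ttl=30)  # temporary entry
store.set("config:theme", "dark")               # permanent entry

first_read = store.get("session:42")
now[0] = 31.0                                   # 31 seconds later
expired_read = store.get("session:42")
```

The store never looks inside a value, which is why lookups by key are fast but queries on the content of a value are not possible.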

1.3.2. Document storage (document-oriented)

  • Features: The data model is a versioned document. Semi-structured documents are stored in a specific format such as JSON. A document is a collection of data items, each with a name and a corresponding value. The value can be a simple data type, such as a string, number, or date, or a complex type, such as an ordered list or an associated (nested) object.
  • Advantages: The data structure requirements are not strict, the table structure is variable, and there is no need to pre-define the table structure like a relational database.
  • Disadvantages: low query performance, lack of unified query syntax.
  • Application scenarios: logs, web applications, etc.
  • NoSQL representatives: MongoDB, CouchDB, etc.
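The flexible structure can be illustrated with plain Python dictionaries and the json module. The sample documents and the find helper are made up for this sketch and only imitate the idea of querying by field. They are not MongoDB's actual query API.

```python
import json

# Documents in one collection need not share the same fields.
collection = [
    {"_id": 1, "name": "Alice", "age": 30, "tags": ["admin", "dev"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Paris"}},
    {"_id": 3, "level": "error", "message": "disk full"},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


alice_docs = find(collection, name="Alice")
error_logs = find(collection, level="error")

# A document maps directly to JSON text, nested object included.
as_json = json.dumps(collection[1], sort_keys=True)
```

No table structure was declared up front: the user records and the log record live side by side, and each value may itself be a list or a nested object.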

1.3.3. Column-family storage

  • Features: Designed to cope with massive data in distributed storage. Column-family databases store data by column and group multiple columns into a column family. Keys still exist, but each key points to multiple columns. For example, given a Person class, we would normally query name and age together rather than salary, so name and age would be put into one column family and salary into another.
  • Advantages: Fast queries, strong scalability, and easy distributed expansion. Well suited to distributed file systems and to massive data in distributed storage.
  • Disadvantages: poor performance for queries that do not match the column-family layout, lack of unified query syntax.
  • Application scenarios: logs, distributed file systems (object storage), recommendation profiles (user portraits), spatio-temporal data, messages/orders, etc.
  • NoSQL representatives: Cassandra, HBase, etc.
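The Person example can be sketched with nested dictionaries: row key, then column family, then column. This only illustrates the layout and is not the storage format or API of Cassandra or HBase. The family names profile and payroll and both helper functions are assumptions made for the sketch.

```python
# row key -> column family -> column -> value
table = {
    "person:1": {
        "profile": {"name": "Alice", "age": 30},
        "payroll": {"salary": 5000},
    },
    "person:2": {
        "profile": {"name": "Bob", "age": 41},
        # Rows are sparse: this row has no payroll family at all.
    },
}


def read_family(table, row_key, family):
    """Fetch only the columns of one family for one row."""
    return dict(table.get(row_key, {}).get(family, {}))


def scan_family(table, family):
    """Fetch one family across all rows, skipping rows that lack it."""
    return {
        row_key: dict(families[family])
        for row_key, families in table.items()
        if family in families
    }


alice_profile = read_family(table, "person:1", "profile")
all_salaries = scan_family(table, "payroll")
```

Reading name and age touches only the profile family, so the salary column never has to be loaded for that query.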

1.3.4. Graph storage

  • Features: A graph database stores data in the form of a graph. It uses a flexible graph model and can be extended to multiple servers.
  • Advantages: Graph algorithms, such as shortest-path finding and N-degree relationship search, can be applied directly.
  • Disadvantages: In many cases the entire graph must be computed to obtain the required information. Distributed cluster solutions are not easy to implement, the handling of super nodes is weak, there is no sharded storage mechanism, and the domestic community is not very active.
  • Application scenarios: social networks, recommendation systems, etc. The focus is on building a relationship graph.
  • NoSQL representatives: Neo4j, InfiniteGraph, etc.
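The two algorithms named above, shortest path and N-degree relationship search, can be sketched on a small adjacency-set graph using breadth-first search. The sample social graph and both function names are invented for this illustration. A real graph database such as Neo4j exposes these operations through its own query language instead.

```python
from collections import deque

# Undirected friendship graph stored as adjacency sets.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search; return one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted only to make the result deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def within_degrees(graph, start, n):
    """Return every node reachable from `start` in at most n hops."""
    seen = {start}
    frontier = {start}
    for _ in range(n):
        frontier = {
            neighbor
            for node in frontier
            for neighbor in graph.get(node, ())
        } - seen
        seen |= frontier
    return seen - {start}


path_to_erin = shortest_path(graph, "alice", "erin")
friends_of_friends = within_degrees(graph, "alice", 2)
```

Both functions walk outward from the start node hop by hop, which is the kind of relationship traversal a graph model is built for.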

1.4. Advantages and disadvantages of NoSQL

(1) Advantages:

  • High scalability: NoSQL databases (such as Cassandra) can easily add new nodes to expand the cluster. Relational databases, by contrast, rely on multi-table query mechanisms such as joins, which make them difficult to scale out.
  • High performance (fast reads and writes): queries are efficient. Relational databases are limited by disk I/O and come under heavy pressure at high concurrency, while in-memory databases such as Redis can handle on the order of 100,000 reads and writes per second.
  • Flexible data model: Traditional relational databases use structured tables, while NoSQL data can be key-value pairs, documents, column families, or graphs.
  • Low cost: Open-source software costs far less than the enterprise licensing fees of products like Oracle.

(2) Disadvantages:

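  • There is no fixed query standard, so the learning cost is high;
  • Transaction support is often limited (Redis offers MULTI/EXEC transactions without rollback; MongoDB has supported multi-document transactions only since version 4.0);
  • Many are relatively young products and not yet as mature as relational databases.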

1.5. Ranking of database management systems

According to the DB-Engines ranking (DB-Engines:…), this article excerpts the top 45 entries, as shown in the figure below. DB-Engines ranks database management systems by popularity and updates the ranking monthly.
[Figure: DB-Engines ranking, Top 45]
