Advantages, disadvantages and application scenarios of NoSQL databases such as MongoDB, HBase and Redis

Time: 2021-10-22

Four categories of NoSQL

NoSQL databases clearly hold an important place in the database field. In the era of big data, relational databases (RDBMS), however capable, are increasingly unable to handle many workloads as data volumes grow rapidly and data models become more complex. NoSQL databases have gained a firm foothold by offering easy horizontal scaling, support for very large data volumes, high performance and flexible data models.

It is broadly agreed that NoSQL databases fall into four categories: key-value stores, document databases, column-family (wide-column) stores and graph databases. Each type addresses problems that relational databases handle poorly. In practice, the boundaries between these categories are blurry, and many products combine several types.
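To make the four categories concrete, here is a minimal sketch, using plain Python data structures, of how one hypothetical user record might be shaped in each model. The record, key names and the `FOLLOWS` edge are invented for illustration; real systems use their own encodings.

```python
# The same hypothetical user record shaped for each NoSQL category.
# All names and values here are illustrative only.

# Key-value store: an opaque value looked up by a single key.
kv_store = {"user:1001": '{"name": "Alice", "city": "Paris"}'}

# Document store: a nested, self-describing document per record.
document_store = {
    "users": [
        {"_id": 1001, "name": "Alice", "address": {"city": "Paris"}, "tags": ["admin"]},
    ]
}

# Column-family store: row key -> column family -> qualifier -> value.
column_store = {
    "1001": {
        "info": {"name": "Alice"},
        "address": {"city": "Paris"},
    }
}

# Graph store: nodes plus typed edges between them.
graph_store = {
    "nodes": {1001: {"name": "Alice"}, 1002: {"name": "Bob"}},
    "edges": [(1001, "FOLLOWS", 1002)],
}


def followers_of(graph, node_id):
    """Return the ids of nodes with a FOLLOWS edge pointing at node_id."""
    return [src for src, rel, dst in graph["edges"] if rel == "FOLLOWS" and dst == node_id]
```

Each shape favors a different access path: a single key lookup, a query into nested fields, a scan by row key and column family, or a traversal along edges.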

A closer look at mainstream NoSQL databases: MongoDB, HBase and Redis

MongoDB

MongoDB is a high-performance, open-source, schema-free document database written in C++. In many scenarios it can replace a system's relational database or key/value store.

1. MongoDB features
  • Language used: C++
  • Features: retains some friendly features of SQL (queries, indexes)
  • License: AGPL (drivers: Apache); since October 2018 the server has been licensed under the SSPL
  • Protocol: custom, binary (BSON)
  • Master/slave replication (automatic failover through replica sets)
  • Built-in sharding
  • Supports queries written as JavaScript expressions
  • Arbitrary JavaScript functions can be executed on the server side
  • Better support for in-place updates than CouchDB
  • Uses memory-mapped files for data storage (in the original MMAPv1 storage engine; WiredTiger has been the default since version 3.2)
  • Performance takes priority over features
  • Enabling journaling (the --journal option) is recommended
  • On 32-bit operating systems, the database size is limited to about 2.5 GB
  • An empty database takes up about 192 MB
  • GridFS stores large files together with their metadata (it is not a real file system)
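GridFS works by splitting a file into fixed-size chunks, each stored as a separate document, plus one metadata document for the whole file. The sketch below imitates that layout in plain Python without a MongoDB driver. The field names `files_id`, `n`, `data`, `length` and `chunkSize` and the 255 KiB default follow the GridFS specification; the in-memory lists standing in for the `fs.files` and `fs.chunks` collections are an assumption for illustration.

```python
import uuid

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)

# In-memory stand-ins (hypothetical) for the fs.files and fs.chunks collections.
fs_files = []
fs_chunks = []


def put(data: bytes, filename: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> str:
    """Split data into chunks and store them along with one metadata document."""
    file_id = uuid.uuid4().hex
    for n, start in enumerate(range(0, len(data), chunk_size)):
        fs_chunks.append({"files_id": file_id, "n": n, "data": data[start:start + chunk_size]})
    fs_files.append(
        {"_id": file_id, "filename": filename, "length": len(data), "chunkSize": chunk_size}
    )
    return file_id


def get(file_id: str) -> bytes:
    """Reassemble a file by concatenating its chunks in order of n."""
    chunks = sorted((c for c in fs_chunks if c["files_id"] == file_id), key=lambda c: c["n"])
    return b"".join(c["data"] for c in chunks)
```

Because each chunk is an ordinary document, a reader can fetch only the chunks covering a byte range instead of loading the whole file.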
2. Advantages of MongoDB:

1) Under heavy write loads, MongoDB offers high insertion speed.

2) It handles very large single tables well: when a collection grows too large, it can easily be split (sharded) across servers.
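Splitting a large collection across servers comes down to routing each document to a shard by its shard key. Below is a deliberately simplified hash-based router in plain Python. MongoDB's real hashed sharding assigns ranges of hashed key values to chunks and rebalances those chunks, whereas this sketch simply takes the hash modulo the number of shards; the shard names are hypothetical.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(shard_key_value) -> str:
    """Pick a shard by hashing the shard key (simplified: hash mod N)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def distribute(docs, key):
    """Group documents by the shard their shard-key value routes to."""
    placement = {name: [] for name in SHARDS}
    for doc in docs:
        placement[shard_for(doc[key])].append(doc)
    return placement
```

Because the same key always hashes to the same shard, a lookup by shard key touches a single server. The weakness of plain hash-mod-N is that changing the number of shards moves most keys, which is one reason real systems split and migrate chunks instead.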

3) High availability: setting up master/slave (M-S) replication is convenient and fast, and MongoDB can fail over between nodes (or data centers) quickly, safely and automatically.

4) Fast queries: MongoDB supports two-dimensional geospatial indexes, so it can quickly and accurately retrieve data near a specified location. After MongoDB starts, its data files are mapped into memory; if memory is plentiful, this greatly speeds up queries.
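A geospatial query such as MongoDB's `$near` returns documents ordered by distance from a point. The sketch below reproduces that result in plain Python with a brute-force haversine scan and no index; a real 2d or 2dsphere index exists precisely to avoid scanning every document. Coordinates are stored as (longitude, latitude), the order MongoDB uses for GeoJSON points; the place names and the `near` helper are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near(docs, lon, lat, max_distance_m):
    """Return docs within max_distance_m of (lon, lat), nearest first."""
    hits = []
    for doc in docs:
        d_lon, d_lat = doc["loc"]
        dist = haversine_m(lon, lat, d_lon, d_lat)
        if dist <= max_distance_m:
            hits.append((dist, doc))
    hits.sort(key=lambda pair: pair[0])  # sort by distance only
    return [doc for _, doc in hits]


places = [
    {"name": "cafe", "loc": (2.3522, 48.8566)},
    {"name": "museum", "loc": (2.3376, 48.8606)},
    {"name": "airport", "loc": (2.5479, 49.0097)},
]
```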

5) With the explosive growth of unstructured data, adding a column to a relational table can lock the entire database or increase load, degrading performance. Because MongoDB's schema is loose, adding a new field has no impact on existing documents, and the change is very fast.
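The point about adding fields can be illustrated without a database. In a document model each record carries its own fields, so new documents can include a field while old ones stay untouched, and readers supply a default for documents that lack it. This is a plain-Python sketch of that idea, not MongoDB driver code; the `loyalty_tier` field and the helper functions are hypothetical.

```python
# A "collection" of documents; no schema is declared anywhere.
users = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]

# New documents simply include the new field; existing ones are not rewritten.
users.append({"_id": 3, "name": "Carol", "loyalty_tier": "gold"})


def tier(doc):
    """Read the optional field, defaulting for documents written before it existed."""
    return doc.get("loyalty_tier", "none")


def find(collection, **criteria):
    """Return documents whose fields equal all given criteria (missing fields never match)."""
    return [d for d in collection if all(k in d and d[k] == v for k, v in criteria.items())]
```

Contrast this with `ALTER TABLE ... ADD COLUMN` in a relational database, which changes the table definition for every existing row.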

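3. MongoDB disadvantages:

1) Older versions did not support transactions (multi-document ACID transactions arrived in MongoDB 4.0 for replica sets and 4.2 for sharded clusters).

2) MongoDB takes up a lot of disk space.

3) MongoDB lacks mature maintenance tools.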

4. MongoDB application scenarios

1) Real-time inserts, updates and queries, with the replication and high scalability that real-time application data storage requires;

2) Storing and querying data in document format;

3) High-scalability scenarios: MongoDB is well suited to databases made up of dozens or hundreds of servers;

4) Scenarios where performance matters more than features.

HBase

HBase began as a subproject of Apache Hadoop and is an open-source implementation of Google's BigTable. It is written in Java (so it depends on the Java SDK) and uses Hadoop's HDFS (distributed file system) as its underlying storage layer.

1. HBase features:

  • Language: Java
  • Features: supports billions of rows x millions of columns
  • License: Apache
  • Protocol: HTTP/REST (Thrift is also supported)
  • Modeled after Google's BigTable
  • MapReduce on a distributed architecture
  • Optimized for real-time queries
  • High-performance Thrift gateway
  • Query predicates are pushed down to the server through server-side scans and filters
  • HTTP supports XML, Protobuf and binary formats
  • Source and sink modules for Cascading, Hive and Pig
  • JRuby-based (jirb) shell
  • Rolling restarts for configuration changes and minor upgrades
  • No single point of failure
  • Random access performance comparable to MySQL
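HBase keeps rows sorted by row key. A scan reads a contiguous key range (start row inclusive, stop row exclusive), while filters run on the region server so that only matching rows travel back to the client. The sketch below imitates that behavior over an in-memory sorted list; it is not the HBase client API, and the row keys, the `scan` helper and the city filter are illustrative assumptions.

```python
import bisect

# Rows kept sorted by row key, as HBase keeps them within a table.
rows = sorted([
    ("user#0001", {"info:name": "Alice", "info:city": "Paris"}),
    ("user#0002", {"info:name": "Bob", "info:city": "Berlin"}),
    ("user#0003", {"info:name": "Carol", "info:city": "Paris"}),
    ("video#0001", {"info:title": "Intro"}),
])
keys = [k for k, _ in rows]


def scan(start_row, stop_row, row_filter=None):
    """Yield rows with start_row <= key < stop_row that pass row_filter.

    The filter runs here, next to the data, mimicking a server-side
    filter that keeps non-matching rows off the network.
    """
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    for key, columns in rows[lo:hi]:
        if row_filter is None or row_filter(key, columns):
            yield key, columns
```

Choosing `user$` as the stop row for a `user#` start row works because `$` sorts immediately after `#`, so the range covers exactly the keys with the `user#` prefix.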

2. Advantages of HBase

1) Large storage capacity: a single table can hold hundreds of millions of rows and millions of columns;

2) Data can be retrieved by version, so the required historical versions can be found;
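Version retrieval follows from HBase's data model: each cell is addressed by row key, column family:qualifier and timestamp, and a read returns the newest version unless an older point in time is requested. The class below is a minimal in-memory model of that addressing. The `max_versions` parameter is a hypothetical stand-in for the per-column-family VERSIONS setting, and the class is not the HBase client API.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model: (row, 'family:qualifier') -> list of (timestamp, value)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)

    def put(self, row, column, value, ts):
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # drop versions beyond the retention limit

    def get(self, row, column, as_of=None):
        """Return the newest value with timestamp <= as_of (or the newest overall)."""
        for ts, value in self.cells.get((row, column), []):
            if as_of is None or ts <= as_of:
                return value
        return None
```

Once more versions are written than the family retains, the oldest are dropped, so historical reads only reach back as far as the configured version count.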

3) When load is high, the cluster can be partitioned and scaled horizontally simply by adding machines. Seamless integration with Hadoop provides data reliability (HDFS) and high-performance analysis of massive data (MapReduce);
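Horizontal scaling in HBase rests on regions. A table is divided into contiguous row-key ranges, and a region that grows too large is split in two so the halves can be served by different machines. The sketch below models only that split step. It uses a hypothetical row-count threshold in place of HBase's size-based split policy and picks the middle key as the split point.

```python
import bisect


class Region:
    def __init__(self, start_key, end_key, keys=None):
        self.start_key = start_key  # inclusive; "" means start of table
        self.end_key = end_key      # exclusive; None means end of table
        self.keys = sorted(keys or [])


def split_if_needed(region, max_rows):
    """Split region at its middle key when it holds more than max_rows rows."""
    if len(region.keys) <= max_rows:
        return [region]
    mid = len(region.keys) // 2
    split_key = region.keys[mid]
    left = Region(region.start_key, split_key, region.keys[:mid])
    right = Region(split_key, region.end_key, region.keys[mid:])
    return [left, right]


def locate(regions, row_key):
    """Find the region whose [start_key, end_key) contains row_key.

    Assumes regions are sorted by start_key and cover the whole key space.
    """
    starts = [r.start_key for r in regions]
    return regions[bisect.bisect_right(starts, row_key) - 1]
```

After a split, each half can be assigned to a different server, which is how adding machines translates into more capacity.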

4) Building on point 3, single points of failure can be effectively avoided.

3. HBase disadvantages

1) Because it is implemented in Java on the Hadoop architecture, its API is best suited to Java projects;

2) In a Node.js development environment there are many dependencies, configuration is troublesome (or unclear, for example for persistence settings), and documentation is lacking;

3) It uses a lot of memory, and because it is built on HDFS, which is optimized for batch analysis, its read performance is not high;

4) Its API is clumsier than those of other NoSQL databases.

4. Applicable scenarios of HBase

1) BigTable-style data storage;

2) Workloads that need to query historical versions of data;

3) Large data volumes that require simple scaling.

Redis

Redis is an open-source key-value database written in ANSI C. It is accessed over the network, keeps data in memory with optional persistence, and provides APIs for many languages. VMware hosted its development starting in 2010; sponsorship later passed to Pivotal and, in 2015, to Redis Labs.

1. Redis features:

  • Language used: C
  • Features: extremely fast
  • License: BSD
  • Protocol: Telnet-like
  • In-memory database backed by disk storage
  • Since version 2.0, data could also be swapped out to disk (note that this feature was deprecated after version 2.4!)
  • Master-slave replication
  • Stores simple values or hash tables by key, yet also supports complex operations such as ZREVRANGEBYSCORE
  • INCR & co. (useful for rate limiting or statistics)
  • Sets (with union/diff/intersection operations)
  • Lists (also usable as a queue, with blocking pop)
  • Hashes (objects with multiple fields)
  • Sorted sets (high-score tables; suited to range queries)
  • Transactions
  • Keys can be set to expire (as in a cache)
  • Pub/Sub for implementing messaging
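Sorted sets are what make high-score tables and score-range queries cheap in Redis. The class below is an in-memory imitation of three sorted-set commands (ZADD, ZINCRBY, ZREVRANGEBYSCORE), so the semantics can be seen without a server. It uses a dict plus sorting rather than Redis's skip list, and its method names merely echo the commands; with a real server you would send those commands through a client library.

```python
class SortedSet:
    """Toy sorted set: member -> score, ordered by score on read."""

    def __init__(self):
        self.scores = {}

    def zadd(self, member, score):
        self.scores[member] = float(score)

    def zincrby(self, member, amount):
        self.scores[member] = self.scores.get(member, 0.0) + amount
        return self.scores[member]

    def zrevrangebyscore(self, max_score, min_score, limit=None):
        """Members with min_score <= score <= max_score, highest score first."""
        hits = [(s, m) for m, s in self.scores.items() if min_score <= s <= max_score]
        # Equal scores come out in reverse lexicographic order, as ZREVRANGEBYSCORE returns them.
        hits.sort(reverse=True)
        members = [m for _, m in hits]
        return members if limit is None else members[:limit]
```

A leaderboard then needs only `zincrby` on each score change and `zrevrangebyscore(float("inf"), float("-inf"), limit=10)` for the top ten.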

2. Redis advantages

1) Very rich data structures;

2) Redis provides transactions: the queued commands are executed in order as one isolated unit, without commands from other clients being interleaved (note that Redis does not roll back the other commands if one of them fails);
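A Redis transaction opens with MULTI. Commands are then queued rather than executed, and EXEC runs the whole queue at once with no other client's commands interleaved; a command that fails at run time does not undo the others. The toy model below shows that queue-then-execute behavior in plain Python. The class and method names are hypothetical, and it does not model WATCH or networking.

```python
class ToyRedis:
    def __init__(self):
        self.data = {}
        self.queue = None  # None means no transaction is open

    def multi(self):
        self.queue = []

    def incr(self, key):
        return self._dispatch(self._incr, key)

    def set(self, key, value):
        return self._dispatch(self._set, key, value)

    def exec(self):
        """Run every queued command in order; collect each result or error."""
        if self.queue is None:
            raise RuntimeError("EXEC without MULTI")
        queued, self.queue = self.queue, None
        results = []
        for fn, args in queued:
            try:
                results.append(fn(*args))
            except ValueError as err:  # a failed command does not roll back the rest
                results.append(err)
        return results

    def _dispatch(self, fn, *args):
        if self.queue is not None:  # inside MULTI: queue instead of executing
            self.queue.append((fn, args))
            return "QUEUED"
        return fn(*args)

    def _incr(self, key):
        value = int(self.data.get(key, 0)) + 1  # int("abc") raises ValueError
        self.data[key] = str(value)
        return value

    def _set(self, key, value):
        self.data[key] = value
        return "OK"
```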

3) Data is stored in memory, so reads and writes are very fast, reaching on the order of 100,000 operations per second.

3. Redis disadvantages

1) The official cluster solution only arrived with Redis 3.0, and it still has some architectural problems;

2) Persistence is awkward. With the snapshot (RDB) approach, the entire dataset must be written to disk at regular intervals, which is expensive. The AOF (append-only file) approach records only changes, similar to MySQL's binlog, but the log can grow very large, and recovery is slow because every logged operation must be replayed;
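The trade-off between the two persistence styles can be seen in a few lines: a snapshot serializes the entire dataset each time, while an append-only log records every write and rebuilds state by replaying all of them. This plain-Python sketch uses JSON for the snapshot and a list of command tuples for the log. It does not reproduce Redis's actual RDB or AOF formats, and it ignores AOF rewriting, which Redis uses to compact the log; the `Store` class and helpers are hypothetical.

```python
import json


class Store:
    def __init__(self):
        self.data = {}
        self.aof = []  # append-only log of write commands

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(("SET", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(("DEL", key))

    def snapshot(self) -> str:
        """Serialize the whole dataset: cost grows with total data size."""
        return json.dumps(self.data, sort_keys=True)


def restore_from_snapshot(blob: str) -> dict:
    return json.loads(blob)


def restore_from_aof(log) -> dict:
    """Replay every logged command: cost grows with the number of writes."""
    data = {}
    for cmd in log:
        if cmd[0] == "SET":
            data[cmd[1]] = cmd[2]
        elif cmd[0] == "DEL":
            data.pop(cmd[1], None)
    return data
```

Overwriting one key a hundred times yields a one-key snapshot but a log with a hundred entries for that key, which is why AOF files grow large and replay slowly.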

3) Because Redis is an in-memory database, the amount of data a single machine can hold is bounded by that machine's memory. Although Redis has its own key-expiration strategy, memory needs must still be estimated in advance and memory conserved; if usage grows too fast, data has to be deleted regularly.
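Key expiration is what lets Redis act as a cache within a fixed memory budget. Redis removes expired keys both lazily (when they are accessed) and through periodic sampling; the sketch below models only the lazy part, in plain Python, with an injectable clock so the behavior is deterministic. The `ExpiringDict` class and its methods are hypothetical; the -1/-2 return codes mirror what Redis's TTL command reports.

```python
import time


class ExpiringDict:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.values = {}
        self.expires_at = {}

    def set(self, key, value, ttl_seconds=None):
        self.values[key] = value
        if ttl_seconds is None:
            self.expires_at.pop(key, None)  # a plain SET clears any old TTL
        else:
            self.expires_at[key] = self.clock() + ttl_seconds

    def get(self, key):
        """Return the value, deleting it first if its TTL has passed (lazy expiry)."""
        deadline = self.expires_at.get(key)
        if deadline is not None and self.clock() >= deadline:
            del self.values[key]
            del self.expires_at[key]
        return self.values.get(key)

    def ttl(self, key):
        """Seconds left (as a float here), -1 for no expiry, -2 for a missing key."""
        self.get(key)  # trigger lazy expiry first
        if key not in self.values:
            return -2
        deadline = self.expires_at.get(key)
        return -1 if deadline is None else deadline - self.clock()
```

In this sketch an expired key that is never read stays in memory, which is exactly why Redis also samples keys actively.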

4. Redis application scenarios:

Best application scenario: applications whose data changes rapidly and whose database size fits in memory.

For example: microblogs, data analysis, real-time data collection, real-time communication, etc.
