Four types of NoSQL
NoSQL databases now hold an established place in the database field. In the era of big data, relational database management systems (RDBMS), excellent as they are, increasingly struggle with workloads whose data volume grows rapidly and whose data models keep getting more complex. NoSQL gained its foothold here by virtue of easy scaling, capacity for large data volumes, high performance, and flexible data models.
It is generally agreed that NoSQL databases fall into four categories: key-value stores, document databases, column-family (wide-column) stores, and graph databases, each of which addresses problems that relational databases handle poorly. In practice, the boundaries between the categories are blurry, and a given product often combines several of them.
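To make the four categories concrete, here is a minimal sketch that models each one with plain Python structures. These are toy stand-ins, not real database clients, and every key, field name, and value is made up for illustration.

```python
# Toy in-memory stand-ins for the four NoSQL data models (not real database clients).

# 1) Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": "alice"}

# 2) Document database: nested, self-describing records; fields may differ per document.
documents = [
    {"_id": 1, "name": "alice", "tags": ["admin", "dev"]},
    {"_id": 2, "name": "bob", "address": {"city": "Berlin"}},
]

# 3) Column-family store: row key -> column family -> column qualifier -> value.
column_family = {
    "user#1": {"profile": {"name": "alice"}, "stats": {"logins": 7}},
}

# 4) Graph database: nodes connected by labeled edges.
edges = [("alice", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "carol")]


def neighbors(node, label):
    """Return the nodes reachable from `node` over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


print(key_value["session:42"])
print([d["name"] for d in documents if "address" in d])
print(column_family["user#1"]["stats"]["logins"])
print(neighbors("alice", "FOLLOWS"))
```

The access pattern differs in each case: a single key lookup, a query on document fields, a path through row key and column family, and a traversal along edges.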
Details of mainstream NoSQL: MongoDB, HBase, Redis
MongoDB is a high-performance, open-source, schema-free document database written in C++. In many scenarios it can replace a traditional relational database or a key/value store.
1. MongoDB features
- Language: C++
- Features: keeps some friendly properties of SQL (queries, indexes).
- License: AGPL (drivers: Apache)
- Protocol: custom, binary (BSON)
- Master/slave replication (automatic failover with replica sets)
- Built-in sharding
- Better update-in-place support than CouchDB
- Uses memory-mapped files for data storage
- Favors performance over features
- Turning on journaling (the --journal option) is recommended
- On 32-bit operating systems, the database size is limited to about 2.5 GB
- An empty database takes up about 192 MB
- GridFS stores large data together with its metadata (it is not an actual file system)
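The built-in partitioning of data across servers (sharding) in the feature list can be pictured with a small hash-based router. This is only a sketch of the general idea: the shard names, the `user_id` shard key, and the helper functions are hypothetical, and MongoDB's real chunk splitting and balancing are not modeled.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key: str) -> str:
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def insert(cluster: dict, doc: dict, key_field: str = "user_id") -> None:
    """Route a document to the shard that owns its key."""
    cluster.setdefault(shard_for(str(doc[key_field])), []).append(doc)


def find(cluster: dict, key_value, key_field: str = "user_id") -> list:
    """A query that includes the shard key only has to touch one shard."""
    shard = cluster.get(shard_for(str(key_value)), [])
    return [d for d in shard if d[key_field] == key_value]


cluster = {}
for i in range(1000):
    insert(cluster, {"user_id": i, "score": i * 2})

# How many documents landed on each shard, then a single-shard lookup.
print({name: len(docs) for name, docs in sorted(cluster.items())})
print(find(cluster, 123))
```

Because the hash is stable, the same key always maps to the same shard, so a lookup by shard key never has to scan the other servers.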
2. Advantages of MongoDB:
1) Under heavy write load, MongoDB offers high insertion speed.
2) It handles very large single tables: when a table grows too large, it can easily be split across servers.
3) High availability: setting up master-slave (M-S) replication is convenient and fast, and MongoDB can fail over a node (or data center) quickly, safely, and automatically.
4) Fast queries: MongoDB supports two-dimensional spatial indexes, for location data such as pipelines, so it can fetch data for a given location quickly and accurately. After startup, MongoDB maps its data files into memory; if memory is plentiful, this greatly improves query speed.
5) With the explosive growth of unstructured data, adding a column to a relational table may in some cases lock the entire database or increase load, degrading performance. Because MongoDB enforces only a loose, schema-free data structure, adding a new field does not affect existing data, and the whole process is very fast.
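Point 5 can be illustrated with a minimal sketch in which a collection is just a list of dictionaries. This is a toy stand-in rather than a MongoDB client; the `find` helper and all field names are hypothetical.

```python
# A collection modeled as a list of dicts; a toy stand-in, not a MongoDB client.
collection = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]

# A newer document carries an extra field. Nothing is altered or locked,
# and the existing documents are left exactly as they were.
collection.append({"_id": 3, "name": "carol", "email": "carol@example.com"})


def find(coll, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [d for d in coll if all(d.get(k) == v for k, v in criteria.items())]


# Readers handle the missing field with a default instead of a schema migration.
emails = {d["name"]: d.get("email", "unknown") for d in collection}
print(emails)
print(find(collection, name="carol"))
```

The cost of this flexibility moves to the application, which has to tolerate documents that lack the new field.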
3. MongoDB’s disadvantages:
1) Transactions are not supported in older versions; multi-document transactions arrived only in MongoDB 4.0.
2) MongoDB takes up a lot of disk space.
3) MongoDB lacks mature maintenance tools.
4. MongoDB application scenarios
1) Applications that need real-time inserts, updates, and queries, together with the replication and high scalability that real-time data storage requires;
2) Storing and querying data in document format;
3) High-scalability scenarios: MongoDB is well suited to databases spread across dozens or hundreds of servers;
4) Scenarios that value performance over features.
HBase originated as a subproject of Apache Hadoop and is an open-source implementation of the BigTable design. It is written in Java, so it depends on a Java runtime. HBase relies on Hadoop's distributed file system, HDFS, as its underlying storage layer.
1. HBase features:
- Language: Java
- Features: supports billions of rows and millions of columns
- License: Apache
- Protocol: HTTP/REST (Thrift is also supported)
- Modeled after BigTable
- MapReduce on a distributed architecture
- Optimizations for real-time queries
- High-performance Thrift gateway
- Query predicate pushdown through server-side scan and get filters
- HTTP interface supports XML, Protobuf, and binary
- Cascading, Hive, and Pig source and sink modules
- JRuby-based (JIRB) shell
- Rolling restarts for configuration changes and minor upgrades
- No single point of failure
- Random access performance comparable to MySQL
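The server-side scanning and filtering in the feature list builds on a BigTable-style layout in which rows are kept sorted by row key. The sketch below models that idea in memory; `SortedTable`, its methods, and the `info:age` column are hypothetical names, and nothing here calls the real HBase API.

```python
from bisect import bisect_left


class SortedTable:
    """Toy table that keeps rows ordered by row key, as BigTable-style stores do."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start, stop, row_filter=None):
        """Yield rows with start <= key < stop that pass the optional filter."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(row):
                yield key, row


table = SortedTable()
table.put("user#150", "info:age", 41)
table.put("user#120", "info:age", 25)
table.put("user#300", "info:age", 52)

# Only keys in [user#100, user#200) are visited; the filter then drops user#120.
matches = list(table.scan("user#100", "user#200", lambda row: row["info:age"] >= 30))
print(matches)
```

Applying the filter where the data lives means only matching rows travel back to the client, which is the point of pushing the predicate down to the server.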
2. HBase advantages
1) Large storage capacity: a table can hold hundreds of millions of rows and millions of columns;
2) Data can be retrieved by version, so the required historical versions can be found;
3) Under high load, it scales horizontally simply by adding machines. Seamless integration with Hadoop provides data reliability (HDFS) and high-performance analysis of massive data (MapReduce);
4) Building on point 3, it effectively avoids single points of failure.
3. HBase disadvantages
1) It is implemented in Java on the Hadoop architecture, so its API suits Java projects best;
2) In a node development environment, it has many dependencies, configuration is troublesome (or it is unclear how to configure, for example, persistence), and documentation is lacking;
3) It uses a lot of memory, and because it is built on HDFS, which is optimized for batch analysis, read performance is not high;
4) The API is relatively clumsy compared with other NoSQL databases.
4. HBase application scenarios
1) BigTable-style data storage;
2) Data that needs to be queried by version;
3) Large data volumes that need simple scaling.
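Retrieval by version, mentioned in both the advantages and the scenarios, can be sketched as a cell that keeps several timestamped values. `VersionedTable`, its `max_versions` limit of 3, and the integer timestamps are assumptions for illustration, not HBase defaults or API.

```python
class VersionedTable:
    """Toy cell store that keeps several timestamped versions of each cell."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._cells = {}  # (row, column) -> list of (timestamp, value), newest first

    def put(self, row, column, value, timestamp):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, row, column, at=None):
        """Return the newest value, or the newest value written at or before `at`."""
        for timestamp, value in self._cells.get((row, column), []):
            if at is None or timestamp <= at:
                return value
        return None


t = VersionedTable(max_versions=3)
t.put("user#1", "info:city", "Paris", timestamp=100)
t.put("user#1", "info:city", "Berlin", timestamp=200)
t.put("user#1", "info:city", "Tokyo", timestamp=300)

print(t.get("user#1", "info:city"))          # newest version
print(t.get("user#1", "info:city", at=250))  # value as of timestamp 250

# A fourth write pushes the oldest version (timestamp 100) out of the cell.
t.put("user#1", "info:city", "Oslo", timestamp=400)
print(t.get("user#1", "info:city", at=150))
```

Capping the number of versions keeps storage bounded, at the price of losing history older than the retained versions.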
Redis is an open-source, log-type key-value database written in ANSI C. It is networked and memory-based with optional persistence, and it provides client APIs in many languages. Its development was for a time sponsored by VMware.
1. Redis features:
- Language: C
- Features: extremely fast operation
- License: BSD
- Protocol: Telnet-like
- In-memory database backed by disk storage
- Since version 2.0, data can also be swapped out to disk (note that this feature is no longer supported after version 2.4!)
- Master-slave replication
- Values are simple data or hash tables indexed by key, yet complex operations such as ZREVRANGEBYSCORE are also supported
- INCR and related commands (good for rate limiting or statistics)
- Sets (with union/diff/inter)
- Lists (also usable as queues; blocking pop)
- Hashes (objects with multiple fields)
- Sorted sets (high-score tables; good for range queries)
- Transactions
- Values can be set to expire (as in a cache)
- Pub/sub lets users implement messaging
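The sorted-set range query in the list above can be modeled in a few lines. This sketch only imitates the semantics of the ZADD and ZREVRANGEBYSCORE commands on an in-memory dictionary; the `SortedSet` class is hypothetical, it does not talk to a Redis server, and it re-sorts on every query, which a real sorted set avoids.

```python
class SortedSet:
    """Toy model of a Redis sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        """Add a member or update its score."""
        self._scores[member] = score

    def zrevrangebyscore(self, max_score, min_score):
        """Members with min_score <= score <= max_score, highest score first."""
        hits = [(s, m) for m, s in self._scores.items() if min_score <= s <= max_score]
        return [m for s, m in sorted(hits, reverse=True)]


board = SortedSet()
board.zadd("alice", 300)
board.zadd("bob", 150)
board.zadd("carol", 220)

print(board.zrevrangebyscore(300, 200))  # players scoring between 200 and 300

board.zadd("bob", 310)                   # updating a score moves the member
print(board.zrevrangebyscore(400, 0))
```

This is the high-score-table use case from the list: each member appears once, and a range query returns the members ordered by score.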
2. Redis advantages
1) A very rich set of data structures;
2) Redis provides transactions, which ensure that a series of commands executes atomically, with nothing interleaved in the middle;
3) Data is held in memory, so reads and writes are very fast, reaching about 100,000 operations per second.
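The transaction behavior in point 2 can be pictured as a command queue: commands issued after the transaction opens are only recorded, then run back to back when it is executed. The sketch below is a single-process toy under that assumption; `TinyStore` and its methods are hypothetical names loosely modeled on the MULTI, EXEC, and INCRBY commands, and it does not model multiple clients or error handling.

```python
class TinyStore:
    """Toy key-value store with a queued-transaction mode."""

    def __init__(self):
        self.data = {}
        self._queue = None  # None means no transaction is open

    def multi(self):
        """Open a transaction: later commands are queued, not run."""
        self._queue = []

    def incrby(self, key, amount=1):
        if self._queue is not None:
            self._queue.append((self._incrby, (key, amount)))
            return "QUEUED"
        return self._incrby(key, amount)

    def _incrby(self, key, amount):
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]

    def execute(self):
        """Run every queued command in order and return their results."""
        queued, self._queue = self._queue, None
        return [fn(*args) for fn, args in queued]


store = TinyStore()
store.multi()
print(store.incrby("a:balance", -50))  # queued, not yet applied
print(store.incrby("b:balance", 50))   # queued, not yet applied
print(store.data)                      # still empty before execute()
print(store.execute())                 # both commands run back to back
print(store.data)
```

The sketch shows only the ordering guarantee: nothing is applied until the queue is executed, and the queued commands then run consecutively.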
3. Redis’s shortcomings
1) An official cluster solution arrived only with Redis 3.0, and it still has some architectural problems;
2) Persistence is awkward in practice. With the snapshot method, the entire dataset must be written to disk at intervals, which is very expensive. The AOF (append-only file) method records only the changes, much like MySQL's binlog, but the appended log can grow very large, and recovery has to replay every operation, so it is slow;
3) Because it is an in-memory database, the amount of data a single machine can store is limited by that machine's memory. Although Redis has its own key expiration policy, memory use still has to be estimated in advance and kept in check. If memory grows too fast, data must be deleted regularly.
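The trade-off between the two persistence methods in point 2 can be seen in a small simulation. It is a sketch under simplifying assumptions: the log and the snapshot are in-memory JSON strings rather than files, and the operation format and helper names are made up, not Redis's on-disk formats.

```python
import json


def apply(state, op):
    """Apply one logged operation, ("set", key, value) or ("del", key), to the state."""
    if op[0] == "set":
        state[op[1]] = op[2]
    elif op[0] == "del":
        state.pop(op[1], None)


# A workload that overwrites one key many times, then adds and deletes another.
ops = [("set", "hits", i) for i in range(1, 10001)]
ops += [("set", "user", "alice"), ("del", "user")]

live = {}
aof = []
for op in ops:
    apply(live, op)
    aof.append(json.dumps(op))  # append-only log: one entry per write

snapshot = json.dumps(live)     # snapshot: the whole dataset written in one go


def recover_from_snapshot(blob):
    """Load the dataset directly from the snapshot."""
    return json.loads(blob)


def recover_from_aof(lines):
    """Rebuild the dataset by replaying every logged operation in order."""
    state = {}
    for line in lines:
        apply(state, json.loads(line))
    return state


print(len(aof), "log entries versus", len(live), "key(s) in the snapshot")
print(recover_from_snapshot(snapshot) == recover_from_aof(aof))
```

Both paths rebuild the same data, but the log replays every historical write while the snapshot stores only the final state. The snapshot in turn has to rewrite the whole dataset each time it is taken.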
4. Redis application scenarios:
Best fit: applications whose data changes quickly and whose database size is foreseeable (it should fit in memory).
Examples: microblogging, data analysis, real-time data collection, real-time communication.