On Elasticsearch's “love and hate”
- When it comes to search servers, most people think of Solr, Elasticsearch, or the in-house systems built by large domestic companies. With the arrival of the AI and big-data era, machine-data engines such as Splunk have also emerged. Solr and Elasticsearch are both search servers built on Lucene. Broadly, Solr is a full-text search engine, while Elasticsearch covers full-text search, structured search, and analytics. Splunk, a machine-data engine, collects, indexes, and makes use of the fast-moving machine data generated by applications, servers, and devices. However technology develops, what we need to do as programmers is not only maintain our stock of technical knowledge and keep our own "technology cache" up to date, but also keep growing a skill tree whose depth can scale.
- I first came across the name Elasticsearch in the summer of 2017. The task at the time was to implement a knowledge base system. Most of my colleagues chose Solr + Lucene, and some simply used the built-in features of the MySQL database. While researching search engine practice online, I happened upon a report analyzing Elasticsearch in real applications, and then went looking for more material on it. At that time, though, most search demos on the Internet were about Solr. Perhaps technology choices then leaned towards long-term stability, complete documentation, and wide adoption. Personally, I was keen to give Elasticsearch a try.
- The first hands-on attempt was an install on my local Windows machine (22G memory). Compared with Tomcat + Solr, the installation was more complex and heavy on memory and power. It was just about usable for development, but its stability was a bit daunting. The second attempt was in a virtual machine (2-core 8G) on the same machine, where I chose bridged networking, which I found very troublesome. The third time, I had my own Alibaba Cloud server. With a traditional (non-Docker) deployment and little memory (2-core 4G) on this personal server, the JVM would not start even after I modified the configuration, and it kept emitting GC logs and the like. The main problem was that I was short of money at the time. Even when the Elasticsearch service was running, other applications could not start. Later I came across Docker, which led to the fourth attempt (single-node deployment). For that attempt I upgraded the Alibaba Cloud server (2-core 8G) and finally got my first Elasticsearch service running. It also laid the foundation for my later work on distributed Elasticsearch clusters.
- Like MongoDB / Redis / Memcached, Elasticsearch is in a sense a NoSQL database. It is a near-real-time search platform: there is only a slight delay between indexing a document and that document becoming searchable. Its enterprise positioning: a scalable, highly available full-text search tool for real-time data analysis, built on a RESTful API. At that time, the Elastic Stack use cases covered only Elasticsearch, Kibana, and Logstash, and did not include Beats and so on. In terms of applications, besides serving as the core of an ELK distributed logging system, Elasticsearch plus the elasticsearch-head plugin can meet business needs: securely and reliably taking in data from any source in any format, then searching, analyzing, and visualizing it in real time.
- Basic features:
- Scalable: supports one master with multiple slaves and is easy to expand. Any node on the same network with the same cluster.name can join the current cluster automatically. It is open-source software and supports many open-source third-party plugins.
- High availability: data is stored across multiple nodes of a cluster. Indexes support shards and replicas, so even if some nodes go down, data recovery and master failover happen automatically.
- RESTful API: data is manipulated as JSON over an HTTP interface.
- The smallest unit of data storage is the document, which is essentially JSON text.
- Node: a single server running the Elasticsearch service, providing failover and scalability.
- Cluster: a group of one or more nodes that serve requests together and share data, with load balancing. A high-availability setup based on a ZooKeeper cluster is also possible.
- Index: a collection of documents with the same or similar characteristics.
- Type: a type is defined for documents that share the same fields. An index can contain multiple types.
- Document: the basic unit of information that can be indexed.
- Field: the smallest unit of data in Elasticsearch, comparable to a column.
- Term: a sequence of bytes. In general, each smallest unit produced by tokenizing the value of a text field is called a term.
- Shards: Elasticsearch divides an index into several parts, and each part is a shard.
- Replicas: copies of each shard of an index, serving as data backups.
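To make the glossary above more concrete, here is a toy Python sketch of how documents, terms, and shards relate. It is not how Elasticsearch or Lucene is implemented: the regex tokenizer is a crude stand-in for a real analyzer, `zlib.crc32` is an assumed stand-in for the real routing hash, and the names `tokenize`, `build_inverted_index`, `pick_shard`, and `NUM_PRIMARY_SHARDS` are hypothetical.

```python
import re
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 3  # hypothetical setting for this sketch


def tokenize(text):
    """Split a text field value into lowercase terms (crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, field):
    """Map each term found in `field` to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc[field]):
            index[term].add(doc_id)
    return dict(index)


def pick_shard(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Route a document ID to a shard: hash of the ID modulo the shard count.

    crc32 is an assumption made for illustration, not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Two documents, each a JSON-like object with a single text field.
docs = {
    "1": {"title": "Elasticsearch is a search server"},
    "2": {"title": "Solr is a search server too"},
}

inverted = build_inverted_index(docs, "title")
shards = {doc_id: pick_shard(doc_id) for doc_id in docs}

print(sorted(inverted["search"]))  # document IDs whose title contains "search"
print(shards)                      # shard chosen for each document ID
```

Searching for a term then becomes a dictionary lookup instead of a scan over every document, and the same document ID always lands on the same shard.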
Comparison of Elasticsearch's structure with other databases
- Data model comparison
| Category | System | Database level | Table level | Row level | Column level |
|---|---|---|---|---|---|
| SQL | MySQL | Database | Table | Row | Column |
| NoSQL | Elasticsearch | Index | Type | Document | Field |
| NoSQL | HBase | Namespace | Region | Row | Column |
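As a rough illustration of the mapping in the table, the Python sketch below turns one relational row into an Elasticsearch-style document. The helper `row_to_document` and the sample index, type, and column names are hypothetical. `_index`, `_type`, `_id`, and `_source` are Elasticsearch metadata field names, but `_type` follows the data model described in this article, and mapping types were removed in later Elasticsearch versions.

```python
import json


def row_to_document(index, doc_type, columns, row, id_column="id"):
    """Turn one relational row into an Elasticsearch-style document envelope."""
    fields = dict(zip(columns, row))     # column names become field names
    doc_id = str(fields.pop(id_column))  # the primary key becomes the document ID
    return {
        "_index": index,    # plays the role of the database
        "_type": doc_type,  # plays the role of the table
        "_id": doc_id,
        "_source": fields,  # the row itself, stored as a JSON document
    }


# Hypothetical table "article" with three columns.
columns = ["id", "title", "author"]
row = (7, "Getting started", "alice")

doc = row_to_document("knowledge_base", "article", columns, row)
print(json.dumps(doc, indent=2))
```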
- Comparison of use cases
| Category | System | Storage model and use case | Transactions | Consistency | Scalability | | |
|---|---|---|---|---|---|---|---|
| SQL | MySQL | Row-oriented storage, suited to OLTP workloads | Supported by the InnoDB engine | Strong consistency | Limited single-machine scalability | Supported | Supported |
| NoSQL | Elasticsearch | Index-based storage, suited to any retrieval workload | Not supported | Configurable | Horizontal scaling | Supported | Supported |
| NoSQL | HBase | Column-oriented storage, between the OLTP and OLAP models | Not supported | Strong consistency and timeline consistency | Horizontal scaling | Not supported | Not supported |
PS [⚠️ Note]:
- OLTP (online transaction processing): mainly corresponds to traditional relational databases. Its basic operations are insert, delete, update, and query, with an emphasis on transactional consistency, as in banking and e-commerce systems.
- OLAP (online analytical processing): mainly corresponds to data warehouses. It is mostly read-only, performs complex data analysis, focuses on decision support, and provides intuitive, simple results.
Analysis of Elasticsearch principles
- Gateway [storage for index data]: the file system Elasticsearch uses to store index data. Several types are supported: local file system, shared file system, distributed file system (Hadoop HDFS), and Amazon S3.
- Distributed Lucene directory [underlying framework]: Elasticsearch is built on the Lucene framework, and each Elasticsearch node runs its own Lucene instance.
- Major modules: layered on top of Lucene are the index module, the search module, mapping, and river (a plugin running inside the Elasticsearch cluster, mainly used to pull heterogeneous data from external sources and index it in Elasticsearch).
- Discovery [node discovery mechanism]: the mechanism by which Elasticsearch discovers nodes automatically. Zen implements automatic node discovery and master election. Elasticsearch is a P2P-based system: it first looks for existing nodes by broadcast, then nodes communicate through a multicast protocol, and point-to-point interaction is also supported.
- Scripting [script execution]: a script execution facility that makes it convenient to process queried data.
- Plugins [plugin mechanism]: Elasticsearch integrates third-party plugins, such as the elasticsearch IK word segmentation plugin and the elasticsearch SQL plugin.
- Transport [transport mechanism]: the transport module supports Thrift, Memcached, and HTTP, and uses HTTP by default.
- JMX [Java-based management framework]: Java's management framework, used to manage Elasticsearch applications.
- RESTful-style API: network communication is based on Netty, and clients interact with the Elasticsearch cluster through the RESTful API.
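As a minimal sketch of what "JSON over HTTP" looks like in practice, the Python below builds (but does not send) two requests with the standard library. It assumes a single node listening on `http://localhost:9200` and the `/{index}/_doc/{id}` and `/{index}/_search` paths used by recent Elasticsearch versions. Older versions that still use types put the type name in the path instead. The helper names, index name, and field names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node instance


def index_request(index, doc_id, document):
    """Build a PUT request that stores `document` (a dict) under the given ID."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build a POST request that runs a full-text `match` query on one field."""
    query = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


put_req = index_request("knowledge_base", "7", {"title": "Getting started"})
get_req = search_request("knowledge_base", "title", "started")

for req in (put_req, get_req):
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually send a request, pass it to `urllib.request.urlopen` against a running node. The sketch stops short of that so it can be read and run without a cluster.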