Elasticsearch (I) — basic principles and usage

Time: 2021-12-31

1、 Basic concepts

1. Introduction to elasticsearch

Lucene is a full-text search engine toolkit (a library for building full-text search engines) written in Java; "full-text" means that all text content is analyzed and indexed so that it can be searched. It processes plain-text data and provides interfaces for indexing and search execution, but does not include distributed services.
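The core of full-text indexing is the inverted index: each analyzed term points to the set of documents that contain it. A minimal, illustrative Python sketch (a toy model only; Lucene's real data structures are far more sophisticated, and the sample documents are made up):

```python
# Toy inverted index: term -> set of document IDs.
docs = {
    1: "Trying out Elasticsearch",
    2: "Elasticsearch is built on Lucene",
}

index = {}
for doc_id, text in docs.items():
    # Trivial "analysis": lowercase + whitespace split.
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)

print(index["elasticsearch"])  # {1, 2}
print(index["lucene"])         # {2}
```

A search for a term is then just a set lookup, which is what makes full-text queries fast.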

Elasticsearch is a distributed, near-real-time search and analytics engine (data added to ES becomes searchable within about one second; this short delay before new data becomes visible to search is why it is called "near-real-time" search). Internally it uses Lucene for indexing and search. "Distributed" means the cluster size can be adjusted dynamically and scaled flexibly; according to the official documentation, clusters can scale to hundreds of nodes. For this reason, ES is currently considered suitable for businesses with medium data volumes rather than for storing truly massive data sets.

Based on ES, you can easily build your own search engine to analyze logs, or build a search engine for a vertical domain. ES also provides a large number of aggregation functions, so it is not only a search engine but can also perform data analysis and statistics.

2. Common concepts

2.1) index term

In Elasticsearch, an index term (term) is an exact value that can be indexed. A term can be matched exactly with a term query.

2.2 text

Text is ordinary unstructured text. The text is analyzed into individual index terms, which are stored in Elasticsearch's index. To make text searchable, the text field must be analyzed in advance; when a keyword in the text is queried, the search engine matches the query against these analyzed terms to find the original documents.

Extension: the keyword type, by contrast, is not analyzed (no word segmentation) when data is stored; the whole value is indexed as a single term.

2.3 analysis

Analysis is the process of converting text into index terms, and the result of analysis depends on the analyzer (tokenizer). For example, the variants "FOO BAR", "Foo Bar" and "foo bar" may all be analyzed into the same index terms foo and bar, which are stored in Elasticsearch's index. A full-text search for foo bar can then match all of those original texts through the analyzed terms in the index. This is Elasticsearch's search-time analysis.
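The normalization described above can be sketched with a toy analyzer (a stand-in for a real Elasticsearch analyzer such as standard; the regex-based tokenization here is only illustrative):

```python
import re

def analyze(text):
    """Toy analyzer: split on non-letter characters and lowercase each token."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

# Different surface forms normalize to the same index terms:
for variant in ("FOO BAR", "Foo-Bar", "foo bar"):
    assert analyze(variant) == ["foo", "bar"]
```

Because both the indexed text and the query string pass through the same analysis, all three variants match a search for foo bar.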

2.4 index

An index is a collection of documents with the same structure. An index is a logical namespace that points to one or more physical shards. Index names are all lowercase, and the name is used to perform index, search, update and delete operations. Multiple indexes can be defined in a single cluster.

The ES index structure is shown in the figure.

Type

In an index, you can define one or more types, which are the logical partitions of the index. In general, a type is defined as a document with a set of common fields. For example, suppose you run a blog platform and store all the data in an index. In this index, you can define one type as user data, one type as blog data, and the other type as comment data.

2.5 document

A document is a JSON-formatted string stored in Elasticsearch, like a row in a relational database table. Each document stored in the index has a type and an ID, and each document is a JSON object holding zero or more fields (key-value pairs).

Note: the original JSON document is stored in the _source field, which is returned by default when searching for documents.

2.6 mapping

Mapping is like a table structure in a relational database. Each index has a mapping, which defines the type of each field in the index as well as index-wide settings. A mapping can be defined in advance or inferred automatically when a document is first stored.

2.7 field

A document contains zero or more fields. A field can be a simple value (such as a string, integer, or date), or a nested structure such as an array or object. A field is similar to a column in a relational database table. Each field has a field type, such as integer, string, or object, and a field can also specify how its value is analyzed.

2.8) source field

By default, the original document is stored in the _source field, and this field is also returned in query results, so you can access the original object from search results. The returned object is the exact JSON string that was submitted; it does not include any data produced by index-time analysis.

2.9) primary key (ID)

The ID is the unique identifier of a document. If no ID is provided at index time, one is generated automatically. The index/type/ID combination of a document must be unique.

2.10 mapping

Mapping is the process of defining how documents and the fields they contain are stored and indexed.

A mapping typically defines:

  • Which string fields should be considered full-text fields.
  • Which fields contain numbers, dates, or geographic locations.
  • Format of the date value.
  • Custom rules to control the mapping of dynamically added fields.

Each index has one mapping type that determines how its documents are indexed (mapping types were deprecated in version 6.0.0). A mapping type has:

(1) Meta field

Meta fields are used to customize how the metadata associated with a document is processed. Examples of meta fields include _index, _type, _id and _source. Every document has metadata associated with it, such as the _index, _type and _id meta fields. When you create a mapping type, you can customize the behavior of some of these meta fields.

Document source meta fields:
  • _source: the original JSON representing the body of the document.
  • _size: the size (in bytes) of the _source field, provided by the mapper-size plugin.

Indexing meta fields:
  • _field_names: all fields in the document that contain non-null values.
  • _ignored: fields in the document that were ignored at index time because of ignore_malformed.

Routing meta field:
  • _routing: a custom routing value that routes the document to a specific shard.

(2) Field or attribute

The mapping type contains a list of fields, or properties, relevant to the document. Each field has a data type (type).

eg:

  • A simple type, such as text, keyword, date, long, double, boolean or ip
  • A type that supports the hierarchical nature of JSON, such as object or nested
  • A special type, such as geo_point, geo_shape, or completion

ES supports indexing the same field in different ways. For example, a string field can be indexed both as a text field for full-text search and as a keyword field for sorting or aggregation. This is the purpose of multi-fields; most data types support multi-fields through the fields parameter.

 

Supplement:

1) Settings to prevent mapping explosion

Defining too many fields in an index can lead to a mapping explosion, causing out-of-memory errors that are difficult to recover from. Consider, for example, a situation where every new document you insert introduces new fields. Each time a document contains a new field, that field is appended to the index mapping; as the mapping grows, it can become a problem. The following settings limit the number of field mappings that can be created manually or dynamically, to prevent a mapping explosion caused by bad documents:

  index.mapping.total_fields.limit: the maximum number of fields in an index. Field and object mappings, as well as field aliases, count toward this limit. The default value is 1000. (The limit exists to prevent mappings and searches from becoming too large. Higher values can lead to performance degradation and memory problems, especially in clusters under high load or with few resources. If you increase this setting, it is recommended to also increase indices.query.bool.max_clause_count, which limits the maximum number of boolean clauses in a query.)

  index.mapping.depth.limit: the maximum depth of a field, measured by the number of inner objects. For example, if all fields are defined at the root object level, the depth is 1; if there is one object mapping, the depth is 2, and so on. The default is 20.

  index.mapping.nested_fields.limit: the maximum number of distinct nested mappings in an index. The default is 50.

  index.mapping.nested_objects.limit: the maximum number of nested JSON objects across all nested types in a single document. The default is 10000.

  index.mapping.field_name_length.limit: sets the maximum length of a field name. The default is Long.MAX_VALUE (unlimited). This setting does not really solve the mapping explosion problem, but it can still be useful if you want to limit field name length. It is usually not needed; the default is fine unless users start adding large numbers of fields with very long names.
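To see what the total-fields and depth limits actually measure, here is a toy Python model (not ES's implementation; the sample mapping is hypothetical) that counts fields and object depth in a mapping's properties:

```python
def count_fields_and_depth(properties, depth=1):
    """Toy model of what index.mapping.total_fields.limit and
    index.mapping.depth.limit measure; not Elasticsearch's implementation."""
    total, max_depth = 0, depth
    for field_def in properties.values():
        total += 1                      # each field counts toward the total
        sub = field_def.get("properties")
        if sub:                         # object field: recurse one level deeper
            sub_total, sub_depth = count_fields_and_depth(sub, depth + 1)
            total += sub_total
            max_depth = max(max_depth, sub_depth)
    return total, max_depth

# Hypothetical mapping: two root fields, one of them an object with two subfields.
mapping = {
    "title": {"type": "text"},
    "author": {"properties": {"name": {"type": "text"}, "id": {"type": "integer"}}},
}
print(count_fields_and_depth(mapping))  # (4, 2)
```

A mapping whose field count exceeded total_fields.limit, or whose nesting exceeded depth.limit, would be rejected by ES at mapping-update time.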

2) Mapping settings

2.1) Once a field type is set in the mapping and data has been written, direct modification is prohibited, because the inverted index generated by Lucene cannot be modified afterwards (although new fields can be added). To change a type, you must create a new index and reindex the data into it.

1. Newly added field

  • When dynamic is set to true, writing a document with new fields updates the mapping at the same time;
  • When dynamic is set to false, the mapping is not updated and the data in the new fields cannot be indexed or searched, but the fields still appear in _source;
  • When dynamic is set to strict, writing a document with new fields fails.
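The three dynamic modes above can be sketched with a small toy model (not ES internals; the field names and return shape are made up for illustration):

```python
def index_document(mapping, doc, dynamic):
    """Toy model of the dynamic setting ("true" / "false" / "strict").
    Returns which fields become searchable."""
    new_fields = [f for f in doc if f not in mapping]
    if dynamic == "strict" and new_fields:
        # strict: the write is rejected outright
        raise ValueError("strict_dynamic_mapping_exception: %s" % new_fields)
    if dynamic == "true":
        for f in new_fields:
            mapping[f] = "auto-detected"    # mapping is updated alongside the write
        indexed = list(doc)
    else:
        # "false": the document is stored (visible in _source) but new fields
        # are not added to the mapping and cannot be searched.
        indexed = [f for f in doc if f in mapping]
    return {"indexed_fields": indexed, "_source": doc}

doc = {"user": "kimchy", "age": 30}          # "age" is a new field
print(index_document({"user": "text"}, doc, "true")["indexed_fields"])   # ['user', 'age']
print(index_document({"user": "text"}, doc, "false")["indexed_fields"])  # ['user']
```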

2. To modify a field type, you must rebuild the index with the reindex API, because once a field's data type is changed, the already-indexed data can no longer be searched; newly added fields do not have this problem.

2.2) Dynamic Mapping

  • When writing a document, if the target index does not exist, it is created automatically;
  • The dynamic mapping mechanism makes manual mapping definitions unnecessary; Elasticsearch infers field types from the document content;
  • If the inference is wrong, e.g. for geo-location or numeric data, some functionality (range queries, etc.) will be unavailable.

 

2、 Common usage

1、Response filtering

All REST APIs accept a filter_path parameter that reduces the response returned by Elasticsearch. The parameter takes a comma-separated list of filters expressed in dot notation:

eg1:

curl -X GET "localhost:9200/_search?q=elasticsearch&filter_path=took,hits.hits._id,hits.hits._score&pretty"

result

{
  "took" : 3,
  "hits" : {
    "hits" : [
      {
        "_id" : "0",
        "_score" : 1.6375021
      }
    ]
  }
}

eg2:

curl -X GET "localhost:9200/_cluster/state?filter_path=metadata.indices.*.stat*&pretty"
{
  "metadata" : {
    "indices" : {
      "twitter": {"state": "open"}
    }
  }
}

eg3:

curl -X GET "localhost:9200/_cluster/state?filter_path=routing_table.indices.**.state&pretty"
{
  "routing_table": {
    "indices": {
      "twitter": {
        "shards": {
          "0": [{"state": "STARTED"}, {"state": "UNASSIGNED"}]
        }
      }
    }
  }
}

eg4:

curl -X GET "localhost:9200/_count?filter_path=-_shards&pretty"
{
  "count" : 5
}

eg5:

curl -X GET "localhost:9200/_cluster/state?filter_path=metadata.indices.*.state,-metadata.indices.logstash-*&pretty"
{
  "metadata" : {
    "indices" : {
      "index-1" : {"state" : "open"},
      "index-2" : {"state" : "open"},
      "index-3" : {"state" : "open"}
    }
  }
}

eg6:

Elasticsearch sometimes returns the raw value of a field directly, such as the _source field. To filter within the _source field, combine the _source parameter with the filter_path parameter:

curl -X POST "localhost:9200/library/book?refresh&pretty" -H 'Content-Type: application/json' -d'
{"title": "Book #1", "rating": 200.1}
'
curl -X POST "localhost:9200/library/book?refresh&pretty" -H 'Content-Type: application/json' -d'
{"title": "Book #2", "rating": 1.7}
'
curl -X POST "localhost:9200/library/book?refresh&pretty" -H 'Content-Type: application/json' -d'
{"title": "Book #3", "rating": 0.1}
'
curl -X GET "localhost:9200/_search?filter_path=hits.hits._source&_source=title&sort=rating:desc&pretty"
{
  "hits" : {
    "hits" : [ {
      "_source":{"title":"Book #1"}
    }, {
      "_source":{"title":"Book #2"}
    }, {
      "_source":{"title":"Book #3"}
    } ]
  }
}

Note: &pretty means the returned result is pretty-printed JSON.
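The way filter_path prunes a response can be modeled for a single dotted path in a few lines of Python (a toy model: the real parameter also supports the wildcard forms '*' and '**' and exclusions with '-', which are omitted here):

```python
def filter_path(response, path):
    """Toy model of filter_path for one dotted path without wildcards."""
    def walk(node, keys):
        if not keys:
            return node
        key, rest = keys[0], keys[1:]
        if isinstance(node, dict) and key in node:
            sub = walk(node[key], rest)
            return None if sub is None else {key: sub}
        if isinstance(node, list):
            # lists are transparent: apply the remaining path to each element
            kept = [walk(item, keys) for item in node]
            kept = [k for k in kept if k is not None]
            return kept or None
        return None
    return walk(response, path.split(".")) or {}

resp = {"took": 3, "hits": {"total": 1,
        "hits": [{"_id": "0", "_score": 1.6, "_source": {}}]}}
print(filter_path(resp, "hits.hits._id"))  # {'hits': {'hits': [{'_id': '0'}]}}
```

Note how the surrounding structure of the matched path is preserved, exactly as in the eg1 response above.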

2. Enable stack trace

By default, Elasticsearch does not include the stack trace of an error when a request fails. Stack traces can be enabled by setting the error_trace URL parameter to true.

  Eg: by default, sending an invalid size parameter to the _search API:

curl -X POST "localhost:9200/twitter/_search?size=surprise_me&pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Failed to parse int parameter [size] with value [surprise_me]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Failed to parse int parameter [size] with value [surprise_me]",
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"surprise_me\""
    }
  },
  "status" : 400
}

Setting error_trace=true:

curl -X POST "localhost:9200/twitter/_search?size=surprise_me&error_trace=true&pretty"
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Failed to parse int parameter [size] with value [surprise_me]",
        "stack_trace": "Failed to parse int parameter [size] with value [surprise_me]]; nested: IllegalArgumentException..."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Failed to parse int parameter [size] with value [surprise_me]",
    "stack_trace": "java.lang.IllegalArgumentException: Failed to parse int parameter [size] with value [surprise_me]\n    at org.elasticsearch.rest.RestRequest.paramAsInt(RestRequest.java:175)...",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"surprise_me\"",
      "stack_trace": "java.lang.NumberFormatException: For input string: \"surprise_me\"\n    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)..."
    }
  },
  "status": 400
}

 

3. Document operation

(3.1) insert JSON document into index

  eg1:

(1) Insert a JSON document into the twitter index with _id 1:

curl -X PUT "localhost:9200/twitter/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "result" : "created"
}

(2) Index the document into the twitter index through the _create resource (it only succeeds if no document with that ID exists):

curl -X PUT "localhost:9200/twitter/_create/1?pretty" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

(3) Index the document into the twitter index by setting the op_type parameter to create (it only succeeds if no document with that ID exists):

curl -X PUT "localhost:9200/twitter/_doc/1?op_type=create&pretty" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

eg2: retrieve the JSON document with _id 0 from the twitter index:

curl -X GET "localhost:9200/twitter/_doc/0?pretty"
{
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "0",
    "_version" : 1,
    "_seq_no" : 10,
    "_primary_term" : 1,
    "found": true,
    "_source" : {
        "user" : "kimchy",
        "date" : "2009-11-15T14:12:12",
        "likes": 0,
        "message" : "trying out Elasticsearch"
    }
}

Check whether a document with _id 0 exists:

curl -I "localhost:9200/twitter/_doc/0?pretty"

Elasticsearch returns the status code 200 - OK if the document exists, or 404 - Not Found if it does not.

 

(3.2) delete document

Delete a JSON document from the twitter index:

curl -X DELETE "localhost:9200/twitter/_doc/1?pretty"
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 2,
    "_primary_term": 1,
    "_seq_no": 5,
    "result": "deleted"
}

(3.2.1) routing

If routing was used when indexing, the routing value must also be specified when deleting the document. If the _routing mapping is set to required and no routing value is specified, the delete API throws a RoutingMissingException and rejects the request.

curl -X DELETE "localhost:9200/twitter/_doc/1?routing=kimchy&pretty"

Supplement: ES routing mechanism

Elasticsearch's routing mechanism uses a hash algorithm to place documents with the same hash value into the same primary shard (the default routing algorithm maps the hash of the document's ID to one of the primary shards; this keeps data roughly evenly distributed across all shards and avoids hotspots), similar to load balancing via hashing. Specifying a routing value controls which shard a document is stored on.
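A toy model of the routing formula, shard = hash(routing) % number_of_primary_shards (Elasticsearch actually uses Murmur3 on the routing value; CRC32 stands in here so the sketch is self-contained and deterministic):

```python
import zlib

def route_to_shard(routing_value, number_of_primary_shards):
    """Toy model of document routing; ES uses Murmur3, CRC32 is a stand-in."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

# By default the routing value is the document _id; a custom routing value
# (e.g. ?routing=kimchy) sends all of that user's documents to one shard.
shards = 3
assert route_to_shard("kimchy", shards) == route_to_shard("kimchy", shards)
print({doc_id: route_to_shard(doc_id, shards) for doc_id in ["1", "2", "3"]})
```

Because the shard is derived from the routing value and the primary shard count, number_of_shards cannot be changed after index creation without reindexing.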

(3.2.2) timeout

When performing a delete, the primary shard assigned to the document may be unavailable, for example because it is currently recovering from storage or relocating. By default, the delete operation waits up to 1 minute for the primary shard to become available before failing with an error. The timeout parameter can be used to set the wait time explicitly. The following example sets it to 5 minutes:

curl -X DELETE "localhost:9200/twitter/_doc/1?timeout=5m&pretty"

(3.2.3) delete the document matching the specified query

curl -X POST "localhost:9200/twitter/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}
'

(3.2.4) delete all tweets from the twitter index

curl -X POST "localhost:9200/twitter/_delete_by_query?conflicts=proceed&pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'

(3.2.5) delete documents from multiple indexes

curl -X POST "localhost:9200/twitter,blog/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  }
}
'

(3.2.6) restrict delete-by-query to shards with a specific routing value

curl -X POST "localhost:9200/twitter/_delete_by_query?routing=1&pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range" : {
        "age" : {
           "gte" : 10
        }
    }
  }
}
'

By default, _delete_by_query uses scroll batches of 1000. The batch size can be changed with the scroll_size URL parameter:

curl -X POST "localhost:9200/twitter/_delete_by_query?scroll_size=5000&pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
'

 

(3.3) update documents

Add a new field to an existing document:

curl -X POST "localhost:9200/test/_update/1?pretty" -H 'Content-Type: application/json' -d'
{
    "doc" : {
        "name" : "new_name"
    }
}
'

 

4. Cache

By default, the clear cache API clears all caches. Specific caches can be cleared by setting the following parameters:

  • fielddata
  • query
  • request
curl -X POST "localhost:9200/twitter/_cache/clear?fielddata=true&pretty"
curl -X POST "localhost:9200/twitter/_cache/clear?query=true&pretty"
curl -X POST "localhost:9200/twitter/_cache/clear?request=true&pretty"

To clear the cache for specific fields only, use the fields query parameter:

curl -X POST "localhost:9200/twitter/_cache/clear?fields=foo,bar&pretty"

Clear cache for multiple indexes

curl -X POST "localhost:9200/kimchy,elasticsearch/_cache/clear?pretty"

Clear cache for all indexes

curl -X POST "localhost:9200/_cache/clear?pretty"
 
5. Index operation
(5.1) create index 
Each index created can have specific settings associated with it, eg:

curl -X PUT "localhost:9200/twitter?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    }
}
'

The default for number_of_shards is 1, and the default for number_of_replicas is 1 (i.e. one replica per primary shard).

Or, more simply:
curl -X PUT "localhost:9200/twitter?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 2
    }
}
'

The create index API allows you to provide mapping definitions:

curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "properties" : {
            "field1" : { "type" : "text" }
        }
    }
}
'

Before 7.0.0, mapping definitions included a type name. Although specifying a type in the request is no longer recommended, a type can still be provided by setting the request parameter include_type_name. The create index API also allows you to provide a set of aliases:

curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d'
{
    "aliases" : {
        "alias_1" : {},
        "alias_2" : {
            "filter" : {
                "term" : {"user" : "kimchy" }
            },
            "routing" : "kimchy"
        }
    }
}
'

By default, index creation only returns a response to the client when the primary copy of each shard starts or the request times out.

Index creation response:

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "test"
}

acknowledged indicates whether the index was successfully created in the cluster, and shards_acknowledged indicates whether the required number of shard copies was started for each shard in the index before the timeout.

Note that acknowledged or shards_acknowledged may be false even though the index was created successfully; these values merely indicate whether the operation completed before the timeout. If acknowledged is false, the request timed out before the cluster state was updated with the newly created index, but the index will likely be created shortly. If shards_acknowledged is false, the request timed out before the required number of shards was started (by default only the primaries), even if the cluster state was already updated to reflect the new index (i.e. acknowledged=true).

The default of waiting only for the primary shards to start can be changed through the index setting index.write.wait_for_active_shards (changing this setting also affects the wait_for_active_shards value of all subsequent write operations):

curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "index.write.wait_for_active_shards": "2"
    }
}
'

Or through the request parameter wait_for_active_shards:

curl -X PUT "localhost:9200/test?wait_for_active_shards=2&pretty"
 
(5.2) delete index

curl -X DELETE "localhost:9200/twitter?pretty"

1) Delete index alias

An index alias is a secondary name used to reference one or more existing indexes. Most elasticsearch APIs accept index aliases instead of index names.

curl -X DELETE "localhost:9200/twitter/_alias/alias1?pretty"

 

(5.3) flush index
1) Flush a specific index
curl -X POST "localhost:9200/kimchy/_flush?pretty"

2) Flush multiple indexes

curl -X POST "localhost:9200/kimchy,elasticsearch/_flush?pretty"

3) Flush all indexes

curl -X POST "localhost:9200/_flush?pretty"
4) Get index

curl -X GET "localhost:9200/twitter?pretty"

be careful:

Before 7.0.0, mapping definitions included a type name. Although mappings in responses no longer include the type name by default, the old format can still be requested with the parameter include_type_name.

Get all aliases:

curl -X GET "localhost:9200/logs_20302801/_alias/*?pretty"
{
 "logs_20302801" : {
   "aliases" : {
    "current_day" : {
    },
     "2030" : {
       "filter" : {
         "term" : {
           "year" : 2030
         }
       }
     }
   }
 }
}
 

(5.5) index alias operation

(5.5.1) create or update index aliases

An index alias is a secondary name used to reference one or more existing indexes (most elasticsearch APIs accept an index alias instead of an index name).

curl -X PUT "localhost:9200/twitter/_alias/alias1?pretty"

Eg: create the alias 2030 for the logs_20302801 index:

curl -X PUT "localhost:9200/logs_20302801/_alias/2030?pretty"

Add a user-based alias

eg:

(1) Create an index users with a mapping for the user_id field:

curl -X PUT "localhost:9200/users?pretty" -H 'Content-Type: application/json' -d'
{
    "mappings" : {
        "properties" : {
            "user_id" : {"type" : "integer"}
        }
    }
}
'

(2) Add an index alias user_12 for a specific user:

curl -X PUT "localhost:9200/users/_alias/user_12?pretty" -H 'Content-Type: application/json' -d'
{
    "routing" : "12",
    "filter" : {
        "term" : {
            "user_id" : 12
        }
    }
}
'

 (5.5.2) add alias when creating index

Eg: use the create index API to add index aliases during index creation:

curl -X PUT "localhost:9200/logs_20302801?pretty" -H 'Content-Type: application/json' -d'
{
    "mappings" : {
        "properties" : {
            "year" : {"type" : "integer"}
        }
    },
    "aliases" : {
        "current_day" : {},
        "2030" : {
            "filter" : {
                "term" : {"year" : 2030 }
            }
        }
    }
}
'
(5.5.3) get a specific alias

curl -X GET "localhost:9200/_alias/2030?pretty"
{
  "logs_20302801" : {
    "aliases" : {
      "2030" : {
        "filter" : {
          "term" : {
            "year" : 2030
          }
        }
      }
    }
  }
}

(5.5.4) get aliases matching a wildcard

curl -X GET "localhost:9200/_alias/20*?pretty"
{
  "logs_20302801" : {
    "aliases" : {
      "2030" : {
        "filter" : {
          "term" : {
            "year" : 2030
          }
        }
      }
    }
  }
}
 
(5.6) get index settings, templates and mappings
1) Get settings for multiple indexes

curl -X GET "localhost:9200/twitter,kimchy/_settings?pretty"
curl -X GET "localhost:9200/_all/_settings?pretty"
curl -X GET "localhost:9200/log_2013_*/_settings?pretty"

2) You can use wildcards to filter the returned settings, eg:

curl -X GET "localhost:9200/log_2013_-*/_settings/index.number_*?pretty"

3) Get multiple index templates

curl -X GET "localhost:9200/_template/template_1,template_2?pretty"

4) Get index template using wildcard expression

curl -X GET "localhost:9200/_template/temp*?pretty"

5) Get all index templates

curl -X GET "localhost:9200/_template?pretty"

6) Get multiple index mappings

curl -X GET "localhost:9200/twitter,kimchy/_mapping?pretty"

be careful:

Get the mapping of all indexes and types. The following two examples are equivalent

 curl -X GET "localhost:9200/_all/_mapping?pretty"
 curl -X GET "localhost:9200/_mapping?pretty"

 
6、Elasticsearch SQL
First, create an index with some documents using the bulk API:
curl -X PUT "localhost:9200/library/book/_bulk?refresh&pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id": "Leviathan Wakes"}}
{"name": "Leviathan Wakes", "author": "James S.A. Corey", "release_date": "2011-06-02", "page_count": 561}
{"index":{"_id": "Hyperion"}}
{"name": "Hyperion", "author": "Dan Simmons", "release_date": "1989-05-26", "page_count": 482}
{"index":{"_id": "Dune"}}
{"name": "Dune", "author": "Frank Herbert", "release_date": "1965-06-01", "page_count": 604}
'

Execute SQL using SQL rest API:

curl -X POST "localhost:9200/_sql?format=txt&pretty" -H 'Content-Type: application/json' -d'
{
    "query": "SELECT * FROM library WHERE release_date < \u00272000-01-01\u0027"
}
'

Return results

    author     |     name      |  page_count   | release_date
---------------+---------------+---------------+------------------------
Dan Simmons    |Hyperion       |482            |1989-05-26T00:00:00.000Z
Frank Herbert  |Dune           |604            |1965-06-01T00:00:00.000Z
 
7. Queries

Structured query (query DSL): the query conditions are evaluated, a relevance score is calculated, and the matching documents are returned;

Structured filtering (filter DSL): filters cache their results and do not compute relevance; because scoring is skipped, they execute very fast (recommended where scoring is not needed);
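The query-versus-filter distinction can be sketched as follows: filters are cacheable yes/no set operations, while queries produce per-document scores (here a naive term frequency stands in for ES's real relevance scoring; the document store is made up):

```python
from functools import lru_cache

# Hypothetical document store for illustration.
DOCS = {
    1: {"status": "published", "text": "quick brown fox"},
    2: {"status": "draft", "text": "quick fox"},
    3: {"status": "published", "text": "lazy dog"},
}

@lru_cache(maxsize=None)          # filters: cacheable yes/no answers, no scores
def term_filter(field, value):
    return frozenset(i for i, d in DOCS.items() if d[field] == value)

def match_query(field, word):     # queries: scored (naive term frequency here)
    return {i: d[field].split().count(word)
            for i, d in DOCS.items() if word in d[field].split()}

# Score only within the filtered set (toy model, not ES internals):
allowed = term_filter("status", "published")
scored = {i: s for i, s in match_query("text", "fox").items() if i in allowed}
print(scored)  # {1: 1}
```

Because the filter result is a plain document set independent of any score, it can be cached and reused across queries, which is why filters are so much cheaper.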

1) Structured filter (DSL) [6 Filters]

Term filtering: term is used for exact matching of values such as numbers, dates, booleans, or not_analyzed strings (text values indexed without analysis). Equivalent to SQL age = 26:

{ "term": { "age": 26 }}
{ "term": { "date": "2014-09-01" }}

Terms filtering: terms allows you to specify multiple values; a document matches if the field contains any of the specified values. Equivalent to a SQL IN query:

{"terms": {"age": [26, 27, 28]}}

Range filtering: range filtering lets us find documents within a specified range, equivalent to SQL BETWEEN:

{
    "range": {
        "price": {
            "gte": 2000,
            "lte": 3000
        }
    }
}
gt: greater than
lt: less than
gte: greater than or equal to
lte: less than or equal to

Exists and missing filtering: the exists and missing filters find documents that contain, or lack, a specified field, similar to the IS NULL / IS NOT NULL conditions in SQL:

{
    "exists": {
        "field": "title"
    }
}

Bool filtering: boolean logic used to combine the results of multiple filter conditions:

must: all conditions must match (equivalent to AND);

must_not: the conditions must not match (equivalent to NOT);

should: at least one of the conditions matches (equivalent to OR).

{
    "bool": {
        "must": {
            "term": {
                "folder": "inbox"
            }
        },
        "must_not": {
            "term": {
                "tag": "spam"
            }
        },
        "should": [{
            "term": {
                "starred": true
            }
        },
        {
            "term": {
                "unread": true
            }
        }]
    }
}

 

2) Structured query (DSL)

Bool query: similar to the bool filter, a bool query combines multiple query clauses. The difference is that a bool filter gives a simple yes/no match, while a bool query computes a relevance _score for each query clause:

{
    "bool": {
        "must": {
            "match": {
                "title": "how to make millions"
            }
        },
        "must_not": {
            "match": {
                "tag": "spam"
            }
        },
        "should": [{
            "match": {
                "tag": "starred"
            }
        },
        {
            "range": {
                "date": {
                    "gte": "2014-01-01"
                }
            }
        }]
    }
}

Bool nested query

{
    "bool": {
        "should": [{
            "term": {
                "productID": "KDKE-B-9947-#kL5"
            }
        },
        {
            "bool": {
                "must": [{
                    "term": {
                        "productID": "JODL-X-1937-#pV7"
                    }
                },
                {
                    "term": {
                        "price": 30
                    }
                }]
            }
        }]
    }
}

match_all query: match_all matches all documents; it is the default when no query condition is given.

{
  "match_all": {}
}

Match query: match is the standard query, used for both full-text and exact-value searches. When match queries a full-text field, the query string is first processed by the field's analyzer before the actual query runs:

{
    "match": {
        "tweet": "About Search"
    }
}

multi_match query: multi_match builds on match to search multiple fields at the same time:

{
    "multi_match": {
        "query": "full text search",
        "fields": ["title",
        "body"]
    }
}

match_phrase: phrase query; the full-text search string is treated as a phrase, meaning the terms must appear consecutively and in order:

{
    "match_phrase": {
        "title": "full text search"
    }
}

Setting slop (the allowed distance between phrase terms):

{
    "match_phrase": {
        "title": {
            "query": "full text search",
            "slop": 1
        }
    }
}

match_phrase_prefix query: like match_phrase, but performs prefix matching on the last term of the phrase.

{
    "query": {
        "match_phrase_prefix": {
            "title": {
                "Query": "intelligence transmission"
            }
        }
    },
    "from": 0,
    "size": 5
}

Regexp query: regular-expression query:

{
    "query": {
        "regexp": {
            "title": "W[0-9].+"
        }
    }
}

Filtered query: query clauses and filter clauses are placed in their respective contexts. The filtered query has been deprecated and replaced by bool:

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "text": "quick brown fox"
                }
            },
            "filter": {
                "term": {
                    "status": "published"
                }
            }
        }
    },
    "from": 0,
    "size": 10,
    "sort": {
        "publish_date": {
            "order": "desc"
        }
    }
}

 

 

Official document address: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/elasticsearch-intro.html

Thank you for reading, if you need to reprint, please indicate the source, thank you! https://www.cnblogs.com/huyangshu-fs/p/11683905.html
