Tag:Tokenizer

  • Those things of memory swallowing beast — Architecture & three high guarantees

    Time:2021-9-30

    Series catalog: What happened to the memory swallowing beast - get to know it; Those things of memory elasticsearch - data structure and clever algorithm; Those things of memory swallowing beast - Architecture & three high guarantees; Those things about the memory swallowing beast - the principle of writing and retrieval; Those things about memory […]

  • Elasticsearch project actual combat, product search function design and implementation!

    Time:2021-9-30

    Spring Boot e-commerce project mall (30K+ stars), address: https://github.com/macrozheng/mall. Abstract: Last time, I wrote the article "Quick start to Elasticsearch, just master these!" to walk you through the basic usage of Elasticsearch. This time, we will write a practical tutorial that takes product search in the mall project as an example and puts Elasticsearch to use! Chinese word […]

  • Elasticsearch learning record (I)

    Time:2021-9-17

    Note that the versions of all these components should match. Elasticsearch installation: JDK 1.8 or later, plus the Elasticsearch client and interface tools. Be careful: the bitness of the JDK must match that of the CPU, or a JNA error will be reported. Download addresses: https://www.elastic.co/cn/ (Elasticsearch and Kibana), https://github.com/medcl/elas… (IK word breaker), https://github.com/mobz/elast… (Head plug-in). These are […]

  • IK word breaker plug-in

    Time:2021-8-13

    What is the IK word breaker? Word segmentation divides a passage of Chinese or other text into keywords. When searching, the query text is segmented, the data in the database or index library is segmented, and the two are then matched. The default Chinese word segmentation treats each character as […]

  • Solr8 imports data from MySQL 8.0.20

    Time:2021-8-9

    Modify the solrconfig.xml file: edit solrconfig.xml under the solrhome\demo_core\conf folder to add dataimport. To make maintenance easier, add the following content at the start of the requestHandler section, around line 720: <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config.xml</str> </lst> </requestHandler> Create data-config.xml. Purpose of data-config.xml: it holds the database connection information and the SQL, and maps the query results to the corresponding […]

  • Managed schema file details

    Time:2021-8-8

    Find the configuration file under the directory of the core configured in Core Admin: solrhome\demo_core\conf\managed-schema. The managed-schema file mainly configures the SolrCore data information, including the definitions of field and fieldType. In Solr, both field and fieldType must be defined before use. Field details: name: specifies the name of the […]

  • Solr8.6.2 client interface introduction and configuration of Chinese word splitter

    Time:2021-8-7

    Start the Solr service and visit http://localhost:8080/solr/index.html. Dashboard: displays the start time, version, system resources, JVM, and other information of the Solr instance. Logging: displays exceptions or errors raised while Solr is running. Core Admin: the management interface for Solr cores, where you can add SolrCore instances. Its main operations are Add Core, Unload, Rename, Reload, and Optimize. Add Core is […]

  • Elasticsearch search summary

    Time:2021-3-15

    Elasticsearch (ES) is a near real-time distributed search and analytics engine. This article collects and organizes ES-related information, including indexing, word segmentation, multi-condition queries, aggregation, autocompletion, search suggestions, synonyms, and security, to help you learn and use the ES search engine. Brief introduction to Elasticsearch: Elasticsearch (ES) is based on Lucene […]

  • Chinese word segmentation service based on Rust

    Time:2021-2-16

    1. Chinese word segmentation. Put simply, Chinese word segmentation divides a sentence into several words. Baidu Encyclopedia defines it as dividing a sequence of Chinese characters into individual words: word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain rules. For example: I am Chinese and […]

  • [Comic] Inverted index and word segmentation

    Time:2020-12-25

    The original purpose of the inverted index: an inverted index is still an index, and the purpose of an index is to quickly retrieve the data you want. You are probably familiar with MySQL indexes: if you index a field, queries on that field generally become significantly faster. Each kind of database has […]

  • Implementation of custom Chinese full text index in neo4j

    Time:2020-12-10

    Generally, the first step in optimizing database retrieval efficiency is the index; more complex measures such as load balancing, read-write splitting, and distributed horizontal/vertical database and table sharding are considered afterwards, as demand requires. An index improves retrieval efficiency through information redundancy, which trades space for time and reduces […]

  • Build an open source project 13 – install IK word breaker and zookeeper

    Time:2020-11-27

    1. Installing the IK word breaker. Download the IK word breaker plug-in: wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.4.2/elasticsearch-analysis-ik- Downloading on Linux is very slow, so I downloaded it from GitHub in advance. Now start the installation: [[email protected] ~]# mkdir /opt/elasticsearch/elasticsearch-6.4.2/plugins/elasticsearch-analysis-ik-6.4.2 [[email protected] ~]# cd /opt/elasticsearch/elasticsearch-6.4.2/plugins/elasticsearch-analysis-ik-6.4.2 [[email protected] elasticsearch-analysis-ik-6.4.2]# unzip elasticsearch-analysis-ik-6.4.2.tar.gz Decompression means that the IK […]