Chinese search with Laravel + Elasticsearch

Time: 2021-8-7

Elasticsearch

Elasticsearch is an open-source search engine built on Apache Lucene(TM). Lucene is arguably the most advanced, highest-performing, and most capable search-engine library available, open source or proprietary.

However, Lucene is only a library. To take advantage of it you must work in Java and integrate it into your application, and Lucene is complex: you need a solid understanding of information retrieval to see how it works.

Elasticsearch is also written in Java and uses Lucene internally for indexing and search, but its goal is to make full-text search simple by hiding Lucene's complexity behind a simple, coherent RESTful API.

Elasticsearch is more than just Lucene plus a full-text search engine, though. It also provides:

  • Distributed real-time document storage, where every field is indexed and searchable
  • A distributed search engine with real-time analytics
  • The ability to scale out to hundreds of servers and handle petabytes of structured and unstructured data

Moreover, all of these capabilities are integrated into a single server, and your application can talk to it through a simple RESTful API, clients for many languages, or even the command line. Getting started with Elasticsearch is easy: it ships with many sensible defaults and hides complicated search-engine theory from beginners. It works out of the box and can be used in production with very little learning.

Elasticsearch is licensed under the Apache 2 license and can be downloaded, used, and modified free of charge.

Elasticsearch installation

Elasticsearch is already integrated into Laradock, so we can start it directly:


docker-compose up -d elasticsearch

If you need to install a plugin, run:

docker-compose exec elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin install {plugin-name}

# Restart the container
docker-compose restart elasticsearch

Note:

The vm.max_map_count kernel setting must be set to at least 262144 for production use.

Since my environment is CentOS 7, I set it directly on the host:
sysctl -w vm.max_map_count=262144
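Note that sysctl -w only changes the running kernel; the value is lost on reboot. To make it permanent, the setting can also be written to /etc/sysctl.conf (assumes root on CentOS 7):

```shell
# Persist vm.max_map_count across reboots
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
sysctl -p    # reload settings from /etc/sysctl.conf
```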

Default username and password: "elastic" / "changeme"; port: 9200.

ElasticHQ

ElasticHQ is an open source application that offers a simplified interface for managing and monitoring Elasticsearch clusters.

Management and Monitoring for Elasticsearch.

http://www.elastichq.org/

  • Real-Time Monitoring
  • Full Cluster Management
  • Full Cluster Monitoring
  • Elasticsearch Version Agnostic
  • Easy Install – Always On
  • Works with X-Pack

Enter our Elasticsearch host address to reach the management console.

By default it creates:

  • A cluster: laradock-cluster
  • A node: laradock-node
  • An index: .elastichq

IK analyzer installation

Elasticsearch is mainly used here to search blog and WeChat official account articles, so a Chinese word-segmentation analyzer is needed to go with it. The widely recommended IK analyzer is used, installed as the following Elasticsearch plugin.

https://github.com/medcl/elasticsearch-analysis-ik/releases

# Install the plugin
docker-compose exec elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.5.1/elasticsearch-analysis-ik-7.5.1.zip

Note: you can download the zip file first and install from the local file; this is faster.

Test the segmentation

Testing against the Elasticsearch _analyze API, the segmentation works as follows:

~ curl -X POST "http://your_host/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
 "analyzer": "ik_max_word",
 "text": "我是中国人"
}
'

{
 "tokens" : [
  {
   "token" : "我",
   "start_offset" : 0,
   "end_offset" : 1,
   "type" : "CN_CHAR",
   "position" : 0
  },
  {
   "token" : "是",
   "start_offset" : 1,
   "end_offset" : 2,
   "type" : "CN_CHAR",
   "position" : 1
  },
  {
   "token" : "中国人",
   "start_offset" : 2,
   "end_offset" : 5,
   "type" : "CN_WORD",
   "position" : 2
  },
  {
   "token" : "中国",
   "start_offset" : 2,
   "end_offset" : 4,
   "type" : "CN_WORD",
   "position" : 3
  },
  {
   "token" : "国人",
   "start_offset" : 3,
   "end_offset" : 5,
   "type" : "CN_WORD",
   "position" : 4
  }
 ]
}
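When calling _analyze from PHP rather than curl, the reply decodes into a nested array. A minimal sketch of pulling the token strings out of such a response (the $response array below is hard-coded to mirror the JSON output above; in a real call it would come from json_decode() on the server reply):

```php
<?php
// Extract just the token strings from a decoded _analyze response.
function tokensFromAnalyzeResponse(array $response): array
{
    return array_column($response['tokens'], 'token');
}

// Hard-coded copy of the response shown above.
$response = [
    'tokens' => [
        ['token' => '我',    'start_offset' => 0, 'end_offset' => 1, 'type' => 'CN_CHAR', 'position' => 0],
        ['token' => '是',    'start_offset' => 1, 'end_offset' => 2, 'type' => 'CN_CHAR', 'position' => 1],
        ['token' => '中国人', 'start_offset' => 2, 'end_offset' => 5, 'type' => 'CN_WORD', 'position' => 2],
        ['token' => '中国',  'start_offset' => 2, 'end_offset' => 4, 'type' => 'CN_WORD', 'position' => 3],
        ['token' => '国人',  'start_offset' => 3, 'end_offset' => 5, 'type' => 'CN_WORD', 'position' => 4],
    ],
];

print_r(tokensFromAnalyzeResponse($response));
```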

Combined with Laravel

Although Elasticsearch officially provides a PHP client, we want it to integrate more closely with Laravel, so we use it together with Scout. Specifically, we use the tamayo/laravel-scout-elastic package.


composer require tamayo/laravel-scout-elastic
 
composer require laravel/scout
 
php artisan vendor:publish

When prompted, choose: Laravel\Scout\ScoutServiceProvider

Change the driver to elasticsearch in config/scout.php:


'driver' => env('SCOUT_DRIVER', 'elasticsearch'),
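config/scout.php also needs a section telling the client where the server is and which index to use; the handle() code further below reads scout.elasticsearch.hosts and scout.elasticsearch.index. A sketch of that section (the env variable names and defaults here are my own assumptions, not the package's published file verbatim):

```php
// config/scout.php (excerpt): settings read by the elasticsearch driver
'elasticsearch' => [
    'index' => env('ELASTICSEARCH_INDEX', 'coding01_open'),
    'hosts' => [
        env('ELASTICSEARCH_HOST', 'http://localhost:9200'),
    ],
],
```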

Create index

There are several ways to create an index. One is to create it directly with the visualization tool ElasticHQ, then update the index and fill in the mappings section, for example with Postman.

Another way is to use Laravel's artisan console, which is the approach we recommend here.


php artisan make:command ESOpenCommand

Following the tips on the official website, ESOpenCommand can send a PUT request to the Elasticsearch server. Since tamayo/laravel-scout-elastic already depends on the official elasticsearch-php client, that client is installed for us.

Now we can create our index with the help of the client. Here is the code:


// In ESOpenCommand (requires: use Elasticsearch\ClientBuilder;)
public function handle()
{
    $host = config('scout.elasticsearch.hosts');
    $index = config('scout.elasticsearch.index');
    $client = ClientBuilder::create()->setHosts($host)->build();

    // Drop any existing index of the same name so it can be recreated.
    if ($client->indices()->exists(['index' => $index])) {
        $this->warn("Index {$index} exists, deleting...");
        $client->indices()->delete(['index' => $index]);
    }

    $this->info("Creating index: {$index}");

    return $client->indices()->create([
        'index' => $index,
        'body' => [
            'settings' => [
                'number_of_shards' => 1,
                'number_of_replicas' => 0,
            ],
            'mappings' => [
                '_source' => [
                    'enabled' => true,
                ],
                'properties' => [
                    'id' => [
                        'type' => 'long',
                    ],
                    'title' => [
                        'type' => 'text',
                        'analyzer' => 'ik_max_word',
                        'search_analyzer' => 'ik_smart',
                    ],
                    'subtitle' => [
                        'type' => 'text',
                        'analyzer' => 'ik_max_word',
                        'search_analyzer' => 'ik_smart',
                    ],
                    'content' => [
                        'type' => 'text',
                        'analyzer' => 'ik_max_word',
                        'search_analyzer' => 'ik_smart',
                    ],
                ],
            ],
        ],
    ]);
}

OK, open Kibana and we can see that the index has been created.

Note: Kibana can be installed locally with Docker (a later article will cover how Kibana is used):


docker run -d --name kibana -e ELASTICSEARCH_HOSTS=http://elasticsearch_host -p 5601:5601 -e SERVER_NAME=ki.test kibana:7.5.2

To verify that the index is usable, insert a document and see:

curl -XPOST your_host/coding01_open/_create/1 -H 'Content-Type: application/json' -d'
{"content": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}
'

You can then view the document in the browser.

With the index in place, the next step is to import, update, and query data through Laravel.

Using the Laravel model

The Laravel framework recommends Scout for full-text search, so we just add the Searchable trait and related methods to the Article model. It is very simple; see the Scout documentation: https://learnku.com/docs/laravel/6.x/scout/5191 . The code:

<?php

namespace App;

use App\Tools\Markdowner;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\SoftDeletes;
use Laravel\Scout\Searchable;

class Article extends Model
{
  use Searchable;

  protected $connection = 'blog';
  protected $table = 'articles';
  use SoftDeletes;

  /**
   * The attributes that should be mutated to dates.
   *
   * @var array
   */
  protected $dates = ['published_at', 'created_at', 'deleted_at'];

  /**
   * The attributes that are mass assignable.
   *
   * @var array
   */
  protected $fillable = [
    'user_id',
    'last_user_id',
    'category_id',
    'title',
    'subtitle',
    'slug',
    'page_image',
    'content',
    'meta_description',
    'is_draft',
    'is_original',
    'published_at',
    'wechat_url',
  ];

  protected $casts = [
    'content' => 'array'
  ];

  /**
   * Set the content attribute.
   *
   * @param $value
   */
  public function setContentAttribute($value)
  {
    $data = [
      'raw' => $value,
      'html' => (new Markdowner)->convertMarkdownToHtml($value)
    ];

    $this->attributes['content'] = json_encode($data);
  }

  /**
   * Get the searchable data for the model.
   *
   * @return array
   */
  public function toSearchableArray()
  {
    $data = [
      'id' => $this->id,
      'title' => $this->title,
      'subtitle' => $this->subtitle,
      'content' => $this->content['html']
    ];

    return $data;
  }

  public function searchableAs()
  {
    return '_doc';
  }
}

Scout provides the artisan command scout:import to import all existing records into the search index.


php artisan scout:import "App\Article"
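Under the hood, scout:import batches the models and pushes toSearchableArray() payloads to Elasticsearch through the bulk API. A simplified, hand-rolled sketch of the NDJSON payload that kind of import builds (index name and field names taken from the mapping above; this is an illustration, not the package's actual code):

```php
<?php
// Build an Elasticsearch _bulk payload (NDJSON) for a set of articles.
function buildBulkPayload(string $index, array $articles): string
{
    $lines = [];
    foreach ($articles as $article) {
        // Action line: index each document under its id.
        $lines[] = json_encode(['index' => ['_index' => $index, '_id' => $article['id']]]);
        // Source line: the searchable fields, keeping Chinese text readable.
        $lines[] = json_encode([
            'id'       => $article['id'],
            'title'    => $article['title'],
            'subtitle' => $article['subtitle'],
            'content'  => $article['content'],
        ], JSON_UNESCAPED_UNICODE);
    }
    // The bulk API requires a trailing newline after the last line.
    return implode("\n", $lines) . "\n";
}

$payload = buildBulkPayload('coding01_open', [
    ['id' => 1, 'title' => '标题', 'subtitle' => '副标题', 'content' => '正文'],
]);
echo $payload;
```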

Looking at Kibana, 12 documents have been indexed, which matches the row count in the database.

With the data, we can test to see if we can query the data.

Again, create a command:


// In app/Console/Commands (requires: use App\Article; use Illuminate\Console\Command;)
class ElasearchCommand extends Command
{
  /**
   * The name and signature of the console command.
   *
   * @var string
   */
  protected $signature = 'command:search {query}';

  /**
   * The console command description.
   *
   * @var string
   */
  protected $description = 'Command description';

  /**
   * Create a new command instance.
   *
   * @return void
   */
  public function __construct()
  {
    parent::__construct();
  }

  /**
   * Execute the console command.
   *
   * @return mixed
   */
  public function handle()
  {
    $article = Article::search($this->argument('query'))->first();

    // Guard against no match to avoid calling ->title on null.
    if (! $article) {
      $this->warn('No result found.');
      return;
    }

    $this->info($article->title);
  }
}

One of my article titles contains the word "list", so I enter that keyword to see whether the article can be found.

Summary

Overall, we completed:

  • Elasticsearch installation;
  • Installation of the Elasticsearch IK analyzer plugin;
  • Installation and basic use of the visualization tools ElasticHQ and Kibana;
  • Use of Scout;
  • Using Elasticsearch together with Scout.

Next, more content will be stored in Elasticsearch to provide full-text search for the blog, the official account, automated search, and other scenarios.

References

Laravel Zero, a recommended tool for building command-line applications

Artisan command line https://learnku.com/docs/laravel/6.x/artisan/5158

Scout full text search https://learnku.com/docs/laravel/6.x/scout/5191

How to integrate Elasticsearch in your Laravel App – 2019 edition https://madewithlove.be/how-to-integrate-elasticsearch-in-your-laravel-app-2019-edition/

Kibana Guide https://www.elastic.co/guide/en/kibana/index.html

Elasticsearch PHP API https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/index.html
