Word segmentation means splitting a passage of Chinese (or other text) into keywords. When searching, Elasticsearch segments our query string, segments the data in the database or index, and then matches the resulting terms. The default analyzer treats each Chinese character as a separate term: a phrase such as "我爱编程" ("I love programming") would be split into the single characters "我", "爱", "编", "程". This obviously does not meet our needs, so we install the IK Chinese analyzer to solve the problem. If you work with Chinese, the IK analyzer is recommended! IK provides two segmentation modes: `ik_smart` and `ik_max_word`. `ik_smart` performs the coarsest (least) segmentation, while `ik_max_word` performs the finest-grained segmentation!
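To see the default behavior for yourself, you can call the `_analyze` API from the Kibana Dev Tools console. A minimal sketch, assuming the sample text "我爱编程" (the analyzer name `standard` is Elasticsearch's built-in default):

```json
GET _analyze
{
  "analyzer": "standard",
  "text": "我爱编程"
}
```

With the standard analyzer, each Chinese character comes back as its own token, which illustrates the problem described above.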
- 1. Download: github.com/medcl/elasticsearch-ana…
- 2. After downloading, extract it into the `plugins` directory of our Elasticsearch installation
- 3. Restart Elasticsearch and watch the log; you can see the IK analyzer being loaded!
- 4. The `elasticsearch-plugin list` command shows the loaded plug-ins
- 5. Test with Kibana
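The steps above can be sketched as shell commands. This is an illustrative sketch, not a definitive recipe: the version number 7.6.1 and the installation path `./elasticsearch-7.6.1` are assumptions — the IK plugin version must match your Elasticsearch version exactly.

```shell
# Assumption: Elasticsearch 7.6.1 installed under ./elasticsearch-7.6.1
# Download the matching IK release from the medcl/elasticsearch-analysis-ik GitHub releases page
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.1/elasticsearch-analysis-ik-7.6.1.zip

# Extract into a subdirectory of the Elasticsearch plugins folder
mkdir -p ./elasticsearch-7.6.1/plugins/ik
unzip elasticsearch-analysis-ik-7.6.1.zip -d ./elasticsearch-7.6.1/plugins/ik

# Restart Elasticsearch, then verify the plugin is loaded
./elasticsearch-7.6.1/bin/elasticsearch-plugin list
```

If the install succeeded, `elasticsearch-plugin list` prints `ik` among the loaded plug-ins.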
Compare the two segmentation modes:
`ik_smart` performs the coarsest segmentation (the fewest terms).
`ik_max_word` performs the finest-grained segmentation, exhausting every possible combination in the dictionary!
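The difference is easy to see by running the same text through both analyzers in the Kibana console. A sketch, assuming the classic sample phrase "中华人民共和国" ("the People's Republic of China"):

```json
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国"
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国"
}
```

`ik_smart` tends to return the phrase as a single term, while `ik_max_word` also emits the shorter dictionary words contained inside it, producing many more tokens.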
Problem found: a word we care about (here, a made-up name not in IK's dictionary) gets split apart into single characters.
Words like this need to be added to our analyzer's own dictionary.
Adding our own configuration to the IK analyzer
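The IK plugin reads its extension configuration from `IKAnalyzer.cfg.xml` in the plugin's `config` directory. A minimal sketch — the dictionary file name `my.dic` is a hypothetical example; use whatever name you like, as long as the `ext_dict` entry matches the file you create:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom dictionary file, relative to this config directory (hypothetical name) -->
    <entry key="ext_dict">my.dic</entry>
    <!-- custom stop-word file; left empty here -->
    <entry key="ext_stopwords"></entry>
</properties>
```

After editing the configuration, restart Elasticsearch so the plugin reloads the dictionaries.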
After restarting, the Elasticsearch log shows our custom .dic file being loaded
Test it again
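The custom dictionary itself is just a UTF-8 text file with one word per line, placed next to `IKAnalyzer.cfg.xml`. A sketch with a hypothetical entry ("我的新词", "my new word"):

```
# plugins/ik/config/my.dic  (UTF-8, one word per line; name must match the ext_dict entry)
我的新词
```

After restarting Elasticsearch, re-running `_analyze` with `ik_smart` or `ik_max_word` on text containing that word should now return it as a single term instead of individual characters.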
In the future, whenever we need custom segmentation, we can simply add the words to our custom .dic file!
This work adopts the CC license; reprints must credit the author and link to this article.