Tag: word segmentation
-
Time:2021-1-16
Generally speaking, what we scrape is the content of a website or an application, from which we extract useful information. This content falls into two categories: unstructured text and structured text. Common structured formats include JSON, XML and HTML. HTML text (including embedded JavaScript code) is the most common format, and it should be organized as structured text, […]
-
Time:2021-1-7
Introduction: With the development of social media, blogs, microblogs and social networks are quietly changing the way people live. The user bases of Weibo, WeChat, Tmall, Jingdong and similar platforms grow day by day, and users actively publish a considerable volume of microblog posts and comments. In this era of social media, users have become the best brand […]
-
Time:2020-12-25
Why inverted indexes exist: An inverted index is, first of all, an index, and the purpose of any index is to retrieve the data you want quickly. You are probably familiar with MySQL indexes: indexing a field generally speeds up queries on that field significantly. Each kind of database has […]
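To make the idea concrete, here is a minimal sketch of an inverted index. It maps each term to the sorted list of documents that contain it, so a query looks up posting lists instead of scanning every document. It assumes the documents have already been split into terms. For Chinese text, that is the word segmentation step this tag is about. The names build_inverted_index and search_all are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    # Posting lists are kept sorted so they can be merged efficiently.
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Return IDs of documents containing every query term (an AND query)."""
    postings = [set(index.get(term, [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    # Hypothetical, already-segmented documents.
    docs = {1: ["倒排", "索引", "检索"], 2: ["MySQL", "索引"], 3: ["全文", "检索"]}
    index = build_inverted_index(docs)
    print(search_all(index, ["索引", "检索"]))
```

A real search engine would also store term positions and frequencies in each posting for ranking. This sketch keeps only document IDs.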
-
Time:2020-12-24
There are very few Chinese-language articles online about full-text search in Django, and those that exist don't explain it well. They show how to configure it but not what each setting is actually for, which leaves readers confused. I hope this article can help […]
-
Time:2020-12-10
Generally, the first step in optimizing database retrieval is indexing. Only then, as demand requires, should you consider more complex measures such as load balancing, read-write separation, and distributed horizontal/vertical sharding of databases and tables. An index improves retrieval efficiency through information redundancy: it trades space for time and reduces […]
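The space-for-time trade-off can be illustrated with a minimal sketch. Row is a hypothetical table row, and SortedIndex keeps a redundant, sorted copy of (key, row position) pairs. It stands in for the B+ tree a real database engine would use. Without the index, every lookup scans all rows. With it, a lookup is a binary search.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    name: str


def full_scan(rows: list[Row], name: str) -> list[int]:
    """Without an index: examine every row, O(n) per query."""
    return [r.id for r in rows if r.name == name]


class SortedIndex:
    """A redundant, sorted copy of (key, row position) pairs.

    It costs extra memory (space) but turns each lookup into a binary
    search, O(log n + matches) (time). A sorted list stands in here for
    the B+ tree a real database would use.
    """

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows
        self.entries = sorted((r.name, i) for i, r in enumerate(rows))
        self.keys = [key for key, _ in self.entries]

    def lookup(self, name: str) -> list[int]:
        # All entries with this key form one contiguous run in sorted order.
        lo = bisect.bisect_left(self.keys, name)
        hi = bisect.bisect_right(self.keys, name)
        return [self.rows[i].id for _, i in self.entries[lo:hi]]


if __name__ == "__main__":
    rows = [Row(1, "bob"), Row(2, "alice"), Row(3, "bob")]
    print(SortedIndex(rows).lookup("bob"), full_scan(rows, "bob"))
```

The index must be rebuilt or updated whenever rows change. That write-side cost is the price of faster reads.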
-
Time:2020-11-11
Preface: Recently I have been optimizing the search feature of our website, which is built with Django; its main business is online video education. Previously, search relied only on fuzzy matching with the Django ORM model's icontains lookup, so it could only match literal keywords. But the CEO (SB) suddenly wanted a feature similar to Baidu Q […]
-
Time:2020-11-7
Natural language intelligence (NLP): Research on natural language intelligence aims to enable effective communication between humans and computers through language. It is a discipline that integrates linguistics, psychology, computer science, mathematics and statistics. It covers the analysis, extraction, understanding, transformation and generation of both natural and formal languages. Artificial intelligence can be divided into several stages: • computational intelligence […]
-
Time:2020-10-14
Chinese util is a PHP toolkit for Chinese text. It supports converting Chinese characters to Pinyin, Pinyin segmentation, simplified/traditional Chinese conversion, number conversion, and monetary amount conversion. Chinese characters are rich and varied, so there are polyphonic characters (one character with several pronunciations). In addition, simplified and traditional characters do not always map one-to-one. And […]
-
Time:2020-9-23
Django version: 3.0.4
Python package preparation:
pip install django-haystack
pip install jieba
Using jieba for word segmentation
1. cd into the haystack package in site-packages, then create and edit a ChineseAnalyzer.py file
# (Note: the pip package is django-haystack, but the actual package folder is named haystack)
cd /usr/local/lib/python3.8/site-packages/haystack/backends/
# Create and edit the ChineseAnalyzer.py file
vim ChineseAnalyzer.py
2. […]
-
Time:2020-9-17
There are only five steps:
1. Start an Elasticsearch 7.9 Docker image with the IK Chinese word segmentation plugin integrated
2. Configure Scout in Laravel 7
3. Configure the Model
4. Import the data
5. Search
Demo address: https://www.ar414.com
Search scope: article content, title, tags
Result weighting: number of keywords, keyword frequency
Search page: highlighting, display of the segmented words, pagination of results
Preface
The main reason is […]
-
Time:2020-9-2
These notes are reproduced from the GitHub project: https://github.com/NLP-LOVE/Introduction-NLP
2. Dictionary-based word segmentation
Chinese word segmentation is the process of splitting a piece of text into a sequence of words whose concatenation is exactly the original text. Chinese word segmentation algorithms fall roughly into two categories: those based on dictionary rules and those based on machine learning. These two […]
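One classic dictionary-rule algorithm is forward maximum matching. Scanning left to right, it takes the longest dictionary word that starts at the current position and falls back to a single character when nothing matches. Below is a minimal sketch with a tiny hypothetical dictionary. forward_max_match is an illustrative name, not an API from the referenced project.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Dictionary-based segmentation by forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    if none matches, emit a single character and move on.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Hypothetical toy dictionary.
    dictionary = {"商品", "和服", "服务", "和"}
    print(forward_max_match("商品和服务", dictionary))
```

With this dictionary, "商品和服务" comes out as 商品/和服/务 rather than 商品/和/服务. Greedy longest matching has no way to judge which reading is more plausible.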
-
Time:2020-9-1
These notes are reproduced from the GitHub project: https://github.com/NLP-LOVE/Introduction-NLP
3. Bigram language models and Chinese word segmentation
In the previous chapter we implemented dictionary-based segmentation, which cannot resolve ambiguity. Take 商品和服务 (“goods and services”), which can be segmented as “商品/和/服务” (goods / and / services) or as “商品/和服/务” (where 和服 means “kimono”). Dictionary segmentation cannot tell which of the two is more reasonable. We humans […]
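The bigram idea can be sketched as follows. The candidate words are dictionary words plus single characters. Each candidate segmentation is scored by the product of P(word | previous word), and the highest-scoring sequence wins. Because the model only conditions on the previous word, dynamic programming over (position, last word) states finds the best path exactly. This rests on several assumptions. The training corpus is a hypothetical toy. The dictionary is simply the corpus vocabulary. The smoothing is linear interpolation with an add-one unigram and λ = 0.9, which is our choice; the referenced notes may smooth differently.

```python
import math
from collections import Counter


def train(corpus: list[list[str]]) -> tuple[Counter, Counter]:
    """Count unigrams and bigrams over segmented sentences with BOS/EOS markers."""
    uni: Counter = Counter()
    bi: Counter = Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        uni.update(tokens)
        bi.update(zip(tokens, tokens[1:]))
    return uni, bi


def bigram_logp(prev: str, word: str, uni: Counter, bi: Counter, lam: float = 0.9) -> float:
    """log(lam * P(word|prev) + (1 - lam) * P(word)), with an add-one smoothed unigram."""
    total = sum(uni.values())
    p_uni = (uni[word] + 1) / (total + len(uni) + 1)
    p_bi = bi[(prev, word)] / uni[prev] if uni[prev] else 0.0
    return math.log(lam * p_bi + (1 - lam) * p_uni)


def segment(text: str, uni: Counter, bi: Counter) -> list[str]:
    """Return the segmentation with the highest bigram log-probability."""
    vocab = {w for w in uni if w not in ("<s>", "</s>")}
    max_len = max((len(w) for w in vocab), default=1)
    # best[i] maps last word -> (log score, path) over segmentations of text[:i].
    best: list[dict[str, tuple[float, list[str]]]] = [{} for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(len(text)):
        for prev, (score, path) in best[i].items():
            for size in range(1, min(max_len, len(text) - i) + 1):
                word = text[i:i + size]
                # Multi-character candidates must be dictionary words.
                if size > 1 and word not in vocab:
                    continue
                s = score + bigram_logp(prev, word, uni, bi)
                j = i + size
                if word not in best[j] or s > best[j][word][0]:
                    best[j][word] = (s, path + [word])
    # Close every path with the end-of-sentence marker and keep the best one.
    _, (_, path) = max(
        best[len(text)].items(),
        key=lambda kv: kv[1][0] + bigram_logp(kv[0], "</s>", uni, bi),
    )
    return path


if __name__ == "__main__":
    corpus = [["商品", "和", "服务"], ["商品", "和", "价格"], ["和服", "很", "漂亮"]]
    uni, bi = train(corpus)
    print(segment("商品和服务", uni, bi))
```

On this toy corpus, "商品和服务" is segmented as 商品/和/服务 and "和服很漂亮" as 和服/很/漂亮. The statistics, not the dictionary alone, decide between the two readings of 和服.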