Tag:corpus

  • Using EasyWeChat and ChatterBot to build an “automatic reply robot” for an official account

    Time:2021-9-10

    Since the official account list page was redesigned, many people have said that the official account menu will matter less. Moreover, personal accounts cannot develop menus in development mode. So we simply “give up the menu” and use “automatic reply” to replace the menu function. To develop the […]

  • NLP: don’t reinvent the wheel

    Time:2021-6-8

    By Abhijit Gupta | Compiled by VK | Source: Towards Data Science. Introduction: Natural language processing (NLP) is a daunting name for a field. Generating useful conclusions from unstructured text is difficult, and there are numerous techniques and algorithms, each with its own use cases and complexity. As a developer with minimal exposure to NLP, it’s hard to know which methods to […]

  • NLP practice notes: text classification based on machine learning – (1) text representation

    Time:2020-12-18

    Text classification based on machine learning. Knowledge points: 1. Text representation. One-hot: this representation cannot express the similarity between words. `measurements = [{'city': 'Dubai', 'temperature': 33.}, {'city': 'London', 'temperature': 12.}, {'city': 'San Francisco', 'temperature': 18.}]; from sklearn.feature_extraction import DictVectorizer; vec = DictVectorizer(); vec.fit_transform(measurements).toarray()` Bag of Words: the bag-of-words representation, also known as count […]

  • I heard that Huawei Cloud AI has a “chat officer”?

    Time:2020-10-9

    Abstract: Who is Hua Xiaowei? The chief “chat officer” of Huawei Cloud AI. Hua Xiaowei aims to show the public our conversational AI capabilities, help the public understand us better, and liven up the atmosphere in various groups. You can find Hua Xiaowei by following the […]

  • A survey on deep learning for named entity recognition (2020) reading notes

    Time:2020-9-27

    1. Summary: This paper surveys NER resources (corpora and tools), reviews current work along three dimensions (distributed representations for input, context encoder, and tag decoder), and examines the most representative deep learning methods to date. Finally, it presents the challenges facing NER systems and future research directions. 2. Introduction […]

  • A brief introduction to language models in natural language processing

    Time:2020-9-8

    By Devyanshu Shukla | Compiled by Flin | Source: Medium. In this article, we’ll discuss everything about language models (LMs): what an LM is, applications of LMs, how to build an LM, and how to evaluate an LM. Introduction: A language model in NLP computes the probability of a sentence (a word sequence) or the probability of the next word in […]

  • Notes on HanLP’s Introduction to Natural Language Processing: 3. Bigram grammar and Chinese word segmentation

    Time:2020-9-1

    These notes are reproduced from the GitHub project: https://github.com/NLP-LOVE/Introduction-NLP 3. Bigram grammar and Chinese word segmentation. In the last chapter, we implemented dictionary-based segmentation, which cannot disambiguate. Given two candidate segmentations of the phrase “goods and services”, dictionary segmentation does not know which is more reasonable. We humans […]

  • Hidden Markov model (HMM) / perceptron / conditional random field (CRF): part-of-speech tagging

    Time:2020-8-24

    These notes are reproduced from the GitHub project: https://github.com/NLP-LOVE/Introduction-NLP 7. Part-of-speech tagging. 7.1 Overview of part-of-speech tagging. What is part of speech? In linguistics, part of speech (POS) refers to the grammatical classification of words, also known as word class. Words of the same category have similar grammatical properties, and the […]

  • 8. HanLP implementation: named entity recognition

    Time:2020-8-23

    These notes are reproduced from the GitHub project: https://github.com/NLP-LOVE/Introduction-NLP 8. Named entity recognition. 8.1 Overview. Named entities: some words in a text describe entities, for example person names, place names, organization names, stock funds, and medical terms; these are called named entities. They share the following features: Their number is unbounded. For example, the naming of […]

  • Text generation (seq2seq)

    Time:2020-8-19

    Problem: generate text in the style of a specified text, for example Romance of the Three Kingdoms. How can this be achieved? First of all, we need to understand the language model. What is a language model? Given a sequence, a language model predicts the probability distribution of the next token. It’s like a cloze test, but the […]

  • Theory and practice of named entity recognition

    Time:2020-7-24

    Task objective: Identify entities with specific meaning in text for downstream tasks such as dialogue systems, machine translation, and knowledge graph construction. Entity types: Common entity types: person name, place name, organization name, date, time, number, currency. Custom types: for example, in legal provisions: legal name, judge name, defendant, plaintiff. Entities in the medical field include: name […]

  • Part-of-speech tagging based on the noisy channel model and the Viterbi algorithm

    Time:2020-6-27

    Given an English corpus containing many sentences that have already been tokenized: in each token, the word comes before the “/” and the part of speech after it, and sentences are separated by periods, as shown in the figure below. For a sentence s, every word \(w_i\) in the sentence is marked with the corresponding part of […]