How to solve classification problems with naive Bayes

Date: 2021-03-22

WeChat official account: 码农充电站pro (manong charging station Pro)
Home page: https://codeshellme.github.io

The Bayes principle was proposed by the British mathematician Thomas Bayes in the 18th century. The idea is that when we cannot directly calculate the probability of an event A, we can instead calculate the probabilities of related events X, Y and Z, and use them to indirectly estimate the probability of A.


Before introducing the Bayes principle, let's first go over a few probability-related concepts.

1. Probability-related concepts

Probability: describes how likely an event is to occur. It is written with the mathematical symbol P(x), where x denotes a random variable and P(x) denotes the probability associated with x.

Random variable: depending on whether its values are continuous, a random variable is either a discrete random variable or a continuous random variable.

Joint probability: involves multiple random variables. P(x, y) denotes the probability that event x and event y occur at the same time.

Conditional probability: also involves multiple random variables. P(x|y) denotes the probability that event x occurs given that event y has already occurred.

Marginal probability: the P(x) derived from P(x, y) by eliminating the variable y.

  • For discrete random variables, summing the joint probability P(x, y) over y yields P(x); this P(x) is the marginal probability.
  • For continuous random variables, integrating the joint probability P(x, y) over y yields P(x); this P(x) is the marginal probability (a quick code check of the discrete rule follows below).
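As a sanity check of the discrete rule, here is a minimal sketch in Python; the joint table below is made up purely for illustration:

import numpy as np

# Hypothetical joint distribution P(x, y); rows are values of x, columns are values of y
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

p_x = joint.sum(axis=1)  # marginal P(x): sum the joint probability over y
print(p_x)               # [0.3 0.7]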

Probability distribution: listing all the possible values of a random variable together with their corresponding probabilities gives the variable's probability distribution. Probability distributions can be divided into two types: discrete and continuous.

Common discrete distribution models are:

  • Bernoulli distribution: the distribution of a single random variable that takes only two values, 0 or 1. For example, the probability distribution of a coin toss (ignoring the possibility of the coin landing on its edge) is a Bernoulli distribution. Its mathematical formula is:
    • P(x = 0) = 1 − λ
    • P(x = 1) = λ
  • Categorical distribution: describes a single random variable with K different states, where K is a finite number. If K is 2, it reduces to the Bernoulli distribution (both are sketched in code after this list). For state k with parameter λk:
    • P(x = k) = λk
  • Binomial distribution
  • Poisson distribution
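As a small illustration, the following Python sketch (parameter values chosen arbitrarily) evaluates and samples the two models just described:

import random

# Bernoulli distribution: P(x = 1) = λ, P(x = 0) = 1 − λ
lam = 0.3
print(1 - lam, lam)  # P(x = 0) = 0.7, P(x = 1) = 0.3

# Categorical distribution: one parameter λk per state k, with the λk summing to 1
lams = [0.2, 0.5, 0.3]              # K = 3 states
assert abs(sum(lams) - 1.0) < 1e-9  # the λk must sum to 1

# Draw one sample from the categorical distribution
state = random.choices(range(3), weights=lams)[0]
print(state)  # one of 0, 1, 2, drawn with probabilities 0.2, 0.5, 0.3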

Common continuous distribution models are:

  • Normal distribution: also known as the Gaussian distribution; the most important continuous distribution.
  • Uniform distribution
  • Exponential distribution
  • Laplace distribution

The probability density function of the normal distribution is as follows:

  • f(x) = (1 / (σ * √(2π))) * exp(−(x − μ)² / (2σ²))

where μ is the mean and σ is the standard deviation.
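A minimal sketch in Python that evaluates this density directly from the formula (evaluation points chosen arbitrarily):

import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = (1 / (σ√(2π))) * exp(−(x − μ)² / (2σ²))
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(normal_pdf(0.0))  # ≈ 0.3989, the peak of the standard normal
print(normal_pdf(1.0))  # ≈ 0.2420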

The shape of the normal distribution is as follows:

[Figure: bell-shaped curve of the normal probability density]

The normal distribution can be further divided into:

  • Univariate normal distribution; the standard normal distribution is the special case where μ is 0 and σ is 1.
  • Multivariate normal distribution.

Mathematical expectation: if we regard the probability of each random outcome as a weight, then the expectation is the weighted average of all outcomes.

Variance: measures how far the values of a random variable deviate from its expectation. The smaller the variance, the smaller the deviation; the larger the variance, the larger the deviation.
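A minimal sketch in Python (the values and probabilities are made up) that computes the expectation as a probability-weighted average, and the variance as the weighted average of squared deviations:

values = [1, 2, 3]
probs  = [0.2, 0.5, 0.3]   # hypothetical distribution; probabilities sum to 1

# Expectation: weighted average of the outcomes
mean = sum(v * p for v, p in zip(values, probs))

# Variance: weighted average of squared deviations from the expectation
var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))

print(mean, var)  # 2.1 and 0.49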

Probability theory studies the relationships and transformations between these probabilities.

2. Bayes' theorem

Bayes' formula is as follows:

  • P(Bi|A) = P(A|Bi) * P(Bi) / [P(A|B1) * P(B1) + … + P(A|Bn) * P(Bn)]

Meaning:

  • On the right of the equals sign, P(Bi) is the prior probability and P(A|Bi) is the conditional probability.
  • The entire denominator on the right of the equals sign is the marginal probability P(A).
  • On the left of the equals sign, P(Bi|A) is the posterior probability, obtained from the prior probability, the conditional probability and the marginal probability (computed in the sketch after this list).
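A minimal sketch in Python (all numbers are hypothetical) that computes a posterior from these three ingredients, with the marginal probability expanded as the full denominator:

# Hypothetical priors over two hypotheses B1 and B2, and likelihoods P(A|Bi)
priors      = [0.6, 0.4]    # P(B1), P(B2)
likelihoods = [0.9, 0.2]    # P(A|B1), P(A|B2)

# Marginal probability P(A): the full denominator of Bayes' formula
p_a = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior P(B1|A) = P(A|B1) * P(B1) / P(A)
posterior_b1 = likelihoods[0] * priors[0] / p_a
print(p_a, posterior_b1)  # 0.62 and ≈ 0.871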

Bayes' theorem can be used for classification problems. When it is applied to classification, the formula above can be simplified to:

  • P(C|f) = P(f|C) * P(C) / P(f)

Among them:

  • C represents a class and f represents an attribute value.
  • P(C|f) is the probability that a sample belongs to class C when the attribute value f appears in the sample to be classified.
  • P(f|C) is the probability that attribute value f appears in class C, obtained from the statistics of the training samples.
  • P(C) is the probability of class C appearing in the training data.
  • P(f) is the probability of attribute value f appearing in the training samples.

This means that once we know a sample's attribute values, we can use this formula to compute the probability that the sample belongs to each class; the class with the highest probability is the one the sample is assigned to, which completes the classification.

Bayesian inference

Let’s see how the Bayesian formula is derived.

As shown in the figure below, there are two ellipses, C on the left and F on the right.
[Figure: two separate ellipses, C on the left and F on the right]

Now let the two ellipses intersect:

[Figure: ellipses C and F intersecting; the overlapping region is C ∩ F]

From the figure above, given that event F has occurred, the probability of event C is P(C ∩ F) / P(F), that is:

  • P(C | F) = P(C ∩ F) / P(F)

Rearranging gives:

  • P(C ∩ F) = P(C | F) * P(F)

In the same way:

  • P(C ∩ F) = P(F | C) * P(C)

So:

  • P(C ∩ F) = P(C | F) * P(F) = P(F | C) * P(C)
  • P(C | F) = P(F | C) * P(C) / P(F)
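The derivation can be checked numerically. A minimal sketch in Python (the joint and marginal probabilities below are made up for illustration) verifies that both routes to P(C ∩ F) agree, and that the final formula recovers P(C | F):

# Hypothetical probabilities for two overlapping events C and F
p_c  = 0.5   # P(C)
p_f  = 0.4   # P(F)
p_cf = 0.2   # P(C ∩ F)

p_c_given_f = p_cf / p_f   # P(C|F) = P(C ∩ F) / P(F)  -> 0.5
p_f_given_c = p_cf / p_c   # P(F|C) = P(C ∩ F) / P(C)  -> 0.4

# Both expressions for P(C ∩ F) agree
assert abs(p_c_given_f * p_f - p_f_given_c * p_c) < 1e-12

# Bayes' formula: P(C|F) = P(F|C) * P(C) / P(F)
assert abs(p_c_given_f - p_f_given_c * p_c / p_f) < 1e-12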

3. Naive Bayes

Let's say we have a data set that we want to classify with Bayes' theorem, and each sample has two features, f1 and f2. To classify a data point F, we need to solve for:

  • P(c|F): the probability that the data point F belongs to class c.

Since F consists of the features f1 and f2:

  • P(c|F) = P(c|(f1,f2))

A classification problem usually involves more than one feature. If the features influence each other, that is, f1 and f2 interact, then P(c|(f1,f2)) is not easy to solve.

Naive Bayes adds a simple, crude assumption on top of Bayes' theorem: it assumes that the features do not affect one another and are mutually independent.

"Naive" here means simple or plain.

Expressed as a mathematical formula:

  • P(A, B) = P(A) * P(B)

This is exactly what probability theory calls the independence of events: event A and event B do not interfere with each other and are mutually independent.
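A minimal sketch in Python (using two fair coin flips, a textbook example of independent events) that checks the joint probability equals the product of the marginals:

from itertools import product

# Sample space of two fair coin flips: each of the 4 outcomes has probability 1/4
outcomes = list(product(["H", "T"], repeat=2))

p_a  = sum(1 for o in outcomes if o[0] == "H") / 4      # P(first flip is heads)  = 0.5
p_b  = sum(1 for o in outcomes if o[1] == "H") / 4      # P(second flip is heads) = 0.5
p_ab = sum(1 for o in outcomes if o == ("H", "H")) / 4  # P(both heads)           = 0.25

assert p_ab == p_a * p_b   # independence: P(A, B) = P(A) * P(B)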

So, under the naive Bayes assumption, P(c|F) can be solved as follows:

   P(c|F)
=> P(c|(f1,f2))
=> P((f1,f2)|c) * P(c) / P((f1,f2))
=> P(f1|c) * P(f2|c) * P(c) / [P(f1) * P(f2)]


Suppose the data falls into two classes, c1 and c2.

Then for the data point F, we need to solve for two probabilities, P(c1|F) and P(c2|F):

  • If P(c1|F) > P(c2|F), then F belongs to class c1.
  • If P(c1|F) < P(c2|F), then F belongs to class c2.

According to Bayes' theorem, we get:

  • P(c1|F) = P(F|c1) * P(c1) / P(F)
  • P(c2|F) = P(F|c2) * P(c2) / P(F)

For the classification problem, our ultimate goal is to decide the class, not to compute the exact values of P(c1|F) and P(c2|F).

From the formulas above, both expressions share the same denominator P(F), so the denominator can be ignored when comparing them.

So we only need to compute P(F|c1) * P(c1) and P(F|c2) * P(c2) to know which of P(c1|F) and P(c2|F) is larger.

So P(c|F) can be further simplified to:

  • P(c|F) ∝ P(F|c) * P(c)

that is, the predicted class is the c that maximizes P(F|c) * P(c).


4. General steps for handling a classification problem

To handle a classification problem with the naive Bayes principle, we generally go through the following steps:

  • Preparation stage
    • Get the data set.
    • Analyze the data, determine the feature attributes, and obtain the training samples.
  • Training stage
    • Compute the probability P(Ci) of each class.
    • For each feature attribute, compute the conditional probability P(Fj|Ci) for each class.
    • Ci ranges over all classes.
    • Fj ranges over all features.
  • Prediction stage
    • Given a data point, compute for each class the probability P(Ci) * ∏j P(Fj|Ci) that the data point belongs to it (a minimal implementation of these steps is sketched after this list).
    • The class with the highest probability is the one the data point is assigned to.
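Here is a minimal sketch of these steps in Python for discrete features. It is an illustrative implementation under the assumptions above, not a production one (no smoothing of zero counts), and the function names are our own:

from collections import Counter, defaultdict

def train(samples, labels):
    """Training stage: estimate P(Ci) and P(Fj|Ci) by counting."""
    class_counts = Counter(labels)
    priors = {c: cnt / len(labels) for c, cnt in class_counts.items()}  # P(Ci)
    cond = {c: defaultdict(float) for c in class_counts}                # P(Fj|Ci)
    for features, c in zip(samples, labels):
        for j, value in enumerate(features):
            cond[c][(j, value)] += 1
    for c, counts in cond.items():
        for key in counts:
            counts[key] /= class_counts[c]
    return priors, cond

def predict(features, priors, cond):
    """Prediction stage: pick the class maximizing P(Ci) * prod_j P(Fj|Ci)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(features):
            score *= cond[c].get((j, value), 0.0)  # unseen value -> probability 0
        scores[c] = score
    return max(scores, key=scores.get)

Note that a feature value never seen in a class yields probability 0 for that class, exactly as happens with P(F1|C2) in the worked example below.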

5. Classification with naive Bayes

Next, let's work through a practical classification problem. The data we'll handle is discrete data.

5.1 Data set preparation

Our data set is as follows:

[Table: 8 samples with the features height, weight and shoe size, and the label gender (4 male, 4 female)]

The feature set of the data set consists of height, weight and shoe size; the target set is gender.

Our goal is to train a model that can predict gender according to height, weight and shoe size.

Now we are given a sample with the following feature values:

  • Height = tall, denoted F1.
  • Weight = medium, denoted F2.
  • Shoe size = medium, denoted F3.

We are asked: is this sample male or female? (C1 denotes male, C2 denotes female.) That is, which is larger, P(C1|F) or P(C2|F)?

# Based on naive Bayes

   P(C1|F)
=> P(C1|(F1,F2,F3))
=> P(F1|C1) * P(F2|C1) * P(F3|C1) * P(C1) / P(F)

   P(C2|F)
=> P(C2|(F1,F2,F3))
=> P(F1|C2) * P(F2|C2) * P(F3|C2) * P(C2) / P(F)

The shared denominator P(F) does not affect which probability is larger, so it can be dropped in the comparison.

5.2 Computing P(Ci)

The target set has two classes, male and female. Male appears four times and female appears four times:

  • P(C1) = 4 / 8 = 0.5
  • P(C2) = 4 / 8 = 0.5

5.3 Computing P(Fj|Ci)

By observing the data in the table, we can see that:

# Probability of height = tall given gender = male
P(F1|C1) = 2 / 4 = 0.5

# Probability of weight = medium given gender = male
P(F2|C1) = 2 / 4 = 0.5

# Probability of shoe size = medium given gender = male
P(F3|C1) = 1 / 4 = 0.25

# Probability of height = tall given gender = female
P(F1|C2) = 0 / 4 = 0

# Probability of weight = medium given gender = female
P(F2|C2) = 2 / 4 = 0.5

# Probability of shoe size = medium given gender = female
P(F3|C2) = 2 / 4 = 0.5

5.4 Computing P(Ci) * ∏ P(Fj|Ci)

We have already derived the expressions for P(C1|F) and P(C2|F); dropping the shared denominator P(F), they can be evaluated as follows:

   P(C1|F)
=> P(F1|C1) * P(F2|C1) * P(F3|C1) * P(C1)
=> 0.5 * 0.5 * 0.25 * 0.5
=> 0.03125

   P(C2|F)
=> P(F1|C2) * P(F2|C2) * P(F3|C2) * P(C2)
=> 0 * 0.5 * 0.5 * 0.5
=> 0

Finally, P(C1|F) > P(C2|F), so the sample belongs to C1: male.
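The whole worked example can be reproduced in code. The eight rows below are a hypothetical data set (the original table is an image); the rows are chosen only so that the counts match the probabilities computed above:

# Hypothetical data consistent with the counts above: (height, weight, shoe size)
males = [("tall", "medium", "medium"), ("tall", "light", "large"),
         ("short", "medium", "large"), ("short", "heavy", "large")]
females = [("short", "medium", "medium"), ("short", "medium", "small"),
           ("short", "light", "medium"), ("short", "heavy", "small")]

def cond_prob(rows, j, value):
    """P(feature j = value | class) estimated from the class's rows."""
    return sum(1 for r in rows if r[j] == value) / len(rows)

query = ("tall", "medium", "medium")   # F1, F2, F3
score_male = score_female = 0.5        # priors P(C1) = P(C2) = 0.5

for j, v in enumerate(query):
    score_male *= cond_prob(males, j, v)
    score_female *= cond_prob(females, j, v)

print(score_male, score_female)        # 0.03125 and 0.0 -> predict male

Note how P(F1|C2) = 0 drives the female score to exactly 0, just as in the hand calculation.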

6. Summary

We can see that a classification problem of the form "given a data point F, which class does it belong to?" is really the problem of solving for the probability that F belongs to each class, that is, P(C|F).

According to the naive Bayes principle, P(C|F) is proportional to P(F|C) * P(C), so in the end we only need to solve for P(F|C) * P(C). This turns a classification problem into a probability problem.

The next article will introduce how to use naive Bayes to deal with practical problems.

(end of this section.)


Recommended reading:

Decision tree algorithm theory

Decision tree algorithm: hands-on practice


Follow the author's official account for more useful content.

码农充电站pro