# How naive Bayes solves classification problems

Time: 2021-3-22

Public number: manong charging station Pro

Bayes' principle was proposed in the 18th century by the British mathematician Thomas Bayes. When we cannot directly calculate the probability of an event (A), we can instead calculate the probabilities of the events related to it (X, Y, Z), and use them to judge the probability of A indirectly.

Before introducing Bayes' principle, we first introduce several concepts from probability.

### 1. Probability related concepts

Probability describes how likely an event is to occur. It is written `P(x)`, where `x` is a random variable and `P(x)` is the probability of `x`.

Random variables can be divided into discrete random variables and continuous random variables, according to whether their values are continuous.

Joint probability is determined by multiple random variables. `P(x, y)` denotes the probability that event `x` and event `y` occur together.

Conditional probability is also determined by multiple random variables. `P(x|y)` denotes the probability that event `x` occurs given that event `y` has occurred.

Marginal probability: `P(x)` derived from `P(x, y)` by ignoring the variable `y`.

• For discrete random variables, summing the joint probability `P(x, y)` over `y` gives `P(x)`. This `P(x)` is the marginal probability.
• For continuous random variables, integrating the joint probability `P(x, y)` over `y` gives `P(x)`. This `P(x)` is the marginal probability.
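As a small illustration of the discrete case, the sketch below sums a joint table over `y` to get a marginal, and divides by a marginal to get a conditional probability. The joint table and its labels (`rain`, `cold`, and so on) are made-up values, not data from this article.

```python
# Hypothetical joint distribution P(x, y) over two discrete variables.
# Keys are (x, y) pairs; the four probabilities sum to 1.
joint = {
    ("rain", "cold"): 0.3,
    ("rain", "warm"): 0.1,
    ("dry", "cold"): 0.2,
    ("dry", "warm"): 0.4,
}


def marginal_x(joint_table):
    """Sum P(x, y) over y to obtain the marginal P(x)."""
    result = {}
    for (x, _y), p in joint_table.items():
        result[x] = result.get(x, 0.0) + p
    return result


def conditional_x_given_y(joint_table, y_value):
    """P(x | y) = P(x, y) / P(y), where P(y) is itself a marginal."""
    p_y = sum(p for (_x, y), p in joint_table.items() if y == y_value)
    return {x: p / p_y for (x, y), p in joint_table.items() if y == y_value}


p_x = marginal_x(joint)
p_x_given_cold = conditional_x_given_y(joint, "cold")
print(p_x)
print(p_x_given_cold)
```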

Probability distribution: listing all the possible values of a random variable together with their probabilities gives the probability distribution of that variable. Probability distributions are divided into two types, discrete and continuous.

Common discrete distribution models are:

• Bernoulli distribution: the distribution of a single random variable with only two values, 0 or 1. For example, the outcome of a coin toss (ignoring the coin landing on its edge) follows a Bernoulli distribution. Its formula is:
  • `P(x = 0) = 1 - λ`
  • `P(x = 1) = λ`
• Multinomial distribution: also known as the categorical distribution, it describes a single random variable with K different states, where K is finite. When K is 2, it reduces to the Bernoulli distribution.
  • `P(x = k) = λ_k`, where the `λ_k` sum to 1
• Binomial distribution
• Poisson distribution

Common continuous distribution models are:

• Normal distribution, also known as Gaussian distribution, which is the most important one.
• Uniform distribution
• Exponential distribution
• Laplace distribution

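The mathematical formula of the normal distribution, with mean μ and standard deviation σ, is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The curve of the normal distribution is bell-shaped and symmetric about μ.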

The normal distribution can also be divided into the following types:

• Univariate normal distribution. When μ is 0 and σ is 1, it is the standard normal distribution.
• Multivariate normal distribution.
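A minimal sketch of the univariate normal density, using the standard formula with mean `mu` and standard deviation `sigma`. The function name `normal_pdf` is chosen here for illustration; the defaults correspond to the standard normal distribution.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the univariate normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# The density is highest at the mean and symmetric around it.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.0), normal_pdf(1.0)
print(peak, left, right)
```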

Mathematical expectation: if we regard the probability of each random outcome as a weight, then the expectation is the weighted average of all outcomes.

Variance measures how far a random variable deviates from its expectation. The smaller the variance, the smaller the deviation; the larger the variance, the larger the deviation.
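To make the "weighted average" reading concrete, here is a short sketch that computes the expectation and variance of a discrete distribution stored as a dictionary of value to probability. The fair six-sided die is an illustrative example only.

```python
def expectation(dist):
    """Weighted average of the values, with probabilities as weights."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Expected squared deviation from the mean."""
    mean = expectation(dist)
    return sum(prob * (value - mean) ** 2 for value, prob in dist.items())


# Illustrative distribution: a fair six-sided die.
fair_die = {face: 1 / 6 for face in range(1, 7)}
die_mean = expectation(fair_die)
die_var = variance(fair_die)
print(die_mean, die_var)
```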

Probability theory studies how these probabilities can be converted into one another.

### 2. Bayes' theorem

The Bayes formula is as follows:

$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)}$$

Meaning:

• On the right side of the equals sign, `P(Bi)` is the prior probability and `P(A|Bi)` is the conditional probability.
• The whole denominator on the right side of the equals sign is the marginal probability `P(A)`.
• On the left side of the equals sign, `P(Bi|A)` is the posterior probability, which is obtained from the prior probability, the conditional probability and the marginal probability.
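The roles of prior, conditional and marginal probability can be sketched in a few lines. The function name `posterior` and the numbers (two machines producing parts with different defect rates) are hypothetical and only serve to show the calculation.

```python
def posterior(priors, likelihoods):
    """Return P(B_i | A) for every i.

    priors[i]      = P(B_i)      (prior probability)
    likelihoods[i] = P(A | B_i)  (conditional probability)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # marginal probability P(A), the denominator
    return [j / evidence for j in joint]


# Hypothetical numbers: machines B1 and B2 make 60% and 40% of all parts,
# and produce a defective part (event A) with probability 1% and 5%.
priors = [0.6, 0.4]
likelihoods = [0.01, 0.05]
post = posterior(priors, likelihoods)
print(post)
```

Although B1 makes more parts, a defective part is more likely to come from B2, because the conditional probability outweighs the prior here.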

Bayes' theorem can be used for classification problems, where the formula above simplifies to:

$$P(C \mid f) = \frac{P(f \mid C)\,P(C)}{P(f)}$$

Where:

• `C` represents a class and `f` represents an attribute value.
• `P(C|f)` is the probability that the sample belongs to class `C`, given that attribute value `f` appears in the sample to be classified.
• `P(f|C)` is the probability that attribute `f` appears in class `C`, obtained by counting the training samples.
• `P(C)` is the probability of class `C` appearing in the training data.
• `P(f)` is the probability of attribute `f` appearing in the training samples.

This means that once we know the attribute values of a sample, we can use this formula to calculate the probability of each class. The sample is assigned to the class with the highest probability, which completes the classification.

Bayesian inference

Let’s see how the Bayesian formula is derived.

Picture two ellipses that represent two events, C on the left and F on the right.

Now let the two ellipses intersect. The overlapping region is the event `C ∩ F`, in which both C and F occur.

Given that event F has occurred, the probability of event C is the share of F taken up by the overlap, `P(C ∩ F) / P(F)`. That is:

• `P(C | F) = P(C ∩ F) / P(F)`

Rearranging gives:

• `P(C ∩ F) = P(C | F) * P(F)`

In the same way:

• `P(C ∩ F) = P(F | C) * P(C)`

So:

• `P(C ∩ F) = P(C | F) * P(F) = P(F | C) * P(C)`
• `P(C | F) = P(F | C) * P(C) / P(F)`
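The derivation can be checked numerically with exact fractions. The counts below are hypothetical; any counts with a non-empty C and F would do.

```python
from fractions import Fraction

# Hypothetical counts of samples by membership in events C and F.
n_total = 20
n_c = 8        # samples in C
n_f = 10       # samples in F
n_c_and_f = 4  # samples in both C and F (the overlap)

p_c = Fraction(n_c, n_total)
p_f = Fraction(n_f, n_total)
p_c_and_f = Fraction(n_c_and_f, n_total)

# Definition of conditional probability, applied in both directions.
p_c_given_f = p_c_and_f / p_f
p_f_given_c = p_c_and_f / p_c

# Right-hand side of the Bayes formula.
bayes_rhs = p_f_given_c * p_c / p_f
print(p_c_given_f, bayes_rhs)
```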

### 3. Naive Bayes

Suppose we have a data set that we want to classify with Bayes' theorem, and it has two features, `f1` and `f2`. To classify a data point `F`, we need to solve:

• `P(c|F)`: the probability that data point `F` belongs to class `c`.

Because `F` has the features `f1` and `f2`:

• `P(c|F) = P(c|(f1,f2))`

Classification problems often have more than one feature. If the features influence each other, that is, if `f1` and `f2` interact, then `P(c|(f1,f2))` is not easy to solve.

Naive Bayes adds a simple and crude assumption on top of Bayes' theorem: the features do not affect each other and are mutually independent.

Here "naive" means simple and unsophisticated.

Expressed as a mathematical formula:

• `P(A, B) = P(A) * P(B)`

This is what university probability courses call event independence: event A and event B do not interfere with each other.
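Independence can be tested by enumeration on a small sample space. The sketch below uses two fair dice (an illustrative setup, not from this article) and compares `P(A, B)` with `P(A) * P(B)` for one independent pair of events and one dependent pair.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))


def is_a(o):
    return o[0] % 2 == 0  # A: the first die is even


def is_b(o):
    return o[1] == 6  # B: the second die shows 6


def is_s(o):
    return o[0] + o[1] >= 10  # S: the sum is at least 10


# A and B concern different dice, so P(A, B) = P(A) * P(B).
p_ab = prob(lambda o: is_a(o) and is_b(o))
independent_ab = p_ab == prob(is_a) * prob(is_b)

# B and S both depend on the second die, so the product rule fails.
p_bs = prob(lambda o: is_b(o) and is_s(o))
independent_bs = p_bs == prob(is_b) * prob(is_s)

print(independent_ab, independent_bs)
```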

So, according to naive Bayes, `P(c|F)` is solved as follows.

Suppose the data is to be divided into two classes, `c1` and `c2`.

Then for a data point `F` we need to solve two probabilities, `P(c1|F)` and `P(c2|F)`:

• If `P(c1|F) > P(c2|F)`, then `F` belongs to class `c1`.
• If `P(c1|F) < P(c2|F)`, then `F` belongs to class `c2`.

According to Bayes' principle, we get:

$$P(c_1 \mid F) = \frac{P(F \mid c_1)\,P(c_1)}{P(F)}, \qquad P(c_2 \mid F) = \frac{P(F \mid c_2)\,P(c_2)}{P(F)}$$

For a classification problem, our ultimate goal is to classify, not to find the exact values of `P(c1|F)` and `P(c2|F)`.

When Bayes' formula is applied to each class, the denominator on the right side of the equals sign is the same `P(F)` in both cases, so it does not affect the comparison.

So we only need to compute `P(F|c1) × P(c1)` and `P(F|c2) × P(c2)` to know which of `P(c1|F)` and `P(c2|F)` is larger.

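So `P(c|F)` can be further simplified. Dropping the shared denominator and applying the independence assumption to the two features gives:

$$P(c \mid F) \propto P(F \mid c)\,P(c) = P(f_1 \mid c)\,P(f_2 \mid c)\,P(c)$$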
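A short sketch of why the denominator can be ignored: dividing every score by the same `P(F)` rescales them equally, so the winning class does not change. The priors and likelihoods below are hypothetical values chosen for illustration.

```python
# Hypothetical values for two classes and one observed data point F.
prior = {"c1": 0.7, "c2": 0.3}       # P(c)
likelihood = {"c1": 0.2, "c2": 0.6}  # P(F | c)

# Unnormalized scores P(F | c) * P(c).
score = {c: likelihood[c] * prior[c] for c in prior}

# P(F) is the sum of the scores over all classes. Dividing by it turns the
# scores into posteriors but cannot change their order.
p_f = sum(score.values())
posterior = {c: score[c] / p_f for c in score}

best_by_score = max(score, key=score.get)
best_by_posterior = max(posterior, key=posterior.get)
print(best_by_score, best_by_posterior)
```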

### 4. General steps to deal with classification

Using the naive Bayes principle to handle a classification problem generally involves the following steps:

• Preparation stage
  • Get the data set.
  • Analyze the data, determine the feature attributes, and obtain the training samples.
• Training stage
  • Calculate the probability of each class, `P(Ci)`.
  • For each feature attribute, calculate its conditional probability under each class, `P(Fj|Ci)`.
    • `Ci` ranges over all classes.
    • `Fj` ranges over all features.
• Prediction stage
  • Given a data point, calculate for each class the product of `P(Ci)` and the conditional probabilities `P(Fj|Ci)` of its feature values.
  • The data point belongs to whichever class gives the largest result.
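The training and prediction stages above can be sketched as two small functions for discrete features. This is a minimal illustration under stated assumptions, not a reference implementation: the names `train` and `predict` and the weather-style toy data are hypothetical, and probabilities are estimated by plain counting with no smoothing.

```python
from collections import Counter, defaultdict


def train(samples, labels):
    """Training stage: estimate P(Ci) and P(Fj|Ci) by counting.

    samples: list of tuples of discrete feature values
    labels:  list of class labels, same length as samples
    """
    class_counts = Counter(labels)
    total = len(labels)
    priors = {c: n / total for c, n in class_counts.items()}

    # value_counts[c][j][v] = number of class-c samples with feature j == v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for features, c in zip(samples, labels):
        for j, v in enumerate(features):
            value_counts[c][j][v] += 1

    def cond_prob(c, j, v):
        """P(feature j takes value v | class c)."""
        return value_counts[c][j][v] / class_counts[c]

    return priors, cond_prob


def predict(priors, cond_prob, features):
    """Prediction stage: score each class and return the best one."""
    scores = {}
    for c, p_c in priors.items():
        score = p_c
        for j, v in enumerate(features):
            score *= cond_prob(c, j, v)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, wind) -> play / stay
samples = [("sunny", "weak"), ("sunny", "strong"), ("rainy", "weak"),
           ("rainy", "strong"), ("sunny", "weak"), ("rainy", "strong")]
labels = ["play", "play", "play", "stay", "play", "stay"]

priors, cond_prob = train(samples, labels)
label, scores = predict(priors, cond_prob, ("sunny", "weak"))
print(label, scores)
```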

### 5. Classification with naive Bayes

Next, let's work through a practical classification problem. The data we deal with is discrete.

#### 5.1 Data set preparation

Our data set is as follows:

The features of the data set are `height`, `weight` and `shoe size`, and the target is `Gender`.

Our goal is to train a model that can predict gender from height, weight and shoe size.

We are given a sample with the following features:

• Height = tall, denoted `F1`.
• Weight = medium, denoted `F2`.
• Shoe size = medium, denoted `F3`.

Is this sample `male` or `female`? Writing `C1` for `male` and `C2` for `female`, the question is which of `P(C1|F)` and `P(C2|F)` is larger.

```
# Based on naive Bayes: features are assumed independent given the class.
# "∝" means the shared denominator P(F) has been dropped.

P(C1|F)
=> P(C1|(F1,F2,F3))
∝  P((F1,F2,F3)|C1) * P(C1)
=> P(F1|C1) * P(F2|C1) * P(F3|C1) * P(C1)

P(C2|F)
=> P(C2|(F1,F2,F3))
∝  P((F1,F2,F3)|C2) * P(C2)
=> P(F1|C2) * P(F2|C2) * P(F3|C2) * P(C2)
```

#### 5.2 Calculating `P(Ci)`

The target has two classes, male and female. Among the eight samples, male appears four times and female appears four times:

• `P(C1) = 4 / 8 = 0.5`
• `P(C2) = 4 / 8 = 0.5`

#### 5.3 Calculating `P(Fj|Ci)`

By observing the data in the table, we can see that:

```
# Given gender = male, the probability of height = tall
P(F1|C1) = 2 / 4 = 0.5

# Given gender = male, the probability of weight = medium
P(F2|C1) = 2 / 4 = 0.5

# Given gender = male, the probability of shoe size = medium
P(F3|C1) = 1 / 4 = 0.25

# Given gender = female, the probability of height = tall
P(F1|C2) = 0 / 4 = 0

# Given gender = female, the probability of weight = medium
P(F2|C2) = 2 / 4 = 0.5

# Given gender = female, the probability of shoe size = medium
P(F3|C2) = 2 / 4 = 0.5
```
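The counting itself can be sketched in code. Note the assumption: the eight rows below are hypothetical, constructed only so that their counts agree with the probabilities quoted in this section; they are not the article's original table. The helper names `prior` and `conditional` are also illustrative.

```python
from collections import Counter

# ASSUMPTION: hypothetical rows, chosen only so that the counts match the
# probabilities quoted in the text. Columns: height, weight, shoe size, gender.
rows = [
    ("tall",   "heavy",  "large",  "male"),
    ("tall",   "heavy",  "large",  "male"),
    ("medium", "medium", "large",  "male"),
    ("medium", "medium", "medium", "male"),
    ("short",  "light",  "small",  "female"),
    ("short",  "light",  "small",  "female"),
    ("short",  "medium", "medium", "female"),
    ("medium", "medium", "medium", "female"),
]

gender_counts = Counter(row[3] for row in rows)


def prior(gender):
    """P(Ci): share of rows with this gender."""
    return gender_counts[gender] / len(rows)


def conditional(column, value, gender):
    """P(Fj|Ci): share of this gender's rows whose column equals value."""
    matches = sum(1 for r in rows if r[3] == gender and r[column] == value)
    return matches / gender_counts[gender]


# Conditional probabilities for the sample (tall, medium, medium).
sample = ("tall", "medium", "medium")
for gender in ("male", "female"):
    probs = [conditional(j, v, gender) for j, v in enumerate(sample)]
    print(gender, prior(gender), probs)
```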

#### 5.4 Calculating `P(Fj|Ci) * P(Ci)`

Under the naive independence assumption, `P(C1|F)` and `P(C2|F)` are each proportional to the class prior multiplied by the per-feature conditional probabilities, so they can be evaluated as follows:

```
P(C1|F)
∝  P(F1|C1) * P(F2|C1) * P(F3|C1) * P(C1)
=> 0.5 * 0.5 * 0.25 * 0.5
=> 0.03125

P(C2|F)
∝  P(F1|C2) * P(F2|C2) * P(F3|C2) * P(C2)
=> 0 * 0.5 * 0.5 * 0.5
=> 0
```

Finally, since `P(C1|F) > P(C2|F)`, the sample is classified as `C1`, that is, male.
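The comparison can be reproduced from the probabilities quoted in sections 5.2 and 5.3. The sketch below uses the standard naive Bayes score, in which the class prior is multiplied in once together with the three conditional probabilities; the dictionary layout is an illustrative choice. Only the ranking of the two scores matters for the decision.

```python
from math import prod

# Probabilities quoted in sections 5.2 and 5.3 for the sample
# (height = tall, weight = medium, shoe size = medium).
priors = {"male": 0.5, "female": 0.5}
conditionals = {
    "male": [0.5, 0.5, 0.25],   # P(F1|C1), P(F2|C1), P(F3|C1)
    "female": [0.0, 0.5, 0.5],  # P(F1|C2), P(F2|C2), P(F3|C2)
}

# Score of each class: prior times the product of the conditionals.
scores = {c: priors[c] * prod(conditionals[c]) for c in priors}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

The female score is exactly zero because no female sample in the training data has height = tall, so that single factor cancels the whole product.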

### 6. Summary

We can see that a classification problem (given a data point `F`, which class does it belong to?) comes down to finding the probability that `F` belongs to each class, that is, `P(C|F)`.

According to the principle of naive Bayes, `P(C|F)` is proportional to `P(F|C) * P(C)`, so in the end we only need to compute `P(F|C) * P(C)`. This turns a classification problem into a probability problem.

The next article will introduce how to use naive Bayes to deal with practical problems.

(end of this section.)