Multi-class neural networks for machine learning: softmax


We already know that logistic regression produces a decimal between 0 and 1.0. For example, if the logistic regression output of an e-mail classifier is 0.8, the probability that the e-mail is spam is 80%, and the probability that it is not spam is 20%. The probabilities of an e-mail being spam and not being spam clearly sum to 1.0.
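As a small illustration of the numbers above, the sketch below passes a raw score through the sigmoid function used by logistic regression. The raw score `z` is a hypothetical value picked so that the output is about 0.8; it does not come from a real classifier.

```python
import math

def sigmoid(z):
    """Squash a raw score into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw score, chosen so that the output is approximately 0.8.
z = math.log(4)

p_spam = sigmoid(z)
p_not_spam = 1.0 - p_spam

print(round(p_spam, 3), round(p_not_spam, 3))
```

The two probabilities are complementary by construction, which is why a single output is enough for a two-category problem.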

Softmax extends this idea to the multi-class case. In other words, in a multi-class problem, softmax assigns a decimal probability to each category. These probabilities must sum to 1.0. This additional constraint helps training converge more quickly than it otherwise would.
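In formula form, this is the standard softmax definition. The symbols are notation introduced here rather than taken from the original text: $z_i$ is the raw score for category $i$, $K$ is the number of categories, and $p_i$ is the resulting probability.

```latex
p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad \sum_{i=1}^{K} p_i = 1
```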

For example, returning to the image analysis example we saw in Figure 1, softmax might produce a separate probability for each category the image could belong to.
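As a minimal sketch of how such probabilities could be computed, the Python snippet below applies softmax to a handful of raw scores (logits). The category names and logit values are hypothetical, chosen only for illustration; they are not taken from the figure.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1.0."""
    # Subtracting the largest logit avoids overflow and does not change the result.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical categories and logits, for illustration only.
classes = ["apple", "bear", "candy", "dog", "egg"]
logits = [-1.0, 2.5, 1.0, 5.5, -1.0]

probs = softmax(logits)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
print("sum:", round(sum(probs), 6))
```

The category with the largest logit receives the largest probability, and the printed probabilities add up to 1.0.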


The softmax layer is the neural network layer immediately before the output layer.

The softmax layer must have the same number of nodes as the output layer.


Figure 2. Softmax layer in neural network
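To make the node-count requirement concrete, here is a rough sketch in which a final dense layer produces one raw score per category and softmax turns those scores into the same number of probabilities. The layer sizes, weights, biases, and the helper name `dense` are all hypothetical.

```python
import math

def dense(inputs, weights, biases):
    """Compute one raw score per output node; `weights` has one row per category."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1.0."""
    shift = max(logits)  # for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values: 2 hidden activations feeding 3 categories.
hidden = [0.5, -1.2]
weights = [[0.1, 0.4], [-0.3, 0.8], [0.7, -0.5]]
biases = [0.0, 0.1, -0.1]

logits = dense(hidden, weights, biases)
probs = softmax(logits)

# Softmax keeps the number of values unchanged: one probability per category.
print(len(logits), len(probs), round(sum(probs), 6))
```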

Softmax options

Consider the following variants of softmax:

  • Full softmax: the softmax we have been discussing; that is, softmax calculates a probability for every possible category.
  • Candidate sampling: softmax calculates a probability for all the positive labels, but only for a random sample of the negative labels. For example, if we want to determine whether an input image is a beagle or a bloodhound, we don't have to provide probabilities for every non-dog example.

Full softmax is fairly cheap when the number of categories is small, but becomes prohibitively expensive as the number of categories grows. Candidate sampling can improve efficiency in problems with a large number of categories.
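The idea behind candidate sampling can be sketched as follows: during training, the loss is computed over the positive category plus a small random sample of negative categories instead of over all of them. This is a simplified illustration, not a production implementation; practical versions usually also correct for the sampling probabilities, which is omitted here. The function name `sampled_softmax_loss`, the category count, and the sample size are hypothetical.

```python
import math
import random

def sampled_softmax_loss(logits, positive, num_negatives, rng):
    """Cross-entropy over the positive category plus a random sample of negatives."""
    negatives = [i for i in range(len(logits)) if i != positive]
    sampled = rng.sample(negatives, num_negatives)
    candidates = [positive] + sampled  # the positive category comes first

    # Softmax restricted to the candidate set only.
    shift = max(logits[i] for i in candidates)
    exps = [math.exp(logits[i] - shift) for i in candidates]
    loss = -math.log(exps[0] / sum(exps))
    return loss, candidates

rng = random.Random(0)
# Hypothetical problem with 10,000 categories and random raw scores.
logits = [rng.uniform(-2.0, 2.0) for _ in range(10_000)]
positive = 42

loss, candidates = sampled_softmax_loss(logits, positive, num_negatives=20, rng=rng)
print(len(candidates), "of", len(logits), "categories used")
```

Only 21 exponentials are evaluated per example here instead of 10,000, which is where the saving comes from.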

One label and multiple labels

Softmax assumes that each example is a member of exactly one category. Some examples, however, can be members of multiple categories at the same time. For such examples:

  • You cannot use softmax.
  • You must rely on multiple logistic regressions.

For example, suppose your examples are images containing exactly one item (a piece of fruit). Softmax can determine the probability that the item is a pear, an orange, an apple, and so on. If your examples are images containing all sorts of things (bowls of different kinds of fruit), you must use multiple logistic regressions instead.
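A minimal sketch of the multi-label case, assuming one independent logistic (sigmoid) output per category: each probability is computed on its own, so the values need not sum to 1.0 and several categories can be predicted at once. The labels, raw scores, and the 0.5 threshold are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps a raw score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores for an image of a bowl holding pears and oranges.
labels = ["pear", "orange", "apple"]
logits = [2.0, 1.5, -3.0]

# One independent logistic regression output per category.
probs = {name: sigmoid(z) for name, z in zip(labels, logits)}

# Every category above the threshold is predicted; there may be more than one.
predicted = [name for name, p in probs.items() if p >= 0.5]

print(predicted)
print("sum of probabilities:", round(sum(probs.values()), 3))
```

Unlike softmax, raising the score of one category here does not lower the probability of any other.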

This work is licensed under a CC license; reprints must credit the author and link to this article.