Code (run in Colab):

```python
import matplotlib.pyplot as plt
import numpy as np

# Plot the sigmoid function over [-10, 10]
x = np.linspace(-10, 10, 100)
z = 1 / (1 + np.exp(-x))

plt.title("Sigmoid")
plt.plot(x, z)
plt.xlabel("x")
plt.ylabel("Sigmoid(x)")
plt.savefig("sigmoid.png")
plt.show()
plt.close()
```
Properties and drawbacks
The sigmoid function σ(x) maps inputs to the range (0, 1), which makes it a natural choice for binary classification. It is also smooth and easy to differentiate. As an activation function, however, it has several drawbacks:

- It is relatively expensive to compute, since it involves an exponential and a division.
- It is prone to vanishing gradients. When the input approaches positive or negative infinity, the gradient approaches 0. As the number of layers grows, backpropagation multiplies these small gradients together, so the gradient shrinks dramatically from the output layer toward the first layers. The derivative of the overall loss with respect to the early-layer weights becomes tiny, those weights change very slowly under gradient descent, and the early layers may fail to learn useful features.
- Its output is always greater than 0 (never zero-centered), so in a given step the weight updates for a neuron all share the same sign, which can slow convergence.
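The vanishing-gradient point is easy to verify numerically: the sigmoid's derivative is σ'(x) = σ(x)(1 − σ(x)), which peaks at only 0.25 at x = 0 and is essentially zero for large |x|. A minimal sketch (function names are illustrative, not from the original code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # maximum value: 0.25
print(sigmoid_grad(10.0))  # near zero for large inputs

# Backpropagation multiplies one such factor per layer, so even the
# best case (0.25 each layer) shrinks geometrically with depth:
print(0.25 ** 10)          # ~9.5e-7 after 10 layers
```

This is why deep networks built only from sigmoid layers struggle to propagate useful gradients back to their earliest layers.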
The sigmoid function is one of the most common activation functions in neural networks. It is central to logistic regression and is widely used in statistics and machine learning.
- Original from: rais