Recurrent neural network (RNN)
To understand the LSTM, we must first understand the RNN.
Human thinking does not start from scratch each time; we reason from knowledge we already have. A traditional feed-forward neural network cannot do this, but a recurrent neural network solves the problem well.
A recurrent neural network (RNN) is a network with loops in it, which allow information to persist from one step to the next.
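The loop can be sketched in a few lines of NumPy. This is a hedged illustration, not any particular library's API: the names (`rnn_step`, `W_xh`, `W_hh`) and the tiny dimensions are made up, and a single tanh layer is assumed, as in a standard RNN.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the new hidden state depends on
    both the current input and the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Tiny illustrative sizes: 3-dim inputs, 2-dim hidden state.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((3, 2)) * 0.1
W_hh = rng.standard_normal((2, 2)) * 0.1
b_h = np.zeros(2)

h = np.zeros(2)                          # initial hidden state
for x_t in rng.standard_normal((5, 3)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```

Because `h` is fed back in at every step, information from earlier inputs can persist in the hidden state.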
RNNs have been applied with great success in many fields, such as speech recognition, language modeling, translation, image captioning, and so on. The key to these achievements is the LSTM network model, which performs much better than the standard RNN on many tasks.
Long-term dependencies
One appealing characteristic of RNNs is that they can use earlier information. Sometimes the gap between the point where information is needed and the point where the relevant information appears is small, so little context is required. At other times that gap is large, and a great deal of context is needed.
In theory, RNNs can handle such long-term dependencies. In practice, however, they fail to learn them, while LSTMs have no such problem.
Long short-term memory network (LSTM)
The LSTM is a special kind of RNN that is capable of learning long-term dependencies.
All recurrent neural networks have the form of a chain of repeating modules of neural network. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.
The LSTM also has this chain structure, but the repeating module is different. Instead of a single neural network layer, there are four, interacting in a very special way.
The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. The cell state is a bit like a conveyor belt: it runs straight down the entire chain with only a few minor linear interactions, so it is easy for information to flow along it unchanged.
The LSTM does have the ability to remove information from the cell state or add information to it, carefully regulated by structures called gates.
A gate is a way to optionally let information through. Gates are composed of a sigmoid neural network layer and a pointwise multiplication operation.
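The gate mechanism can be sketched directly: the sigmoid layer outputs numbers between 0 and 1, and the pointwise multiplication scales each component of a signal by its gate value. The specific numbers below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A gate outputs values in (0, 1): 0 means "let nothing through",
# 1 means "let everything through".
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))  # ~0, exactly 0.5, ~1
signal = np.array([5.0, 5.0, 5.0])

filtered = gate * signal  # pointwise: each component scaled by its gate
```

The first component of `filtered` is nearly blocked, the second is halved, and the third passes through almost unchanged.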
1. Decide which information to discard from the cell state
2. Determine which new information to store in the cell state
First, a sigmoid layer called the "input gate layer" decides which values we will update. Next, a tanh layer creates a vector of new candidate values that could be added to the state. In the next step, we combine the two to create an update to the state.
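A minimal sketch of this step, assuming (as is common) that the gates act on the concatenation of the previous hidden state and the current input; all names and dimensions here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
h_prev = rng.standard_normal(4)        # previous hidden state
x_t = rng.standard_normal(3)           # current input
z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]

W_i = rng.standard_normal((4, 7)) * 0.1  # input gate weights
W_c = rng.standard_normal((4, 7)) * 0.1  # candidate-value weights

i_t = sigmoid(W_i @ z)        # input gate: which values to update
c_tilde = np.tanh(W_c @ z)    # new candidate values for the state
update = i_t * c_tilde        # scaled candidate, ready to be added
```

The sigmoid output `i_t` stays in (0, 1) and the tanh output `c_tilde` stays in (-1, 1), so each candidate value is admitted only to the degree the gate allows.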
3. Update the old cell state
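The update combines the two previous steps: multiply the old state by the forget gate, then add the gated candidate values. The gate values below are hypothetical, chosen to show the three regimes (keep, replace, blend).

```python
import numpy as np

# Hypothetical gate outputs for a 3-component cell state.
f_t = np.array([1.0, 0.0, 0.5])      # forget gate: keep / drop / half
c_prev = np.array([2.0, 2.0, 2.0])   # old cell state
i_t = np.array([0.0, 1.0, 0.5])      # input gate
c_tilde = np.array([4.0, 4.0, 4.0])  # candidate values

# New cell state: forget part of the old state, add part of the new.
c_t = f_t * c_prev + i_t * c_tilde
print(c_t)  # [2. 4. 3.]
```

The first component keeps the old value, the second is fully replaced by the candidate, and the third is an even blend of the two.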
4. Decide what to output
This output will be based on our cell state, but it will be a filtered version of it.
In the language model example, since the network has just seen a subject, it might output information relevant to a verb, in case a verb comes next. For example, it might output whether the subject is singular or plural, so that we know which form a following verb should take.
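The filtering step can be sketched as follows: the cell state is pushed through tanh (to squash values into (-1, 1)) and then multiplied by a sigmoid output gate, so only the selected parts become the hidden-state output. The gate values are again hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c_t = np.array([2.0, -1.0, 0.0])             # current cell state
o_t = sigmoid(np.array([10.0, 0.0, -10.0]))  # output gate: ~1, 0.5, ~0

# Hidden state / output: a gated, squashed view of the cell state.
h_t = o_t * np.tanh(c_t)
```

The first component of the state is output almost in full, the second at half strength, and the third is suppressed, even though the full cell state is carried forward unchanged to the next step.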