LSTM network


Recurrent neural network (RNN)

To understand LSTM, we must first understand the RNN.

People do not start thinking from scratch at every moment. They build on what they already know and reason further from there.

A traditional neural network cannot do this, because it keeps no memory of earlier inputs. The recurrent neural network addresses this problem.

A recurrent neural network (RNN) is a network with a loop in it, which allows information to persist from one step to the next.

RNNs have been widely used in many fields, such as speech recognition, language modeling, translation, and image captioning. The key to many of these achievements is the LSTM network, a special kind of RNN that performs much better than the standard version on many tasks.

Long-term dependencies

One of the characteristics of an RNN is that it can use earlier information. Sometimes the gap between the relevant information and the point where it is needed is very small, so no further context is needed. At other times the gap is large, and much more context is required.

In theory, an RNN can handle such long-term dependencies. In practice, a standard RNN has great difficulty learning them. LSTM largely avoids this problem.

Long short-term memory network (LSTM)

LSTM is a special kind of RNN that is capable of learning long-term dependencies.

All recurrent neural networks have the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.
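To make the repeating module concrete, here is a minimal sketch of a standard RNN step built from a single tanh layer. The names rnn_step, run_rnn, W_x, W_h and b are illustrative assumptions, not taken from any particular library, and plain Python lists stand in for vectors and matrices.

```python
import math

def rnn_step(x, h_prev, W_x, W_h, b):
    """One repeating module: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h = []
    for row_x, row_h, bias in zip(W_x, W_h, b):
        total = bias
        total += sum(w * v for w, v in zip(row_x, x))       # contribution of the input
        total += sum(w * v for w, v in zip(row_h, h_prev))  # contribution of the previous state
        h.append(math.tanh(total))
    return h

def run_rnn(inputs, h0, W_x, W_h, b):
    """Apply the same module at every step, passing the hidden state along the chain."""
    h = h0
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
    return h
```

The same weights are reused at every step; only the hidden state h changes as it moves along the chain.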

LSTM also has this chain structure, but the repeating module has a different structure. Instead of a single neural network layer, there are four, interacting in a very particular way.

The key to LSTMs is the cell state, the horizontal line running through the top of the diagram. The cell state is a bit like a conveyor belt. It runs straight down the entire chain, with only a few minor linear interactions. It is easy for information to flow along it unchanged.

LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are a way to selectively let information through. They are composed of a sigmoid neural network layer and a pointwise multiplication operation.
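As a rough illustration of a gate, the sketch below squashes some scores with a sigmoid and multiplies the result pointwise with a vector. The names sigmoid and apply_gate and the sample numbers are made up for this example. In a real LSTM the scores come from a learned layer rather than being written by hand.

```python
import math

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_gate(gate_scores, values):
    """Pointwise multiply: each value is scaled by how far its gate is open."""
    return [sigmoid(s) * v for s, v in zip(gate_scores, values)]

state = [2.0, -3.0, 0.5]
scores = [10.0, -10.0, 0.0]   # strongly open, strongly closed, half open
gated = apply_gate(scores, state)
```

A gate value near 1 lets the entry through almost unchanged, a value near 0 blocks it almost completely, and a score of 0 lets exactly half of it through.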

LSTM steps

1. Determine the information to be discarded from the cell state
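This decision is usually made by a sigmoid layer, commonly called the forget gate. In the conventional notation (an assumption here, since the text fixes no symbols), h_{t-1} is the previous output, x_t is the current input, and W_f and b_f are the weights and bias of that layer. Each entry of f_t lies between 0 and 1, where 0 means "discard this completely" and 1 means "keep this completely".

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```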

2. Determine which new information to store in the cell state

First, a sigmoid layer called the input gate layer decides which values we will update. Next, a tanh layer creates a vector of new candidate values that could be added to the state. In the next step, we combine the two to create an update to the state.
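Written out in the conventional notation (the symbols are assumptions, not taken from the text), the input gate i_t and the candidate values \tilde{C}_t are both computed from the previous output h_{t-1} and the current input x_t, each with its own weights and bias:

```latex
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
\end{aligned}
```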

3. Update the old cell state
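In the conventional notation (assumed here), let f_t be the gate from step 1 that decides what to discard, i_t the input gate from step 2, and \tilde{C}_t the candidate values from step 2. The old state C_{t-1} is scaled by f_t, and the gated candidates are then added, with \odot denoting pointwise multiplication:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```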

4. Decide what to output

This output will be based on our cell state, but will be a filtered version.
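In the usual formulation (the symbols are assumptions, not taken from the text), a sigmoid layer o_t decides which parts of the cell state to output. The cell state C_t is passed through tanh to push its values between -1 and 1, then multiplied pointwise by o_t:

```latex
\begin{aligned}
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```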

In a language model, for example, once the network has just seen a subject, it may want to output information relevant to a verb, in case a verb comes next. For example, it may output whether the subject is singular or plural, so that we know which form the verb should take if one follows.
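The four steps above can be put together in one function. What follows is a minimal sketch in plain Python, assuming the commonly used formulation of the LSTM. The names lstm_step, affine and params are hypothetical, and the weights would normally be learned rather than supplied by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. `params` maps "f", "i", "c", "o" to a (W, b) pair
    that acts on the concatenation [h_prev, x] (an assumed layout)."""
    z = h_prev + x  # list concatenation, not addition
    # Step 1: decide what to discard from the old cell state.
    f = [sigmoid(a) for a in affine(*params["f"], z)]
    # Step 2: decide what to update, and propose candidate values.
    i = [sigmoid(a) for a in affine(*params["i"], z)]
    c_tilde = [math.tanh(a) for a in affine(*params["c"], z)]
    # Step 3: update the old cell state.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    # Step 4: output a filtered version of the cell state.
    o = [sigmoid(a) for a in affine(*params["o"], z)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Calling lstm_step repeatedly, feeding each returned h and c into the next call, gives the chain described earlier: the cell state c is the conveyor belt, and the three sigmoid layers are the gates that regulate it.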
