How Does Long Short-Term Memory (LSTM) Work in NLP?
Neural Narrator
Jul 5, 2024
A common issue with RNNs is that when you train the network on a very long sequence, it begins to forget the earliest inputs.
As training proceeds on later and later inputs, the information that comes toward the end of a text document starts overriding the influence of the inputs from the very beginning.
We want to make sure the network does not forget the information from those first inputs as it works through the rest of the sequence.
In other words, we need some sort of long-term memory for our networks, one that retains all the data, including the very first inputs it was trained on.
LSTM: Long Short-Term Memory Cell
The LSTM was created to help address these RNN issues.
Let's go through how an LSTM cell works; this is the architecture commonly used for text generation.
For a typical recurrent neuron:
input(t-1) -> output(t-1)
and this output is fed back into the neuron along with the next input:
output(t-1) + input(t) -> output(t)
These outputs are often called hidden states, so instead of output(t-1) we can also write h(t-1).
h(t) is then the typical output of a recurrent neural network at time t.
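To make this concrete, here is a minimal sketch of a single vanilla RNN step in NumPy. The names and sizes (W_h, W_x, b, hidden_size) are illustrative assumptions, not from the article; the point is simply that h(t) is computed from h(t-1) and x(t).

```python
import numpy as np

hidden_size, input_size = 4, 3

# Illustrative parameters, randomly initialized just for the sketch
W_h = np.random.randn(hidden_size, hidden_size) * 0.1  # recurrent weights
W_x = np.random.randn(hidden_size, input_size) * 0.1   # input weights
b   = np.zeros(hidden_size)                            # bias

def rnn_step(h_prev, x_t):
    """One vanilla RNN step: h(t) = tanh(W_h @ h(t-1) + W_x @ x(t) + b)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

h = np.zeros(hidden_size)                    # initial hidden state
for x_t in np.random.randn(5, input_size):   # a toy sequence of 5 inputs
    h = rnn_step(h, x_t)                     # h(t) depends on h(t-1) and x(t)
```

Because the only memory is this single hidden state that gets overwritten at every step, information from the start of a long sequence tends to fade away, which is exactly the problem the LSTM is designed to fix.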
LSTM:
The LSTM cell processes information step by step. The first step is the forget gate layer.
1. Forget Gate Layer:
This layer decides what information we are going to forget, or throw away, from the cell state.
f(t) = sigmoid(Wf * [h(t-1), x(t)] + bf)
We pass h(t-1) and x(t) through a linear transformation with the weights Wf and a bias term bf, and then through a sigmoid function. Because it is a sigmoid layer, it outputs a number between 0 and 1 for each entry of the cell state, where 1 means "keep this completely" and 0 means "forget this completely". (A code sketch covering all of these steps follows the list below.)
For example, in a language model we are trying to predict the very next word based on the previous ones. The cell state might include the gender of the present subject so that the correct pronoun can be picked, but when a new subject appears, we might want to forget the gender of the previous subject.
2. Input Gate Layer:
This step decides what new information we are going to store in the cell state C(t). A sigmoid layer (the input gate) decides which values to update, and a tanh layer creates a vector of candidate values to add to the state:
i(t) = sigmoid(Wi * [h(t-1), x(t)] + bi)
C~(t) = tanh(Wc * [h(t-1), x(t)] + bc)
3. Update the Cell State:
Now it is time to update the old cell state C(t-1) to the new cell state C(t) by combining steps 1 and 2 (the forget gate layer and the input gate layer). In the previous steps we decided what to forget and what to store; here we simply execute it:
C(t) = f(t) * C(t-1) + i(t) * C~(t)
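As a minimal sketch, here is one full LSTM cell step in NumPy, following the equations above. The parameter names and sizes are illustrative assumptions, not the article's code. The standard formulation also includes an output gate, o(t) = sigmoid(Wo * [h(t-1), x(t)] + bo) with h(t) = o(t) * tanh(C(t)), which is included here for completeness even though the walkthrough above stops at the cell-state update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
concat = hidden_size + input_size  # size of [h(t-1), x(t)] concatenated

# Illustrative, randomly initialized parameters for the sketch
Wf, bf = np.random.randn(hidden_size, concat) * 0.1, np.zeros(hidden_size)  # forget gate
Wi, bi = np.random.randn(hidden_size, concat) * 0.1, np.zeros(hidden_size)  # input gate
Wc, bc = np.random.randn(hidden_size, concat) * 0.1, np.zeros(hidden_size)  # candidate state
Wo, bo = np.random.randn(hidden_size, concat) * 0.1, np.zeros(hidden_size)  # output gate

def lstm_step(h_prev, C_prev, x_t):
    """One LSTM cell step: returns the new hidden state h(t) and cell state C(t)."""
    z = np.concatenate([h_prev, x_t])     # [h(t-1), x(t)]
    f = sigmoid(Wf @ z + bf)              # 1. forget gate: what to erase from C(t-1)
    i = sigmoid(Wi @ z + bi)              # 2. input gate: which values to write
    C_tilde = np.tanh(Wc @ z + bc)        #    candidate values to store
    C = f * C_prev + i * C_tilde          # 3. update the cell state
    o = sigmoid(Wo @ z + bo)              # output gate (standard LSTM)
    h = o * np.tanh(C)                    # new hidden state / output
    return h, C

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):   # a toy sequence of 5 inputs
    h, C = lstm_step(h, C, x_t)
```

The key design choice is that the cell state C(t) flows from step to step with only element-wise gating applied to it, which is what lets information from the first inputs survive across a long sequence.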
Some variants of the LSTM:
One such variant, the Gated Recurrent Unit (GRU), is actually simpler than the standard LSTM model, and because of that it has grown increasingly popular.
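To see concretely why the GRU is simpler: it merges the forget and input gates into a single update gate and merges the cell state with the hidden state. The NumPy below is an illustrative sketch (biases omitted, parameter names assumed), not the article's own code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
concat = hidden_size + input_size

# Illustrative parameters (biases omitted for brevity)
Wz = np.random.randn(hidden_size, concat) * 0.1   # update gate
Wr = np.random.randn(hidden_size, concat) * 0.1   # reset gate
Wh = np.random.randn(hidden_size, concat) * 0.1   # candidate state

def gru_step(h_prev, x_t):
    """One GRU step: a single update gate replaces the LSTM's forget and input gates,
    and there is no separate cell state, only the hidden state h."""
    z = sigmoid(Wz @ np.concatenate([h_prev, x_t]))            # how much to update
    r = sigmoid(Wr @ np.concatenate([h_prev, x_t]))            # how much past to use
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # new hidden state

h = np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):
    h = gru_step(h, x_t)
```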
The main idea is to understand how the LSTM works; that lets you quickly learn how these variations work. For text generation, LSTMs work best.
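As a usage sketch, an LSTM-based next-token text-generation model is often set up along these lines in Keras; the vocabulary size and layer widths here are placeholder assumptions, not values from the article.

```python
import tensorflow as tf

vocab_size = 10000   # assumed vocabulary size for the sketch
embed_dim  = 64
lstm_units = 128

# Predict the next token at every position in the input sequence.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(lstm_units, return_sequences=True),
    tf.keras.layers.Dense(vocab_size),  # logits over the vocabulary
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# model.fit(token_sequences, next_token_targets, ...) would then train it on text data.
```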