Introduction To LSTM
LSTM belongs to the more complicated areas of machine learning and Deep Learning, and getting your bearings in it is not an easy task. It deals with algorithms that loosely mimic how the human brain processes information and that uncover essential relationships in sequential data.
In this blog, we will
attempt to understand what LSTM is.
What is LSTM?
First, you must ask yourself, 'What does LSTM mean?' LSTM stands for Long Short-Term Memory, a type of network used in machine learning and Deep Learning. Unlike plain recurrent neural networks (RNNs), LSTMs are capable of learning long-term dependencies, especially in sequence prediction problems.
An LSTM is an advanced RNN, a sequential network that enables information retention, and it handles the vanishing gradient issue encountered by plain RNNs. An RNN is a recurrent neural network designed to maintain a form of persistent memory.
Let's say you remember a previous scene while watching a video, or know what happened in an earlier chapter while reading a book. RNNs work similarly: they remember previous information and use it to process the current input. The shortcoming of RNNs is that they cannot capture long-term dependencies because of the vanishing gradient. LSTMs are explicitly designed to avoid this long-term dependency problem. An LSTM has feedback connections, i.e., it can process entire sequences of data as well as individual data points such as images. This finds applications in speech recognition, machine translation, and more.
LSTM Architecture
At a high level, an
LSTM operates similarly to an RNN cell. Here are the inner workings of an LSTM
network. LSTM consists of three parts, each part performs an individual
function.
The first part
chooses whether the information from the last timestamp should be recognized or
is irrelevant and can be forgotten. In the second part, the cell attempts to
learn new information from the input to that cell. Ultimately, in the third
part, the cell handed over the updated information from the recent timestamp to
the following.
These three elements of an LSTM cell are called gates. The first part is called the Forget gate, the second is known as the Input gate, and the last one is the Output gate.
1. Forget gate: Information that is no longer useful in the cell state is removed using the Forget gate. The two inputs, x_t (the input at the current timestamp) and h_(t-1) (the output of the previous cell), are fed to the gate and multiplied by weight matrices, followed by the addition of a bias. The resulting value is passed through a sigmoid activation function that produces an output between 0 and 1. If the output is close to 0 for a particular element of the cell state, that information is forgotten; for an output close to 1, the information is retained for future use.
2. Input gate: The Input gate adds useful new information to the cell state. First, the information is regulated using a sigmoid function, which filters the values to be remembered (similar to the Forget gate) using the inputs h_(t-1) and x_t. A candidate vector is then created from h_(t-1) and x_t using the tanh function, which outputs values between -1 and +1. Finally, the sigmoid output and the candidate vector are multiplied element-wise to obtain the useful information to add to the cell state.
3. Output gate: The Output gate extracts useful information from the current cell state to present as the output. First, a vector is generated by applying the tanh function to the cell state. Then the information is regulated using a sigmoid function, which filters the values to be remembered using the inputs h_(t-1) and x_t. Finally, the tanh vector and the sigmoid output are multiplied element-wise and sent as the output, and as the hidden-state input to the next cell. A small code sketch of these three gates follows below.
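To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM timestep. The weight matrices W and U, the biases b, and the helper names are assumptions made for illustration only, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM timestep. W, U, b hold the weights and biases for the
    forget (f), input (i), candidate (g), and output (o) transforms."""
    # Forget gate: decide which parts of the old cell state to keep.
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Input gate plus candidate values: decide what new information to add.
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])
    # New cell state: selectively forget old content, then add new content.
    c_t = f_t * c_prev + i_t * g_t
    # Output gate: decide which parts of the cell state to expose as output.
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, input size 3).
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W = {k: rng.normal(size=(hidden, inp)) for k in "figo"}
U = {k: rng.normal(size=(hidden, hidden)) for k in "figo"}
b = {k: np.zeros(hidden) for k in "figo"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_cell_step(rng.normal(size=inp), h, c, W, U, b)
```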
Like a simple RNN, an LSTM has a hidden state, where h_(t-1) denotes the hidden state of the previous timestamp and h_t the hidden state of the current timestamp. In addition, the LSTM has a cell state, represented by c_(t-1) and c_t for the previous and current timestamps.
Here, the hidden
state is called short-term memory, and the state of the cell is called
long-term memory.
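In practice these states are rarely computed by hand. As a quick sketch (assuming PyTorch is installed), torch.nn.LSTM returns both the per-timestep outputs and the final hidden and cell states:

```python
import torch
import torch.nn as nn

# An LSTM layer: input features of size 10, hidden/cell state of size 20.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(32, 5, 10)       # batch of 32 sequences, 5 timesteps each
output, (h_n, c_n) = lstm(x)     # h_n: final hidden state, c_n: final cell state

print(output.shape)  # torch.Size([32, 5, 20]) - hidden state at every timestep
print(h_n.shape)     # torch.Size([1, 32, 20]) - short-term memory
print(c_n.shape)     # torch.Size([1, 32, 20]) - long-term memory
```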
What are bidirectional LSTMs?
They are an upgrade over standard LSTMs. In bidirectional LSTMs, each training sequence is presented forwards and backwards to two separate recurrent networks, both of which are connected to the same output layer. As a result, a bidirectional LSTM has complete information about every point in a given sequence: everything before it and everything after it.
But how can you rely on information that hasn't happened yet? The human brain does something similar: it uses its senses to pick up words, sounds, or whole sentences that might not make sense at first but acquire meaning from later context. Conventional recurrent neural networks can only use the previous context. In bidirectional LSTMs, the input is processed in both directions by two hidden layers whose results are pushed toward the same output layer. This enables bidirectional LSTMs to capture long-range context in both directions.
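As a rough sketch of how this looks in code (again assuming PyTorch), setting bidirectional=True on nn.LSTM runs the forward and backward passes and concatenates their outputs:

```python
import torch
import torch.nn as nn

# bidirectional=True runs one LSTM forwards and one backwards over the sequence.
bilstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True, bidirectional=True)

x = torch.randn(32, 5, 10)        # 32 sequences, 5 timesteps, 10 features
output, (h_n, c_n) = bilstm(x)

# The forward and backward hidden states are concatenated at each timestep,
# so the feature dimension doubles compared to a unidirectional LSTM.
print(output.shape)  # torch.Size([32, 5, 40])
print(h_n.shape)     # torch.Size([2, 32, 20]) - one final state per direction
```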
LSTM vs. RNN
Consider that you are tasked with editing certain information in a calendar. An RNN completely transforms the existing data by applying a function to it, whereas an LSTM makes small adjustments, by simple additions and multiplications, to the information flowing through its cell state. In this way, an LSTM forgets and remembers things selectively, which makes it an improvement over the RNN.
Now consider that you
want to process data with regular patterns, such as predicting the sales of
colored powder that peaks during Holi in India. A good strategy is to look back
at the previous year's sales records. So you need to know which data to forget
and which to save for later use. Otherwise, you must have an excellent memory.
Recurrent neural networks seem to do a good job of this in theory. However,
they have two drawbacks, exploding gradient, and vanishing gradient, which make
them redundant.
Here, LSTMs introduce memory units, called cell states, to solve this problem; these cells can be viewed as differentiable memory.
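To illustrate the difference between the two update rules, here is a minimal, purely illustrative sketch; the function names and shapes are assumptions, not library code:

```python
import numpy as np

# Vanilla RNN: the hidden state is completely rewritten at every step,
# so information from many steps back fades (vanishing gradient).
def rnn_step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# LSTM: the cell state is updated additively, gated by forget/input values,
# which lets selected information flow unchanged across many timesteps.
def lstm_state_update(c_prev, f_t, i_t, g_t):
    return f_t * c_prev + i_t * g_t
```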
Applications of LSTM
" LSTM networks find practical applications in
the following areas:
" Language modeling
" Machine translation
" Text
classification
" Image caption
" Image generation using attention models
" The answer to the question
" Convert video to text
" Polymorphic music modeling
" Speech synthesis
" Prediction of protein secondary structure
This list gives an
idea of the areas in which LSTM is used, but not how it is used. Let's
understand the types of sequence learning problems that LSTM networks are
capable of solving.
LSTM neural networks can solve tasks that were not solvable by earlier learning algorithms such as plain RNNs. Long-term temporal dependencies can be captured effectively by LSTMs without suffering from many of the usual optimization bottlenecks, which makes them suitable for harder sequence learning problems.
Summary!
LSTMs are an improvement over plain RNNs: on most sequence tasks they can achieve what RNNs achieve, and usually with better accuracy. As intimidating as they may seem at first, LSTMs deliver better results and are a big step forward in deep learning.