Introduction To LSTM


LSTM is one of the more complex areas of machine learning and deep learning, and getting your bearings in it is not an easy task. It deals with algorithms that mimic the functioning of the human brain in order to uncover the underlying relationships in given sequential data.
 
In this blog, we will attempt to understand what LSTM is.
 

What is LSTM?


First, you must ask yourself, 'What does LSTM mean?' LSTM stands for Long Short-Term Memory, a type of network used in machine learning and deep learning. It is a variant of recurrent neural networks (RNNs) capable of learning long-term dependencies, especially in sequence prediction problems.

LSTM is an advanced RNN, a sequential network that enables information to be retained over time. It can handle the vanishing gradient problem encountered by plain RNNs, which are recurrent networks designed to maintain a persistent memory.

Let's say you remember a previous scene while watching a video, or know what happened in an earlier chapter while reading a book. RNNs work similarly: they remember previous information and use it to process the current input. The shortcoming of RNNs is that they cannot capture long-term dependencies because of the vanishing gradient. LSTMs are explicitly designed to avoid this long-term dependency problem. An LSTM also has feedback connections, i.e., it can process entire sequences of data as well as individual data points such as images. This finds application in speech recognition, machine translation, and more.
 

LSTM Architecture


At a high level, an LSTM operates much like an RNN cell. Here are the inner workings of an LSTM network. An LSTM cell consists of three parts, and each part performs an individual function.

The first part chooses whether the information from the previous timestamp should be retained or is irrelevant and can be forgotten. In the second part, the cell tries to learn new information from the input to that cell. Finally, in the third part, the cell passes the updated information from the current timestamp on to the next one.

These three elements of an LSTM cell are called gates. The first part is called the forget gate, the second is known as the input gate, and the last is the output gate.
 
1. Forget gate: Information that is no longer useful in the cell state is removed using the forget gate. The two inputs, x_t (the input at the current timestamp) and h_(t-1) (the output of the previous cell), are fed to the gate, multiplied by weight matrices, and a bias is added. The result is passed through a sigmoid activation function, which gives an output between 0 and 1. If the output for a particular entry of the cell state is close to 0, that information is forgotten; if it is close to 1, the information is retained for future use.
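As a rough sketch of this idea (the weight matrix W_f and bias b_f are hypothetical parameters, not taken from any particular library), the forget gate can be written in NumPy roughly as follows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x_t, h_prev, W_f, b_f):
    # Concatenate the previous hidden state and the current input,
    # multiply by the weight matrix, add the bias, and squash with
    # a sigmoid so every entry ends up between 0 and 1.
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + b_f)
```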
 
2. Input gate: The input gate adds useful information to the cell state. First, using the inputs h_(t-1) and x_t, a sigmoid function filters the values to be remembered, similarly to the forget gate. A vector of candidate values is then created using the tanh function, which outputs values from -1 to +1 based on h_(t-1) and x_t. Finally, the candidate vector and the sigmoid output are multiplied element-wise to obtain the useful information that will be added to the cell state.
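A similar sketch for the input gate, reusing the sigmoid helper from the previous snippet; W_i, b_i, W_c, and b_c are again hypothetical parameters:

```python
def input_gate(x_t, h_prev, W_i, b_i, W_c, b_c):
    # i_t decides which entries to update (sigmoid, values in 0..1),
    # c_hat_t proposes candidate values (tanh, values in -1..+1).
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)
    c_hat_t = np.tanh(W_c @ concat + b_c)
    return i_t * c_hat_t  # element-wise product: the new information to add
```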
 
3. Output gate: The output gate extracts useful information from the current cell state to present as output. First, a vector is generated by applying the tanh function to the cell state. Then, using the inputs h_(t-1) and x_t, a sigmoid function filters the values to be passed on. Finally, the vector and the regulated values are multiplied element-wise and sent as the output of the current cell and as input to the next cell.
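Finally, a sketch of the output gate together with the cell-state update, continuing the same hypothetical notation:

```python
def output_gate_and_state(x_t, h_prev, c_prev, f_t, new_info, W_o, b_o):
    # Update the cell state: forget part of the old state, then add the
    # new information produced by the input gate.
    c_t = f_t * c_prev + new_info
    # The output gate decides which parts of tanh(c_t) become the new
    # hidden state, i.e. the cell's output and the next cell's input.
    concat = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ concat + b_o)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```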
 
 
Like a simple RNN, an LSTM also has a hidden state, where H(t-1) denotes the hidden state of the previous timestamp and H(t) the hidden state of the current timestamp. In addition, the LSTM has a cell state, represented by C(t-1) and C(t) for the previous and current timestamps respectively.

Here, the hidden state is called short-term memory, and the state of the cell is called long-term memory.
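Putting the three gates together, an illustrative (and deliberately simplified) forward pass over a whole sequence, reusing the helper functions sketched above and a hypothetical params dictionary, might look like this:

```python
def lstm_forward(x_seq, h0, c0, params):
    # Carry the short-term memory (h) and the long-term memory (c)
    # from one timestamp to the next.
    h_t, c_t = h0, c0
    for x_t in x_seq:
        f_t = forget_gate(x_t, h_t, params["W_f"], params["b_f"])
        new_info = input_gate(x_t, h_t, params["W_i"], params["b_i"],
                              params["W_c"], params["b_c"])
        h_t, c_t = output_gate_and_state(x_t, h_t, c_t, f_t, new_info,
                                         params["W_o"], params["b_o"])
    return h_t, c_t
```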

 

What are bidirectional LSTMs?


They are an upgrade over LSTMs. In bidirectional LSTMs, each training sequence is presented forwards and backwards to two separate recurrent networks, both of which are connected to the same output layer. As a result, bidirectional LSTMs have complete information about every point in a given sequence: everything before it and everything after it.

But how can you rely on information that hasn't happened yet? The human brain uses its senses to pick up information from words, sounds, or whole sentences that might not make sense at first but mean something in a later context. Conventional recurrent neural networks can only use the previous context to obtain information. In bidirectional LSTMs, the input is processed in both directions by two hidden layers whose outputs are fed to the same output layer. This enables bidirectional LSTMs to capture long-range context in both directions.
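In practice you rarely implement this by hand. As a minimal sketch, assuming TensorFlow/Keras is available and with purely illustrative layer sizes and vocabulary size, a bidirectional LSTM for a binary text-classification task could be set up like this:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # 10000 is a placeholder vocabulary size for this example.
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    # One LSTM reads the sequence forwards, a second reads it backwards;
    # their outputs are combined before the final layer.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```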
 

LSTM vs. RNN


Consider that you are tasked with editing certain information in a calendar. To do this, an RNN completely transforms the existing data by applying a function. An LSTM, on the other hand, makes small adjustments to the data through simple addition and multiplication as it flows through the cell state. In this way, an LSTM forgets and remembers things selectively, making it an improvement over the RNN.

Now consider that you want to model data with regular patterns, such as predicting the sales of coloured powder, which peak during Holi in India. A good strategy is to look back at the previous years' sales records, so the model needs to know which data to forget and which to keep for later use; otherwise, it must have an excellent memory. Recurrent neural networks seem to do a good job of this in theory, but they have two drawbacks, exploding gradients and vanishing gradients, which limit their usefulness in practice.
To solve this problem, LSTMs introduce memory units called cell states. These cells can be viewed as a differentiable memory.
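In a framework such as Keras, trying out both behaviours is a one-line change; only the LSTM layer carries the extra cell state that acts as this differentiable memory (the layer size 32 below is an arbitrary example):

```python
import tensorflow as tf

rnn_layer = tf.keras.layers.SimpleRNN(32)  # hidden state only, rewritten at every step
lstm_layer = tf.keras.layers.LSTM(32)      # hidden state plus a gated cell state
```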
 

Applications of LSTM


LSTM networks find practical applications in the following areas:
Language modeling
Machine translation
Text classification
Image captioning
Image generation using attention models
Question answering
Video-to-text conversion
Polyphonic music modeling
Speech synthesis
Protein secondary structure prediction

This list gives an idea of the areas in which LSTM is used, but not how it is used. Let's understand the types of sequence learning problems that LSTM networks are capable of solving.

LSTM neural networks can solve several tasks that are not solvable by earlier learning algorithms such as plain RNNs. Long-term temporal dependencies can be captured effectively by an LSTM without it suffering from many of the optimization hurdles that affect simple recurrent networks, which makes it suitable for more demanding sequence problems.
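As a small, self-contained illustration (the data, sequence length, and layer sizes are all made up for this example), an LSTM can learn a many-to-one task in which the target depends on the very first timestep of a long sequence, exactly the kind of long-range dependency a plain RNN struggles with:

```python
import numpy as np
import tensorflow as tf

# Toy dataset: 1000 random sequences of length 50; the target is the value
# at the first timestep, so the network has to carry it across the sequence.
seq_len, n_samples = 50, 1000
X = np.random.rand(n_samples, seq_len, 1).astype("float32")
y = X[:, 0, 0]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(seq_len, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)  # short run, just to show the workflow
```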
 

Summary


LSTMs are an improvement over plain RNNs because they can handle the same sequence tasks while also capturing long-term dependencies, typically with much better accuracy. As intimidating as the architecture may look at first, LSTMs deliver better results and are a big step forward in deep learning.