A recurrent neural network (RNN) is a type of artificial neural network that uses sequential data, or time series data, to predict outcomes. Typical applications of RNNs include language translation, speech recognition, and image captioning. Recurrent neural networks are characterized by their ability to retain information from previous inputs, which can influence the outputs they produce.
Recurrent neural networks can be arranged in various structures. Common structures include one to one, one to many, many to one, and many to many.
In a one to one structure, a single input produces a single output. For example, a search engine query asking for the time in London returns the time in London. This is the most basic structure.
A one to many structure has a single input and multiple outputs. Image captioning is commonly associated with a one to many RNN: the network takes one image as input and generates a sequence of words or phrases that describe it.
A many to one structure takes a sequence of inputs and generates a single output. This network can be used for sentiment analysis, where a given sentence is classified as expressing a positive or negative view.
The many to many recurrent neural network structure takes a sequence of inputs and generates a sequence of outputs. Machine translation, for example, takes a phrase in one language and translates it into another.
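The structures above differ mainly in how a recurrent cell is driven and where its outputs are read. The following is a minimal sketch using a scalar cell with made-up fixed weights (W_X, W_H, and W_Y are illustrative assumptions, not trained values). The many to many variant shown here emits one output per input; translation systems typically allow the input and output lengths to differ, which this sketch does not attempt.

```python
import math

# Hypothetical fixed weights, chosen only for illustration.
W_X, W_H, W_Y = 0.5, 0.8, 1.5


def step(x, h):
    """Update the hidden state from the current input and the previous state."""
    return math.tanh(W_X * x + W_H * h)


def readout(h):
    """Map a hidden state to an output value."""
    return W_Y * h


def many_to_one(xs):
    """Consume the whole sequence, then emit a single output."""
    h = 0.0
    for x in xs:
        h = step(x, h)
    return readout(h)


def many_to_many(xs):
    """Emit one output for every input in the sequence."""
    h = 0.0
    ys = []
    for x in xs:
        h = step(x, h)
        ys.append(readout(h))
    return ys


def one_to_many(x, n_outputs):
    """Take one input, then feed each output back in as the next input."""
    h = step(x, 0.0)
    ys = [readout(h)]
    for _ in range(n_outputs - 1):
        h = step(ys[-1], h)
        ys.append(readout(h))
    return ys


print(many_to_one([1.0, 2.0, 3.0]))
print(many_to_many([1.0, 2.0, 3.0]))
print(one_to_many(1.0, 4))
```

Because the same cell is applied at every step, each function accepts sequences of any length.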
To train a recurrent neural network, the backpropagation algorithm is modified to unfold (unroll) the network across time steps. The resulting algorithm is known as backpropagation through time, or BPTT.
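Backpropagation algorithms are based on computing gradients of the error with respect to the model parameters. The model is trained by calculating errors at the output layer and propagating them back toward the input layer. These calculations enable adjustments to the model parameters so that the model better fits the data. The network has different layers, each associated with weights. The weights are the learnable parameters of the network that determine how strongly each input influences the output.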
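To make the unfolding concrete, here is a minimal sketch of backpropagation through time for a scalar RNN with two weights. The squared-error loss on the final hidden state, the function names, and the learning rate are all assumptions made for illustration; practical networks use vectors, matrices, and an automatic differentiation library.

```python
import math


def forward(xs, w_x, w_h):
    """Unfold the network over the sequence; hs[t] is the state after t inputs."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1]))
    return hs


def loss(xs, target, w_x, w_h):
    """Squared error between the final hidden state and the target."""
    return 0.5 * (forward(xs, w_x, w_h)[-1] - target) ** 2


def bptt(xs, target, w_x, w_h):
    """Return the gradients of the loss with respect to w_x and w_h."""
    hs = forward(xs, w_x, w_h)
    grad_w_x = grad_w_h = 0.0
    dh = hs[-1] - target                  # d loss / d h_T
    for t in range(len(xs), 0, -1):       # walk backwards through time
        da = dh * (1.0 - hs[t] ** 2)      # back through tanh
        grad_w_x += da * xs[t - 1]        # the same weights are reused at every
        grad_w_h += da * hs[t - 1]        # step, so their gradients accumulate
        dh = da * w_h                     # pass the error to the previous step
    return grad_w_x, grad_w_h


# Illustrative gradient descent loop with made-up data.
xs, target = [1.0, 0.5, -0.3], 0.4
w_x, w_h = 0.1, 0.1
print("loss before:", loss(xs, target, w_x, w_h))
for _ in range(100):
    g_x, g_h = bptt(xs, target, w_x, w_h)
    w_x -= 0.1 * g_x
    w_h -= 0.1 * g_h
print("loss after:", loss(xs, target, w_x, w_h))
```

The line `dh = da * w_h` multiplies the error by the recurrent weight once per time step, which is where very small or very large gradients over long sequences come from.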
The following list contains advantages of recurrent neural networks:
RNNs can process sequential data.
RNNs can process inputs of varying length.
RNNs have memory; that is, they can retain information about earlier inputs.
The following list contains disadvantages of recurrent neural networks:
Computations can be slow.
Standard (unidirectional) RNNs cannot take future inputs into account when making decisions.
The gradients used to compute the weight updates may get very close to zero, preventing the network from learning new weights. This can be referred to as the vanishing gradient problem.
When training the network, the gradients can instead grow exponentially rather than decay, which is referred to as the exploding gradient problem.
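The vanishing and exploding gradient problems can be illustrated with a simplified linear recurrence h_t = w_h * h_(t-1), which is an assumption made here to keep the arithmetic visible. In that case the gradient of the final state with respect to the initial state is w_h multiplied by itself once per time step.

```python
def gradient_through_time(w_h, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w_h * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w_h  # one factor of the recurrent weight per time step
    return grad


# A recurrent weight below 1 shrinks the gradient towards zero (vanishing).
print(gradient_through_time(0.5, 50))   # roughly 8.9e-16

# A recurrent weight above 1 makes the gradient blow up (exploding).
print(gradient_through_time(1.5, 50))   # roughly 6.4e8
```

With a nonlinearity such as tanh, each step contributes an extra derivative factor of at most 1, which pushes the product further towards zero.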
Speech recognition is the process of taking the sound waves of someone speaking and converting them into text.
Machine translation is the translation of text or speech from one language to another. Accurate translation remains difficult because of the complexity of languages.
Time series prediction takes a dataset of previous observations at equally spaced points in time and predicts what future data points will look like.
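One common way to prepare such data for a many to one network is to slice the series into fixed-length windows, each paired with the value that follows it. The sketch below assumes this framing; the helper name make_windows and the sample values are hypothetical.

```python
def make_windows(series, window):
    """Pair each run of `window` consecutive points with the point that follows."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs


readings = [10, 20, 30, 40, 50]  # equally spaced observations (made-up values)
for inputs, target in make_windows(readings, 3):
    print(inputs, "->", target)
# [10, 20, 30] -> 40
# [20, 30, 40] -> 50
```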
An RNN system for handwriting recognition interprets handwritten text. This can be used to optimize mail sorting and to expand digital libraries.
Recurrent neural networks can be confused with feedforward neural networks. The difference is that in a feedforward network the signal travels in one direction only, from input to output, and there is no feedback loop. Recurrent neural networks do have feedback loops, so the state computed at one time step is fed back into the network at the next. Recurrent neural networks retain information about past data to influence their output, whereas feedforward neural networks cannot retain previous data.
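The contrast can be seen by feeding the same value repeatedly into both kinds of network. In this minimal sketch, with single-unit networks and made-up weights, the feedforward output never changes, while the recurrent output drifts as the hidden state accumulates history.

```python
import math


def feedforward(x, w=0.5):
    """The output depends only on the current input."""
    return math.tanh(w * x)


def recurrent(xs, w_x=0.5, w_h=0.8):
    """The output depends on the current input and the fed-back hidden state."""
    h = 0.0
    outputs = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)  # feedback loop: h re-enters the cell
        outputs.append(h)
    return outputs


sequence = [1.0, 1.0, 1.0]
ff_out = [feedforward(x) for x in sequence]
rnn_out = recurrent(sequence)
print(ff_out)   # three identical values
print(rnn_out)  # three different values, one per time step
```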
There are different variants of recurrent neural networks in machine learning. These include bidirectional recurrent neural networks, gated recurrent units (GRUs), and long short-term memory (LSTM).
In bidirectional recurrent neural networks, inputs from future time steps are used alongside past ones to improve the accuracy of the network. For example, such a network can use the words both before and after a position in a sentence to predict the word in the middle.
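A bidirectional network can be sketched as two recurrent passes, one reading the sequence left to right and one reading it right to left, with the two states paired at each position. The scalar cell and the weight values below are illustrative assumptions.

```python
import math


def run(xs, w_x, w_h):
    """Run a scalar recurrent cell over xs and return the state at each step."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional(xs, fwd=(0.5, 0.8), bwd=(0.5, 0.8)):
    """Pair a left-to-right state with a right-to-left state at each position."""
    forward_states = run(xs, *fwd)
    # Read the sequence in reverse, then flip the states back into input order.
    backward_states = run(xs[::-1], *bwd)[::-1]
    return list(zip(forward_states, backward_states))


# Only the last input is non-zero. At position 0 the forward state is still 0.0,
# but the backward state already carries information about the final input.
for position, pair in enumerate(bidirectional([0.0, 0.0, 1.0])):
    print(position, pair)
```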
Gated recurrent units (GRUs) are designed to address the vanishing gradient problem. To do so, the network includes a reset gate and an update gate. These gates determine which information is retained for future predictions.
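The role of the two gates can be sketched with a scalar GRU step. The parameter names in the dictionary are hypothetical, the values are arbitrary, and the sign convention for the update gate varies between references; this is one common formulation, not a definitive one.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h_prev, p):
    """One scalar GRU step; p maps hypothetical parameter names to values."""
    # Update gate: how much of the state is replaced by the new candidate.
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds into the candidate.
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    candidate = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Blend old state and candidate. Some references swap z and (1 - z).
    return (1.0 - z) * h_prev + z * candidate


params = {
    "w_z": 0.6, "u_z": 0.4, "b_z": 0.0,
    "w_r": 0.5, "u_r": 0.3, "b_r": 0.0,
    "w_h": 0.9, "u_h": 0.7, "b_h": 0.0,
}

h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = gru_step(x, h, params)
    print(h)
```

When the update gate is close to zero, the previous state passes through almost unchanged, which gives information a direct path across many time steps.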
Long short-term memory (LSTM) was also designed to address the vanishing gradient problem. An LSTM uses three gates: an input gate, an output gate, and a forget gate. As in a GRU, these gates help determine which information is retained for future predictions.
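The three gates can be sketched with a scalar LSTM step that keeps a separate cell state alongside the hidden state. The parameter layout (a dictionary of weight, recurrent weight, and bias triples) and all values are assumptions for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps a gate name to (w, u, b)."""
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))         # input gate: how much new content to write
    f = sigmoid(pre("forget"))        # forget gate: how much old cell state to keep
    o = sigmoid(pre("output"))        # output gate: how much cell state to expose
    g = math.tanh(pre("candidate"))   # candidate content to write
    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # updated hidden state
    return h, c


params = {
    "input": (0.5, 0.4, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.6, 0.1, 0.0),
    "candidate": (0.9, 0.7, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
    print(h, c)
```

When the forget gate is near one and the input gate is near zero, the cell state is carried forward almost unchanged, which is how an LSTM can retain information over long spans.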