
Introduction

In artificial intelligence, new types of networks are constantly being developed by researchers. Deep learning has become a very effective method for detecting patterns in many kinds of data, and today convolutional neural networks and long short-term memory networks are becoming powerhouses in prediction. Long Short-Term Memory networks – usually just called “LSTMs” – are a special kind of Recurrent Neural Network (RNN) capable of learning long-term dependencies. The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other, but for many tasks that is a poor assumption: if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with each output depending on the previous computations. LSTM networks were introduced by Hochreiter & Schmidhuber (1997). They have a chain-like structure of repeating cells that lets them retain information over long periods of time, and when trained by gradient descent they solve difficult problems that traditional recurrent neural networks in general cannot. Figure 1 below shows how an LSTM network works.


Figure 1: LSTM Networks
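The gating that Figure 1 sketches can also be written out explicitly. In the commonly used formulation (the variant with a forget gate, introduced shortly after the original 1997 paper), each time step updates the cell state c_t and hidden state h_t as follows:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          (forget gate)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          (input gate)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          (output gate)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   (candidate memory)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (new cell state)
h_t = o_t \odot \tanh(c_t)                         (hidden state / output)

Because the gates f_t, i_t, and o_t are sigmoid outputs between 0 and 1, the cell can decide how much old memory to keep, how much new information to write, and how much of its state to expose at each step.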

Use Cases

When you are dealing with sequential data, or data with a temporal relationship, an LSTM network is a natural choice. LSTM networks are very good at holding long-term memories: the prediction for the nth sample in a sequence can be influenced by an input that was given many time steps earlier. The network can store or release memory on the fly through its gating mechanism. LSTMs help preserve the error that is backpropagated through time and layers; by maintaining a more constant error, they allow recurrent nets to keep learning over many time steps. LSTMs hold information outside the normal flow of the recurrent network in a gated cell. Information can be stored in, written to, or read from a cell, much like data in a computer’s memory, and the cell decides what to store and when to allow reads, writes, and erasures via gates that open and close. LSTM networks are used in applications such as stock prediction and natural language processing.

TensorFlow is an open-source machine learning library that lets us experiment with many kinds of neural networks. It was developed by the Google Brain team and released in November 2015. It relies on the construction of dataflow graphs, with nodes that represent mathematical operations (ops) and edges that represent tensors (multidimensional arrays, represented internally as NumPy ndarrays). The dataflow graph is a summary of all the computations, which are executed asynchronously and in parallel within a TensorFlow session on a given device (CPU or GPU). We will be using the TensorFlow Python library for today’s examples.
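As a quick orientation, here is a minimal sketch of what an LSTM looks like in TensorFlow. It assumes the Keras API bundled with current TensorFlow releases (the examples in this post may be written against a different, lower-level API), and the shapes and random data are purely illustrative.

import numpy as np
import tensorflow as tf

# One LSTM layer reading sequences of 10 time steps with 1 feature each,
# followed by a Dense layer that maps the final hidden state to one output.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Illustrative random data: 64 sequences, 10 steps, 1 feature per step.
x = np.random.rand(64, 10, 1).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=2, verbose=0)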

We’ll visualize this with an example: predicting the price of a particular instrument. Here we use the USDJPY pair from the FOREX market. The actual prices are colored in blue, but because our model predicts both the training set (in green) and the test set (in red), the blue is barely visible. Our LSTM model showed 96% accuracy when predicting with multiple inputs. Code for this repository can be found here. Figure 2 below shows the number of days on the x-axis and the price on the y-axis.

Figure 2: FOREX (USDJPY) Prediction
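The repository behind Figure 2 is not reproduced here, but the core pattern is usually the same: slice the price series into fixed-length windows and train an LSTM to predict the next value. The sketch below illustrates that pattern on synthetic data; the window size, layer width, and preprocessing are assumptions rather than the exact settings of the USDJPY model.

import numpy as np
import tensorflow as tf

def make_windows(series, window=60):
    """Turn a 1-D price series into (window, 1) inputs and next-step targets."""
    x, y = [], []
    for i in range(len(series) - window):
        x.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(x)[..., np.newaxis], np.array(y).reshape(-1, 1)

# Synthetic stand-in for the USDJPY closing prices used in the post.
prices = np.sin(np.linspace(0, 50, 1000)).astype("float32")
x_train, y_train = make_windows(prices)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, input_shape=(60, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, verbose=0)

next_value = model.predict(x_train[-1:])  # forecast for the step after the last window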

Another example can be seen in today’s chatbots. Given these properties, LSTM networks are well suited to a variety of natural language processing tasks. Question answering is another big NLP research problem, with its own ecosystem of complex pipelines built from heterogeneous components. We use the seq2seq repository to showcase how LSTM networks are used in chatbots. The sequence-to-sequence (seq2seq) model consists of two RNNs: an encoder and a decoder. The encoder reads the input sequence word by word and emits a context (a function of the encoder’s final hidden state) that ideally captures the essence, or semantic summary, of the input sequence. Based on this context, the decoder generates the output sequence one word at a time, looking at the context and the previous word at each time step. Figure 3 below shows how the network responds to questions.

Figure 3: Seq2Seq Model
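To make the encoder/decoder split concrete, here is a compact sketch of a seq2seq model built from two LSTMs with the Keras functional API. The vocabulary size, embedding width, and hidden-layer size below are placeholders; the seq2seq repository referenced above has its own configuration, data pipeline, and training loop.

import tensorflow as tf

vocab_size, embed_dim, units = 8000, 128, 256  # placeholder sizes

# Encoder: reads the question and keeps only its final LSTM state (the context).
enc_inputs = tf.keras.Input(shape=(None,), name="question_tokens")
enc_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: starts from that context and predicts the reply one token at a time
# (during training it also sees the previous ground-truth token, i.e. teacher forcing).
dec_inputs = tf.keras.Input(shape=(None,), name="answer_tokens")
dec_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_out = tf.keras.layers.LSTM(units, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(vocab_size)(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

At inference time the decoder is run step by step, feeding each predicted word back in as the next input, which is how the network produces replies like the ones shown in Figure 3.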

Conclusion

A recurrent neural network is a deep learning model dedicated to handling sequences: an internal state takes into account, and properly handles, the dependencies between successive inputs. Long short-term memory networks were a big step beyond what plain recurrent neural networks can accomplish, because their gated design mitigates the vanishing-gradient problems that normal RNNs suffer from. FoundationAI provides Machine Learning Consulting to clients all around the world. If you are interested in implementing any of these networks, please do not hesitate to contact us.