Sequence Models
tags : coursera Deep Learning
Traditional (feed-forward) neural networks have an obvious limitation: they can’t easily process sequence data. Worse still, they expect inputs of a fixed size, so they handle variable-length inputs poorly, or not at all without re-training.
The analogy to code is simple: straight-line code, which spells out every step, versus a loop, which can handle any number of repetitions.
The solution is equally simple, although not obvious: apply the same model, with the same weights, to each time-step in the sequence in turn, passing a hidden state from one step to the next. This is a recurrent neural network (RNN). Although this approach works, it has a few problems:
- Long-range dependencies are difficult to learn: information from early time-steps tends to fade before it is needed.
- Gradients can vanish or explode.
    - Remember, even if your RNN unit has only 1 layer, an input with 100 time-steps unrolls into the equivalent of a 100-layer-deep network. It’s difficult to propagate the loss gradient back that far.
 
- It’s hard to use information from after the current time-step, because a plain RNN only sees what came before.
    - For example, in the name “Teddy Roosevelt”, it’s hard to tell whether “Teddy” is part of a name or a type of bear without seeing what comes after.
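
To make the "same model at every time-step" idea and the gradient problem concrete, here is a minimal sketch in plain Python. It assumes a scalar RNN with a tanh activation; the names `rnn_step` and `run_rnn` and the weight values are made up for illustration, not taken from the course.

```python
import math

def rnn_step(h_prev, x, w_h, w_x):
    """One time-step of a scalar RNN. The same weights are reused at every step."""
    return math.tanh(w_h * h_prev + w_x * x)

def run_rnn(xs, w_h=0.5, w_x=1.0):
    """Loop the single cell over a sequence of any length.

    Returns the final hidden state and d h_T / d h_0, the factor by which
    a gradient is scaled on its way back to the start of the sequence.
    """
    h = 0.0
    grad = 1.0
    for x in xs:
        h = rnn_step(h, x, w_h, w_x)
        # Chain rule: d h_t / d h_{t-1} = w_h * (1 - h_t^2) for a tanh cell.
        grad *= w_h * (1.0 - h * h)
    return h, grad

# The same function handles a 5-step and a 100-step input without any changes.
_, grad_short = run_rnn([0.1] * 5)
_, grad_long = run_rnn([0.1] * 100)

# With a recurrent weight above 1 and activations near 0, the factor grows instead.
_, grad_explode = run_rnn([0.0] * 100, w_h=1.5)

print(f"5 steps,   w_h=0.5: {grad_short:.3e}")
print(f"100 steps, w_h=0.5: {grad_long:.3e}")
print(f"100 steps, w_h=1.5: {grad_explode:.3e}")
```

Each step multiplies the gradient by one more factor, so over 100 steps the product either collapses toward zero or blows up, which is the "100-layer-deep network" effect described above.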
 
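Various solutions have been devised, addressing these problems in order:
- For the first two problems (long-range dependencies and vanishing/exploding gradients), there are gated architectures such as [[ GRU ]] or LSTM. One of the best solutions (from what I know) is the attention architecture.
- For the third problem (using information from later time-steps), bi-directional RNNs exist. The attention architecture also deals with this.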
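
As a rough illustration of the bi-directional idea, the sketch below runs one toy scalar RNN left-to-right and a second one right-to-left, then pairs up their states at each time-step. Everything here is an assumption for demonstration: the function names, the weights, and the one-number "embeddings" for the words are invented, and a real model would use learned vectors.

```python
import math

def run_direction(xs, w_h, w_x):
    """Run a scalar tanh RNN over xs and return the hidden state at every step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def bidirectional(xs, fwd=(0.5, 1.0), bwd=(0.5, 1.0)):
    """Pair each step's forward state (past context) with its backward state (future context)."""
    forward = run_direction(xs, *fwd)
    # Process the reversed sequence, then flip the states back into input order.
    backward = run_direction(xs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))

# Hypothetical one-number "embeddings", chosen only for illustration.
TEDDY, ROOSEVELT, BEAR = 0.3, 0.9, -0.7

name = bidirectional([TEDDY, ROOSEVELT])
toy = bidirectional([TEDDY, BEAR])

# Forward states at "Teddy" are identical; only the backward states differ.
print("forward state at 'Teddy': ", name[0][0], toy[0][0])
print("backward state at 'Teddy':", name[0][1], toy[0][1])
```

The forward state at “Teddy” cannot distinguish the two phrases because it has only seen “Teddy” so far. The backward state has already read the following word, so a layer that consumes both states has the context it needs.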