Sequence Models

tags: coursera, Deep Learning

Traditional neural networks have an obvious problem: they can’t easily process sequence data. Worse still, they struggle with variable-length inputs, and without re-training they often can’t handle them at all.

The code analogy is simple: straight-line code vs. a loop, which can handle any amount of repetition.

The solution is equally simple, although not obvious: apply the same model repeatedly, once per time-step of the sequence. This is the recurrent neural network (RNN); a minimal sketch of the idea follows the list below. Although this approach works, it has a few problems:

  1. Long-range dependencies (e.g. between words far apart in a sentence) are hard to capture.
  2. Vanishing/exploding gradients (see the toy example after the sketch below).
    • Remember, even if your RNN unit has only 1 layer, an input with 100 time-steps is equivalent to a 100-layer-deep network. It’s difficult to propagate the gradient of the loss back that far.
  3. It’s hard to use information from after the current time-step.
    • For example, in the name “Teddy Roosevelt”, it’s tough to know whether “Teddy” is a name or a type of bear without seeing what comes after.
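
As a concrete picture of that loop, here’s a minimal NumPy sketch (the function name, shapes, and random weights are made up for illustration): the same weight matrices are re-applied at every time-step, which is why one set of parameters can handle sequences of any length.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence, reusing the same weights each step.

    x_seq has shape (T, input_dim); T can be any length.
    Returns one hidden state per time-step.
    """
    h = np.zeros(W_hh.shape[0])      # initial hidden state
    states = []
    for x_t in x_seq:                # one pass of the *same* cell per time-step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return states

# The same weights process a 5-step and a 100-step sequence alike.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(8, 3)), rng.normal(size=(8, 8)), np.zeros(8)
print(len(rnn_forward(rng.normal(size=(5, 3)), W_xh, W_hh, b_h)))    # 5
print(len(rnn_forward(rng.normal(size=(100, 3)), W_xh, W_hh, b_h)))  # 100
```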

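To see why problem 2 bites: back-propagating through 100 time-steps multiplies roughly 100 Jacobian factors together, just as in a 100-layer network. A toy scalar version (the numbers are purely illustrative) shows how quickly such a product vanishes or explodes:

```python
# Toy illustration: for a scalar recurrence h_t = w * h_{t-1}, the gradient
# through 100 steps is w**100. Anything a bit below or above 1 collapses or blows up.
for w in (0.9, 1.0, 1.1):
    print(w, "->", w ** 100)
# 0.9 -> ~2.7e-05   (vanishes)
# 1.0 -> 1.0        (stable, but a knife-edge)
# 1.1 -> ~13780.6   (explodes)
```
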
Solutions have been devised for each of these problems, in order:

  • For problems 1 & 2, there are gated architectures such as [[ GRU ]] or LSTM (a sketch of a GRU cell follows this list). One of the best solutions (from what I know) is the attention architecture.
  • For problem 3, bi-directional RNNs exist. The attention architecture also deals with this.
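
For reference, here is a minimal sketch of a single GRU step in NumPy, using the standard formulation (the notation differs slightly from the course’s Γ-gate notation, but the idea is the same; the parameter names here are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time-step (parameter names are illustrative)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)        # update gate: how much to overwrite
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)        # reset gate: how much of the past to use
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)
    return (1.0 - z) * h_prev + z * h_tilde         # z ~ 0 => state copied forward intact

# Tiny usage with made-up sizes (input_dim=3, hidden_dim=4):
rng = np.random.default_rng(0)
d, n = 3, 4
params = tuple(rng.normal(size=s) for s in
               [(n, d), (n, n), (n,), (n, d), (n, n), (n,), (n, d), (n, n), (n,)])
print(gru_step(rng.normal(size=d), np.zeros(n), params))
```

When the update gate is near 0, the previous state passes through untouched, so both the information and its gradient can survive many time-steps; that is what mitigates problems 1 and 2.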
