Sequence Models

tags: coursera, Deep Learning

Traditional neural networks have an obvious problem: they can’t easily process sequence data. Worse still, they struggle with variable-length inputs, and without re-training they often can’t handle them at all.

The code analogy is simple: straight-line code vs. a loop, which can handle any amount of repetition.

The solution is equally simple, although not obvious: apply the same model repeatedly, once per time-step of the sequence. This is the recurrent neural network (RNN); a minimal sketch of the idea follows the list below. Although this approach works, it has a few problems:

  1. Long-range dependencies (e.g. between words far apart in a sentence) are hard to capture.
  2. Vanishing/exploding gradients (see the toy example after the sketch below).
    • Remember, even if your RNN unit has only 1 layer, an input with 100 time-steps is equivalent to a 100-layer-deep network. It’s difficult to propagate the gradient of the loss back that far.
  3. It’s hard to use information from after the current time-step.
    • For example, in the name “Teddy Roosevelt”, it’s tough to know whether “Teddy” is a name or a type of bear without seeing what comes after.
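
As a concrete picture of that loop, here’s a minimal NumPy sketch (the function name, shapes, and random weights are made up for illustration): the same weight matrices are re-applied at every time-step, which is why one set of parameters can handle sequences of any length.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence, reusing the same weights each step.

    x_seq has shape (T, input_dim); T can be any length.
    Returns one hidden state per time-step.
    """
    h = np.zeros(W_hh.shape[0])      # initial hidden state
    states = []
    for x_t in x_seq:                # one pass of the *same* cell per time-step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return states

# The same weights process a 5-step and a 100-step sequence alike.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(8, 3)), rng.normal(size=(8, 8)), np.zeros(8)
print(len(rnn_forward(rng.normal(size=(5, 3)), W_xh, W_hh, b_h)))    # 5
print(len(rnn_forward(rng.normal(size=(100, 3)), W_xh, W_hh, b_h)))  # 100
```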

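To see why problem 2 bites: back-propagating through 100 time-steps multiplies roughly 100 Jacobian factors together, just as in a 100-layer network. A toy scalar version (the numbers are purely illustrative) shows how quickly such a product vanishes or explodes:

```python
# Toy illustration: for a scalar recurrence h_t = w * h_{t-1}, the gradient
# through 100 steps is w**100. Anything a bit below or above 1 collapses or blows up.
for w in (0.9, 1.0, 1.1):
    print(w, "->", w ** 100)
# 0.9 -> ~2.7e-05   (vanishes)
# 1.0 -> 1.0        (stable, but a knife-edge)
# 1.1 -> ~13780.6   (explodes)
```
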
Solutions have been devised for each of these problems, in order:

  • For problems 1 & 2, there are gated architectures such as [[ GRU ]] or LSTM (a sketch of a GRU cell follows this list). One of the best solutions (from what I know) is the attention architecture.
  • For problem 3, bi-directional RNNs exist. The attention architecture also deals with this.
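
For reference, here is a minimal sketch of a single GRU step in NumPy, using the standard formulation (the notation differs slightly from the course’s Γ-gate notation, but the idea is the same; the parameter names here are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time-step (parameter names are illustrative)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)        # update gate: how much to overwrite
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)        # reset gate: how much of the past to use
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)
    return (1.0 - z) * h_prev + z * h_tilde         # z ~ 0 => state copied forward intact

# Tiny usage with made-up sizes (input_dim=3, hidden_dim=4):
rng = np.random.default_rng(0)
d, n = 3, 4
params = tuple(rng.normal(size=s) for s in
               [(n, d), (n, n), (n,), (n, d), (n, n), (n,), (n, d), (n, n), (n,)])
print(gru_step(rng.normal(size=d), np.zeros(n), params))
```

When the update gate is near 0, the previous state passes through untouched, so both the information and its gradient can survive many time-steps; that is what mitigates problems 1 and 2.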
