Attention Mechanism
This post covers the usage of the attention mechanism within a Neural Machine Translation (NMT) model. The high-level architecture of an NMT model has two parts (a code sketch follows the list):
Encoder: Takes as input a sequence of text in a source language, e.g. English, and outputs a context vector.
Decoder: Takes as input the context vector and outputs a sequence of text in a target language, e.g. French.
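Here is a minimal sketch of this encoder-decoder split, assuming PyTorch with GRU cells; the class and variable names are illustrative, not from any specific library:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) token ids of the source sentence
        embedded = self.embed(src_tokens)
        _, context = self.rnn(embedded)   # context: (1, batch, hidden)
        return context                    # the "context vector"

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt_tokens, context):
        # tgt_tokens: (batch, tgt_len); context initializes the decoder state
        embedded = self.embed(tgt_tokens)
        outputs, _ = self.rnn(embedded, context)
        return self.out(outputs)          # logits over the target vocabulary
```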
In the pre-Transformer era, the encoder and decoder of an NMT architecture were generally RNNs.
An RNN is a recurrent artificial neural network, meaning hidden states from previous time steps are fed as input to the current time step.
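To make the recurrence concrete, here is a sketch of a single vanilla RNN step in plain NumPy; the function and weight names are illustrative:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrent step: the new hidden state depends on both the
    # current input x_t and the previous hidden state h_prev.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    # Unrolling over a sequence: every step reuses the same weights.
    h = h0
    for x_t in xs:   # xs: one input vector per time step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h         # final hidden state
```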
Below is a visual explanation of a vanilla RNN for a language modeling task:
The last hidden state vector output by the encoder RNN is the context vector. This context vector is passed as input to the decoder RNN.
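Continuing the PyTorch sketch from above, the encoder's final hidden state is handed to the decoder as its initial state; the vocabulary and hidden sizes below are arbitrary:

```python
encoder = Encoder(vocab_size=8000, hidden_size=256)
decoder = Decoder(vocab_size=9000, hidden_size=256)

src = torch.randint(0, 8000, (2, 7))  # batch of 2 source sentences, length 7
tgt = torch.randint(0, 9000, (2, 5))  # batch of 2 target sentences, length 5

context = encoder(src)        # (1, 2, 256): the context vector
logits = decoder(tgt, context)  # (2, 5, 9000): per-step vocabulary scores
```

Note that in this plain encoder-decoder setup the single fixed-size context vector must summarize the entire source sentence, which is the bottleneck the attention mechanism addresses.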