Attention Mechanism
This post covers the usage of the attention mechanism within a Neural Machine Translation (NMT) model. The high-level architecture of an NMT model has two parts (a code sketch follows the list):
Encoder: Takes as input a sequence of text in a source language, e.g. English, and outputs a context vector.
Decoder: Takes as input the context vector and outputs a sequence of text in a target language, e.g. French.
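Here is a minimal sketch of this encoder-decoder split, assuming PyTorch with GRU cells; the class and variable names are illustrative, not from any specific library:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) token ids of the source sentence
        embedded = self.embed(src_tokens)
        _, context = self.rnn(embedded)   # context: (1, batch, hidden)
        return context                    # the "context vector"

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt_tokens, context):
        # tgt_tokens: (batch, tgt_len); context initializes the decoder state
        embedded = self.embed(tgt_tokens)
        outputs, _ = self.rnn(embedded, context)
        return self.out(outputs)          # logits over the target vocabulary
```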
In the pre-Transformer era, the encoder and decoder of an NMT architecture were generally RNNs.
An RNN is a recurrent artificial neural network, meaning hidden states from previous time steps are fed as input to the current time step.
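To make the recurrence concrete, here is a sketch of a single vanilla RNN step in plain NumPy; the function and weight names are illustrative:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrent step: the new hidden state depends on both the
    # current input x_t and the previous hidden state h_prev.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    # Unrolling over a sequence: every step reuses the same weights.
    h = h0
    for x_t in xs:   # xs: one input vector per time step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h         # final hidden state
```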
Below is a visual explanation of a vanilla RNN for a language modeling task:
The last hidden state vector output by the encoder RNN is the context vector. This context vector is passed as input to the decoder RNN.
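Continuing the PyTorch sketch from above, the encoder's final hidden state is handed to the decoder as its initial state; the vocabulary and hidden sizes below are arbitrary:

```python
encoder = Encoder(vocab_size=8000, hidden_size=256)
decoder = Decoder(vocab_size=9000, hidden_size=256)

src = torch.randint(0, 8000, (2, 7))  # batch of 2 source sentences, length 7
tgt = torch.randint(0, 9000, (2, 5))  # batch of 2 target sentences, length 5

context = encoder(src)        # (1, 2, 256): the context vector
logits = decoder(tgt, context)  # (2, 5, 9000): per-step vocabulary scores
```

Note that in this plain encoder-decoder setup the single fixed-size context vector must summarize the entire source sentence, which is the bottleneck the attention mechanism addresses.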