# Attention Mechanism

## Introduction

This post covers the use of the attention mechanism within a Neural Machine Translation (NMT) model. The high-level architecture of an NMT model is:

1. Encoder: Takes as input a sequence of text in a source language, e.g. English, and outputs a context vector.
2. Decoder: Takes as input the context vector and outputs a sequence of text in a target language, e.g. French.
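The two-stage contract above can be sketched in code. This is a minimal illustration of the shapes involved, not a real model: `encode` stands in for an encoder network by averaging token embeddings into one fixed-size vector, and `decode` stands in for a decoder that is conditioned only on that vector. All function names and sizes here are hypothetical.

```python
import numpy as np

def encode(source_embeddings: np.ndarray) -> np.ndarray:
    # Stand-in for an encoder: collapses a variable-length sequence
    # of token embeddings into a single fixed-size context vector.
    return source_embeddings.mean(axis=0)

def decode(context: np.ndarray, target_len: int) -> np.ndarray:
    # Stand-in for a decoder: emits target_len output vectors
    # conditioned only on the context vector.
    return np.tile(context, (target_len, 1))

src = np.random.randn(6, 4)   # 6 source tokens, embedding size 4
ctx = encode(src)             # one context vector, shape (4,)
out = decode(ctx, 5)          # 5 target positions, shape (5, 4)
```

Whatever the source length, the decoder only ever sees the single fixed-size context vector; this bottleneck is exactly what attention was later introduced to relieve.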

## Under the Hood

In the pre-Transformer era, the encoder and decoder in an NMT architecture were generally RNNs.

### Recurrent Neural Network (RNN)

An RNN is a recurrent artificial neural network, meaning that hidden-state outputs from previous time steps are fed as input to the current time step.
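A single step of this recurrence can be written in a few lines. The sketch below uses random, untrained weights purely to show the update rule of a vanilla RNN cell (the sizes and weight names are illustrative assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state mixes the current input x_t with the
    # previous hidden state h_prev, then squashes with tanh.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy example: input size 4, hidden size 8, random (untrained) weights.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 8)) * 0.1
W_hh = rng.normal(size=(8, 8)) * 0.1
b_h = np.zeros(8)

h = np.zeros(8)            # initial hidden state
x_t = rng.normal(size=4)   # one token embedding
h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # updated hidden state, shape (8,)
```

The same `rnn_step` is applied at every time step with the same weights; only `x_t` and `h_prev` change.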

Below is a visual explanation of a vanilla RNN for a language modeling task:

<figure><img src="https://2465539769-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FIdHkk3JBLUnhtRtVzg09%2Fuploads%2Fn5N1FzrCHKzEo0MWzkva%2Fimage.png?alt=media&#x26;token=3c2a5743-6800-4de2-adc3-7b59ad3f60b6" alt=""><figcaption></figcaption></figure>

The last hidden state vector output by the encoder RNN is the context vector. This context vector is passed as input to the decoder RNN.
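Putting it together, the sketch below runs an untrained RNN encoder over a toy source sequence and takes the final hidden state as the context vector. The weights, sizes, and names are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8  # embedding size and hidden size (illustrative)

# Random, untrained encoder weights for illustration.
W_xh = rng.normal(size=(d_in, d_h)) * 0.1
W_hh = rng.normal(size=(d_h, d_h)) * 0.1
b_h = np.zeros(d_h)

def rnn_step(x_t, h_prev):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Encode a toy source sequence of 5 token embeddings.
source = rng.normal(size=(5, d_in))
h = np.zeros(d_h)
for x_t in source:
    h = rnn_step(x_t, h)

context = h  # the final hidden state is the context vector
# A decoder RNN would use `context` as its initial hidden state
# rather than starting from zeros.
```

Note that the entire source sentence must be squeezed into this one vector of size `d_h`, regardless of sentence length, which is the limitation that motivates attention.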
