Attention Mechanism

Introduction

This post covers the usage of the attention mechanism within a Neural Machine Translation (NMT) model. The high-level architecture of an NMT model is (see the sketch after this list):

  1. Encoder: Takes as input a sequence of text in a particular language, e.g. English, and outputs a context vector.

  2. Decoder: Takes as input the context vector and outputs a sequence of text in another language, e.g. French.
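
As a rough sketch of this contract (toy NumPy code with random weights and made-up dimensions, not an actual NMT implementation), the encoder maps a variable-length source sequence to a fixed-size context vector, and the decoder unrolls that vector into a target sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, EMB, VOCAB = 8, 16, 100   # toy sizes: context dim, embedding dim, target vocab

def encode(source_embeddings: np.ndarray) -> np.ndarray:
    """Compress a (seq_len, EMB) source sequence into one context vector.

    Stand-in for a real encoder (an RNN, see below): a projection
    of the mean source embedding.
    """
    W = rng.normal(size=(EMB, HIDDEN))
    return np.tanh(source_embeddings.mean(axis=0) @ W)

def decode(context: np.ndarray, max_len: int = 5) -> list[int]:
    """Unroll a context vector into target-language token ids."""
    U = rng.normal(size=(HIDDEN, HIDDEN))   # state-to-state weights
    V = rng.normal(size=(HIDDEN, VOCAB))    # state-to-vocabulary weights
    h, out = context, []
    for _ in range(max_len):
        h = np.tanh(h @ U)                  # evolve the decoder state
        out.append(int(np.argmax(h @ V)))   # greedy pick of the next token id
    return out

source = rng.normal(size=(3, EMB))          # a 3-token "English" sentence
print(decode(encode(source)))               # five toy target token ids
```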

Under the Hood

In the pre-Transformer era, the encoder and the decoder of an NMT architecture were generally RNNs.

Recurrent Neural Network (RNN)

An RNN is an artificial neural network with recurrent connections, meaning that outputs (hidden states) from previous time steps are fed as input to the current time step.
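
A minimal sketch of that recurrence (assuming the standard vanilla-RNN update h_t = tanh(W_x x_t + W_h h_{t-1} + b), with toy dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, HIDDEN = 16, 8                      # assumed toy dimensions

# Vanilla RNN recurrence: h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b)
W_x = rng.normal(size=(EMB, HIDDEN))     # input-to-hidden weights
W_h = rng.normal(size=(HIDDEN, HIDDEN))  # hidden-to-hidden (recurrent) weights
b = np.zeros(HIDDEN)

def rnn(inputs: np.ndarray) -> list[np.ndarray]:
    """Run the RNN over a (seq_len, EMB) sequence; return every hidden state."""
    h = np.zeros(HIDDEN)                 # initial hidden state h_0
    states = []
    for x_t in inputs:                   # one step per token, left to right
        h = np.tanh(x_t @ W_x + h @ W_h + b)  # previous state feeds the current step
        states.append(h)
    return states

states = rnn(rng.normal(size=(4, EMB)))  # a 4-token toy sequence
print(len(states), states[-1].shape)     # -> 4 (8,)
```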

Below is a visual explanation of a vanilla RNN for a language modeling task:

The last hidden state vector output by the encoder RNN is the context vector. This context vector is passed as input to the decoder RNN.
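
Continuing the toy sketch above (rnn, EMB, HIDDEN, and rng come from the previous snippet; the decoder weights here are illustrative):

```python
# The encoder's final hidden state is the context vector, which
# initializes the decoder's hidden state.
encoder_states = rnn(rng.normal(size=(5, EMB)))  # encode a 5-token source sentence
context = encoder_states[-1]                     # last hidden state = context vector

W_dec = rng.normal(size=(HIDDEN, HIDDEN))        # decoder recurrent weights
h_dec = context                                  # decoder starts from the context
for _ in range(3):                               # unroll a few decoder steps
    h_dec = np.tanh(h_dec @ W_dec)
print(context.shape, h_dec.shape)                # -> (8,) (8,)
```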
