This is an implementation of the Encoder-Decoder architecture used to solve a transliteration problem: Converting English words to Hindi words. For example, the word 'ghar' would be transliterated to 'घर'.
There are two different implementations to solve the same problem:
- one without attention, using a plain RNN-based encoder-decoder
- one adding attention on top of the same RNN-based architecture (a minimal sketch of this setup is shown below)
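The sketch below illustrates, in PyTorch, the general shape of an attention-based encoder-decoder of this kind. The layer sizes, the choice of GRU cells, and the dot-product attention are illustrative assumptions only; the actual hyperparameters and cell type are documented in the comments inside `train.py`.

```python
# Illustrative sketch of an RNN encoder-decoder with dot-product attention.
# Sizes and cell type are assumptions, not the exact settings from train.py.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    def __init__(self, src_vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(src_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        embedded = self.embedding(src)            # (batch, src_len, emb_dim)
        outputs, hidden = self.rnn(embedded)      # outputs: (batch, src_len, hidden_dim)
        return outputs, hidden


class AttentionDecoder(nn.Module):
    def __init__(self, tgt_vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(tgt_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab_size)

    def forward(self, tgt_token, hidden, encoder_outputs):
        # tgt_token: (batch, 1) -- one target character per decoding step
        embedded = self.embedding(tgt_token)                            # (batch, 1, emb_dim)
        # Dot-product attention over the encoder states
        scores = torch.bmm(encoder_outputs, hidden[-1].unsqueeze(2))    # (batch, src_len, 1)
        weights = F.softmax(scores, dim=1)                              # attention weights
        context = torch.bmm(weights.transpose(1, 2), encoder_outputs)   # (batch, 1, hidden_dim)
        output, hidden = self.rnn(torch.cat([embedded, context], dim=2), hidden)
        logits = self.out(output.squeeze(1))                            # (batch, tgt_vocab_size)
        return logits, hidden, weights.squeeze(2)
```

The vanilla variant simply drops the attention step and feeds only the embedded target character (plus the final encoder hidden state) into the decoder RNN.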
The barebones code is in `train.py`. Inside, you will find code comments on the architecture and the hyperparameters used.
The code can be run using the command `python train.py`. The available arguments can be listed by passing the `--help` flag.
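For orientation, the command-line interface is presumably wired up with `argparse`, roughly as in the sketch below. The flag names here (`--epochs`, `--use_attention`) are hypothetical placeholders; the real flags are whatever `python train.py --help` reports.

```python
# Hypothetical sketch of how train.py's CLI might be set up; flag names are
# placeholders, not the actual arguments -- check `python train.py --help`.
import argparse

parser = argparse.ArgumentParser(description="Seq2seq transliteration trainer")
parser.add_argument("--epochs", type=int, default=5, help="number of training epochs")
parser.add_argument("--use_attention", action="store_true", help="enable the attention decoder")
args = parser.parse_args()
print(args)
```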
The actual results of running the code are in `a3.ipynb`; this is where you will find the sweeps.
The predictions of both implementations are in the `predictions_vanilla` and `predictions_attention` folders, which hold the predictions without and with attention respectively.
While these predictions were generated after 5 epochs of training, the test-set evaluation below uses models trained for 10 epochs.
For the best vanilla model after 10 epochs, the training and validation accuracies were 37.0% and 30.7% respectively, and the test accuracy was 27.8%.
The best attention model was then evaluated on the test set, again after training for 10 epochs. Its training accuracy was 39.5% and its validation accuracy was 36.1%. This is better than the vanilla model, but not by a large margin.
We can show heatmaps of the attention matrices computed by the model for a few sample words.
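A minimal sketch of how such a heatmap could be drawn from the decoder's attention weights is given below. The `plot_attention` helper is an assumed utility, not a function from `train.py`, and it presumes the weights for one word have already been collected into a `(tgt_len, src_len)` array.

```python
# Assumed helper for visualising one word's attention weights as a heatmap.
import matplotlib.pyplot as plt

def plot_attention(weights, src_chars, tgt_chars):
    """weights: (tgt_len, src_len) array of attention weights for one word."""
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(src_chars)))
    ax.set_xticklabels(src_chars)        # English (source) characters
    ax.set_yticks(range(len(tgt_chars)))
    ax.set_yticklabels(tgt_chars)        # Devanagari (predicted) characters
    ax.set_xlabel("source (English)")
    ax.set_ylabel("prediction (Hindi)")
    plt.show()
```

Rendering the Devanagari tick labels correctly requires a font with Devanagari glyphs installed for matplotlib.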
By validation accuracy, the attention model is better than the vanilla model. We can also compare the two models by looking at which words each gets right and wrong. The results are shown below:
The attention model does get some words right that the vanilla model does not. However, the vanilla model also gets some words right that the attention model does not (around 8 percent); a sketch of this comparison follows.
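The comparison can be reproduced roughly as below. The file name `predictions.csv` and the column names `word`, `target`, and `prediction` are assumptions about the layout of the prediction folders; adjust them to match the actual files in `predictions_vanilla` and `predictions_attention`.

```python
# Sketch of the correct/incorrect word comparison between the two models.
# File and column names are assumptions about the prediction folders' layout.
import pandas as pd

vanilla = pd.read_csv("predictions_vanilla/predictions.csv")
attention = pd.read_csv("predictions_attention/predictions.csv")

vanilla_correct = set(vanilla.loc[vanilla["prediction"] == vanilla["target"], "word"])
attention_correct = set(attention.loc[attention["prediction"] == attention["target"], "word"])

print("correct only with attention:", len(attention_correct - vanilla_correct))
print("correct only with vanilla:  ", len(vanilla_correct - attention_correct))
print("correct with both:          ", len(vanilla_correct & attention_correct))
```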