This is an implementation of machine translation from Arabic to English using the Transformer model. This repo is based on the code provided by the authors. We train the model on the OpenSubtitles v2018 Arabic-English parallel dataset. Before training, we strip all Tashkeel (Arabic diacritics) from the Arabic sentences in the OpenSubtitles dataset using pyarabic.
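Stripping Tashkeel amounts to removing the Arabic diacritic code points (roughly U+064B through U+0652). pyarabic exposes `strip_tashkeel` for this; below is a minimal standard-library sketch of the same idea, useful for seeing exactly which characters are dropped:

```python
import re

# Arabic Tashkeel (tanwin, short vowels, shadda, sukun) occupy the
# Unicode range U+064B..U+0652.
TASHKEEL = re.compile(r"[\u064B-\u0652]")

def strip_tashkeel(text: str) -> str:
    """Remove all Tashkeel marks from an Arabic string."""
    return TASHKEEL.sub("", text)

print(strip_tashkeel("كَتَبَ"))  # -> كتب
```

In practice we call `pyarabic.araby.strip_tashkeel`, which handles the same marks along with a few edge cases.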
For now, we have only trained a slightly modified version of the `tiny` parameter set, with the following hyperparameters:

```
num_hidden_layers=6,
hidden_size=64,
num_heads=4,
filter_size=256,
layer_postprocess_dropout=0.1,
attention_dropout=0.1,
relu_dropout=0.1,
optimizer_adam_beta1=0.9,
optimizer_adam_beta2=0.997,
optimizer_adam_epsilon=1e-09
```
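For a rough sense of model size, the per-layer weight count implied by these hyperparameters can be estimated with back-of-the-envelope arithmetic (ignoring biases, layer norms, embeddings, and the decoder's extra cross-attention sublayer; this is a sketch, not the exact count):

```python
hidden_size = 64
filter_size = 256
num_hidden_layers = 6

# Self-attention: Q, K, V, and output projections, each hidden_size x hidden_size.
attention = 4 * hidden_size * hidden_size      # 16,384

# Feed-forward: hidden -> filter -> hidden.
ffn = 2 * hidden_size * filter_size            # 32,768

per_layer = attention + ffn                    # 49,152
encoder_total = num_hidden_layers * per_layer  # 294,912
print(per_layer, encoder_total)
```

Note that for a 32k-subword vocabulary the embedding table (32,768 × 64 ≈ 2.1M weights) dominates the total.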
The model was trained for ~20 hours over 10 epochs (~2 hours/epoch).
We sampled a 1000-sentence portion from the OpenSubtitles v2018 training set for evaluation. Below are the case-insensitive BLEU scores after 10 epochs.

| Param Set | Score |
| --- | --- |
| tiny | 26.54 |
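Case-insensitive BLEU simply lower-cases both hypothesis and reference before matching n-grams. Below is a minimal single-sentence sketch of the metric; the repo's `compute_bleu.py` operates on a whole corpus and differs in tokenization and smoothing details:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Case-insensitive sentence BLEU with a crude floor instead of smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        # Clipped n-gram matches against the reference counts.
        overlap = sum(min(count, r[g]) for g, count in c.items())
        total = max(sum(c.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec / max_n)
```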
Some example translations produced by the model:

| Arabic (in) | English (out) |
| --- | --- |
| دائما لشخص واحد | Always for one person. |
| وهذا لن يشكل فارق، فأنا أقود سيارتي بهذا الطريق اسبوعيا | And that won't be a difference, I'm driving my car this way a week. |
| أنا لا أبحث عن الرجل المناسب | I'm not looking for the right guy. |
| اعتقد أنني بدأت أعجب بها وهي أيضا تبادلني نفس الشعور | I think I'm starting to like her, and she also makes me the same feeling. |
| ماذا لو ان هذه هي آخر فرصة لي للتحدث؟ | What if this is the last chance to talk to me? |
Below are the commands for running the Transformer model. See the detailed instructions for more information on running the model.
```bash
cd /path/to/models/official/transformer

# Ensure that PYTHONPATH is correctly defined as described in
# https://github.com/tensorflow/models/tree/master/official#requirements
# export PYTHONPATH="$PYTHONPATH:/path/to/models"

# Export variables (this repo trained the tiny parameter set).
PARAM_SET=tiny
DATA_DIR=$HOME/transformer/data
MODEL_DIR=$HOME/transformer/model_$PARAM_SET
VOCAB_FILE=$DATA_DIR/vocab.ende.32768

# Download training/evaluation datasets.
python data_download.py --data_dir=$DATA_DIR

# Train the model for 10 epochs, and evaluate after every epoch.
python transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \
    --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET \
    --bleu_source=test_data/dev.ar --bleu_ref=test_data/dev.en

# Run during training in a separate process to get continuous updates,
# or after training is complete.
tensorboard --logdir=$MODEL_DIR

# Translate some text using the trained model.
python translate.py --model_dir=$MODEL_DIR --vocab_file=$VOCAB_FILE \
    --param_set=$PARAM_SET --text="hello world"

# Translate a file, then compute the model's BLEU score against a reference.
python translate.py --model_dir=$MODEL_DIR --vocab_file=$VOCAB_FILE \
    --param_set=$PARAM_SET --file=test_data/newstest2014.en --file_out=translation.en
python compute_bleu.py --translation=translation.en --reference=test_data/dev.en
```