Comprehensive documentation for this lab is available here: https://tinyurl.com/yalmjyk2
source ENV.sh
This lets your run find the necessary executables. If you are on the PSC grid, you do not need to modify the values in this file; otherwise, replace them with the values for your grid.
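To confirm that sourcing ENV.sh had the intended effect, you can check that each expected executable resolves. The following is a minimal sketch, not part of the lab scripts: `missing_tools` is a hypothetical helper, and it assumes the executables end up on `PATH` (ENV.sh may instead set its own variables, in which case check those directly).

```shell
# Hypothetical helper (not part of the lab): print every command name
# passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools python perl` prints nothing when both are available. The tool names here are placeholders; substitute whatever your ENV.sh is meant to provide.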
- Run data.sh. This will fetch data from the internet and create training, test, and validation datasets.
sh data.sh
- This will also create a smaller toy training dataset that we will use during this session to make training faster.
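How data.sh builds the toy dataset is not described here. One common approach, sketched below under that assumption, is to keep the first N sentence pairs of the parallel training files. `make_toy`, its arguments, and the output suffixes are all hypothetical names used for illustration only.

```shell
# Hypothetical helper (not taken from data.sh): copy the first N lines of
# a source-side file and a target-side file. Taking the same number of
# lines from both sides keeps the sentence pairs aligned.
make_toy() {
    src=$1   # source-language file
    tgt=$2   # target-language file
    n=$3     # number of sentence pairs to keep
    out=$4   # output prefix
    head -n "$n" "$src" > "$out.src"
    head -n "$n" "$tgt" > "$out.tgt"
}
```

A call such as `make_toy train.de train.en 1000 toy` would write `toy.src` and `toy.tgt`; the input file names are placeholders, not the names data.sh actually produces.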
Pre-process the input files (tokenization, etc.). The following command does this for you.
sh preprocess.sh
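To give a rough idea of what tokenization means, the sketch below splits common punctuation marks off the neighboring words. It is a deliberately crude illustration and an assumption on our part: `tokenize_simple` is a hypothetical function, and preprocess.sh most likely uses a dedicated tokenizer with many more rules.

```shell
# Hypothetical illustration only: separate common punctuation from words,
# then normalize whitespace. Reads stdin, writes stdout.
tokenize_simple() {
    sed -e 's/\([.,!?;:]\)/ \1 /g' \
        -e 's/  */ /g' \
        -e 's/^ //' \
        -e 's/ $//'
}
```

With this sketch, `printf 'Hello, world!\n' | tokenize_simple` prints `Hello , world !`.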
Train on the toy dataset, once for each translation direction:
sh train-toy.sh de en
sh train-toy.sh en de
18.04 BLEU
14.52 BLEU