This repository includes code to reproduce our experiments on Thai-English NMT models and scripts to download the datasets (scb-mt-en-th-2020, mt-opus, and scb-mt-en-th-2020+mt-opus), along with the train/validation/test splits that we used in the experiments.
Our experiments are listed below.
- Experiment #1 TBASE.SCB-1M -- Transformer BASE models trained on scb-mt-en-th-2020 v1.0
- Experiment #2 TBASE.MT-OPUS -- Transformer BASE models trained on English-Thai datasets listed in Open Parallel Corpus (OPUS)
- Experiment #3 TBASE.SCB-1M+MT-OPUS -- Transformer BASE models trained on English-Thai scb-mt-en-th-2020 v1.0 and datasets listed in Open Parallel Corpus (OPUS)
BibTeX entry and citation info
@Article{Lowphansirikul2021,
author={Lowphansirikul, Lalita
and Polpanumas, Charin
and Rutherford, Attapol T.
and Nutanong, Sarana},
title={A large English--Thai parallel corpus from the web and machine-generated text},
journal={Language Resources and Evaluation},
year={2021},
month={Mar},
day={30},
issn={1574-0218},
doi={10.1007/s10579-021-09536-6},
url={https://doi.org/10.1007/s10579-021-09536-6}