Assorted Thai Texts used for WangchanBERTa pre-training

@lalital lalital released this 18 Jan 10:41
· 4 commits to master since this release
8b42347

This release contains the cleaned datasets we used to pre-train WangchanBERTa, a transformer-based Thai language model (wangchanberta-base-att-spm-uncased).

The cleaned datasets are only partially available because the data from Wisesight, Pantip, and TNC is not under explicit open-source licenses.