IndoNLU

IndoNLU is a collection of Natural Language Understanding (NLU) resources for Bahasa Indonesia.

12 Downstream Tasks

  • Link [Link]
  • We provide training, validation, and test sets; the test sets come with masked labels (the true labels are not released). We are currently preparing a platform for automatic evaluation using CodaLab. Please stay tuned!

Indo4B

  • 23GB Indo4B Pretraining Dataset [Link]

IndoBERT models

Leaderboard (Under Construction)

  • Community Portal and Public Leaderboard [Link]

Paper

IndoNLU has been accepted at AACL 2020, and you can find the details at https://arxiv.org/abs/2009.05387. If you use any component of IndoNLU for research purposes, please cite the following paper:

@inproceedings{wilie2020indonlu,
  title={IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding},
  author={Bryan Wilie and Karissa Vincentio and Genta Indra Winata and Samuel Cahyawijaya and X. Li and Zhi Yuan Lim and S. Soleman and R. Mahendra and Pascale Fung and Syafri Bahar and A. Purwarianti},
  booktitle={Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing},
  year={2020}
}