Gujarati word list 1 from Stardict, adapted by Docbroke
Source: https://github.com/sspanak/tt9/issues/577#issuecomment-2515314462
License: Public Domain; permission to use is given at the source link
Conjunct consonants list obtained from Wikipedia
Version: 2024-12-30
Source: https://en.wikipedia.org/wiki/Gujarati_script
License: Creative Commons Attribution-ShareAlike 4.0 License
Gujarati word list and frequencies by: CC-100
Version: 2020
Source: https://data.statmt.org/cc-100/
References (PDF links are available in the source URL):
- Unsupervised Cross-lingual Representation Learning at Scale, Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 8440-8451, July 2020.
- CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave, Proceedings of the 12th Language Resources and Evaluation Conference (LREC), pp. 4003-4012, May 2020.
Remark: Includes all words that appear at least twice, plus words that appear only once and are shorter than 10 characters.
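
Illustration of the selection rule in the remark, as a minimal Python sketch. The function names, the sample counts, and the choice to measure length in Unicode code points (Python's len) are assumptions made for this example; the tooling actually used to build the list is not described here, and it may have counted characters differently.

```python
from collections import Counter

# Assumption: "shorter than 10 characters" is measured in Unicode code points.
MAX_SINGLETON_LENGTH = 10


def keep_word(word: str, count: int) -> bool:
    """Return True if a word with the given corpus count passes the rule."""
    if count >= 2:
        # Words seen at least twice are always kept.
        return True
    # Words seen exactly once are kept only when they are short.
    return count == 1 and len(word) < MAX_SINGLETON_LENGTH


def filter_words(counts):
    """Keep only the (word, count) pairs that pass keep_word."""
    return {word: count for word, count in counts.items() if keep_word(word, count)}


# Hypothetical sample counts, not taken from the real data set.
sample = Counter({"ગુજરાતી": 5, "શબ્દ": 1, "abcdefghijkl": 1})
filtered = filter_words(sample)
```

With the sample above, the two Gujarati entries are kept and the 12-character singleton is dropped.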