This is a community effort to build a guide to best practices for developing language technology in new languages. We would like to have guides for the following tasks:
- Keyboard
- Speech Synthesis
- Speech Recognition
- Optical Character Recognition
- Part-of-Speech Tagging and Morphological Analysis
- Full Syntactic Analysis
- Named Entity Recognition
- Machine Translation
We also provide pointers to general topics such as data collection and model pretraining.
If you have questions or want to see something new, please contact us via "Issues" in the GitHub repository. To contribute, either open an issue or send a pull request directly.
To see other examples of datasets or tools that were created with these directions, please click through to the 🏆Hall of Fame🏆 where you can see other like-minded projects, or share your own project!
- Zero-to-Digital Guide by Translation Commons
- Data Gathering Guide by Translation Commons