Works from our Volunteers

Whole list

Malayalam

mlmorph - Malayalam Morphological Analyzer using Finite State Transducer

Tamil

Datasets

Datasets in Tamil text

Scrapers

  1. Tamil Etymological Dictionary
  2. Newspaper Crawlers

ML models

Text Classification model in PyTorch: can be easily applied to other datasets; in fact, the linked repository also contains a dataset of film reviews in Tamil.

Bengali

Bangla2Vec

Bengali News Classification

Scrapers

Bengali News Channel Scraper

Research Papers and Data

Research Papers in Bengali NLP

Hindi

NLP for Hindi

  • Contains a Wikipedia Articles Dataset (55,000 articles) and the scripts used to scrape Wikipedia and clean the dataset
  • Contains a Hindi Movie Reviews Dataset and the scripts used to scrape those reviews from Hindi news websites
  • Contains a Language Model with Perplexity ~36
  • Contains a Movie Review Classification Model with Kappa Score ~30
  • Contains a BBC News Classification Model with Accuracy ~79

Punjabi

NLP for Punjabi

  • Contains a Wikipedia Articles Dataset (44,000 articles) and the scripts used to scrape Wikipedia and clean the dataset
  • Contains a BBC Punjabi News Dataset and the scripts used to scrape those news articles from Punjabi news websites
  • Contains a Language Model with Perplexity ~13
  • Contains a BBC News Classification Model with Kappa Score ~49
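
A note on the metrics quoted in the lists above: perplexity and kappa score are easy to misread, so here is a minimal sketch of how they are commonly computed. It is not taken from any of the linked repositories; the function names and toy inputs are hypothetical, and the kappa and accuracy figures above appear to be on a 0-100 scale, which is an assumption.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both label lists contain a single identical class.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both labelings pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical toy inputs, for illustration only.
log_probs = [math.log(0.25)] * 4           # model assigns 1/4 to every token
labels = ["pos", "pos", "neg", "neg"]      # reference labels
preds = ["pos", "neg", "neg", "neg"]       # model predictions

print(perplexity(log_probs))
print(cohen_kappa(labels, preds))
```

Lower perplexity means the language model is less "surprised" by held-out text, while kappa near 0 means agreement no better than chance and kappa near 1 (or 100 on a percentage scale) means near-perfect agreement.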