Cross Lingual Classification without translation or retraining

This project aims to build a sentiment classification model that is trained in one language and then applied to a new language without retraining or translation.

Dependencies

Datasets

Amazon review datasets:

  • Book review dataset in data/amazon-data
  • 2000 for training and 2000 for testing
  • Ratings are used as labels for positive or negative sentiment
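
As an illustration of how ratings can be turned into sentiment labels, here is a minimal sketch. It assumes a 1 to 5 star scale where ratings above 3 count as positive and ratings below 3 as negative, with neutral ratings dropped. The threshold and the `rating_to_label` helper are assumptions for illustration, not necessarily what this project uses.

```python
def rating_to_label(rating, threshold=3.0):
    """Map a star rating to 1 (positive), 0 (negative), or None (neutral).

    The threshold of 3.0 is an assumption, not taken from the project.
    """
    if rating > threshold:
        return 1
    if rating < threshold:
        return 0
    return None


# Hypothetical (text, rating) pairs, only for demonstration.
reviews = [("Loved it", 5.0), ("Average", 3.0), ("Terrible", 1.0)]

# Keep only reviews with a clear positive or negative label.
labeled = [
    (text, rating_to_label(rating))
    for text, rating in reviews
    if rating_to_label(rating) is not None
]
print(labeled)
```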

You can download the English (en), French (fr), and German (de) MUSE embeddings this way:

# English MUSE Wikipedia embeddings
curl -o data/wiki.en.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec
# French MUSE Wikipedia embeddings
curl -o data/wiki.fr.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
# German MUSE Wikipedia embeddings
curl -o data/wiki.de.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.de.vec
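
Once downloaded, the embeddings can be read into memory. The sketch below assumes the files follow the fastText text format: a header line with the word count and the dimension, then one line per word holding the word and its values separated by spaces. The `parse_vec` and `load_vec` helpers and the `max_words` parameter are hypothetical names, not part of this project.

```python
def parse_vec(lines, max_words=None):
    """Parse fastText-style text vectors into {word: [float, ...]}.

    Assumes the first line is a "<count> <dim>" header.
    """
    lines = iter(lines)
    header = next(lines).split()
    dim = int(header[1])
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if max_words is not None and len(vectors) >= max_words:
            break
    return vectors


def load_vec(path, max_words=None):
    """Open a .vec file and parse it with parse_vec."""
    with open(path, encoding="utf-8") as f:
        return parse_vec(f, max_words)
```

For example, `load_vec("data/wiki.en.vec", max_words=50000)` would load only the first 50000 words, which keeps memory use down while experimenting.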

Train and test classifier

This project tests every pair of training and test language, i.e. En-En, En-Fr, En-De, Fr-Fr, Fr-En, Fr-De, De-De, De-En, De-Fr.
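
The nine pairs listed above are simply every combination of a training language and a test language. A minimal sketch of enumerating them follows; the `LANGS` list and variable names are illustrative, and the iteration order may differ from the order the script uses.

```python
from itertools import product

LANGS = ["en", "fr", "de"]

# Every (train, test) combination, including same-language pairs.
pairs = list(product(LANGS, repeat=2))

for train_lang, test_lang in pairs:
    print(f"train={train_lang} test={test_lang}")
```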

To evaluate the results simply run:

python crosslingual-classification.py
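
To give an intuition for why a model trained in one language can be applied to another, here is a toy sketch. It relies on the property that MUSE embeddings for different languages live in a shared vector space, so words with similar meaning get similar vectors. The 2-dimensional vectors below are made up for illustration, and the nearest-centroid classifier is a stand-in chosen for brevity; neither is taken from `crosslingual-classification.py`.

```python
from statistics import fmean

# Made-up 2-d vectors standing in for aligned embeddings (not real MUSE values).
EN = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [-1.0, 0.1], "awful": [-0.9, 0.0]}
FR = {"bon": [0.95, 0.15], "mauvais": [-0.95, 0.05]}


def embed(text, vectors):
    """Average the vectors of known words; return None if no word is known."""
    found = [vectors[w] for w in text.lower().split() if w in vectors]
    if not found:
        return None
    return [fmean(dim) for dim in zip(*found)]


def train_centroids(texts, labels, vectors):
    """Compute one mean document vector per label."""
    by_label = {}
    for text, label in zip(texts, labels):
        vec = embed(text, vectors)
        if vec is not None:
            by_label.setdefault(label, []).append(vec)
    return {label: [fmean(dim) for dim in zip(*vecs)] for label, vecs in by_label.items()}


def predict(text, vectors, centroids):
    """Return the label whose centroid is closest to the document vector."""
    vec = embed(text, vectors)
    if vec is None:
        return None

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Train on English only, then classify French text with the same centroids.
centroids = train_centroids(["good great", "bad awful"], [1, 0], EN)
print(predict("bon", FR, centroids))
print(predict("mauvais", FR, centroids))
```

Because the classifier only ever sees vectors, swapping the English lookup table for the French one at test time is the only change needed, which is the sense in which no retraining or translation takes place.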