This project aims to build a sentiment classification model that is trained in one language and then applied to a new language without retraining or translation.
- Python 2/3 with NumPy/SciPy
- scikit-learn
- nltk
Amazon review datasets:
- Book review dataset in data/amazon-data
- 2000 reviews for training and 2000 for testing
- Ratings are used as labels for positive or negative sentiment
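As a rough illustration of how ratings can become sentiment labels, here is a minimal sketch. The cutoff is an assumption: this README does not state which ratings count as positive or negative, so `rating_to_label` and its `threshold` parameter are hypothetical names, not part of the project code.

```python
def rating_to_label(rating, threshold=3):
    """Map a star rating to 1 (positive), 0 (negative) or None (neutral).

    The threshold of 3 is an assumed cutoff, not taken from the project.
    """
    if rating > threshold:
        return 1
    if rating < threshold:
        return 0
    return None  # neutral rating; the caller decides whether to drop it


ratings = [5, 1, 4, 3, 2]
labels = [rating_to_label(r) for r in ratings]
```

Returning `None` for the middle rating keeps the decision about neutral reviews explicit instead of silently assigning them to one class.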
You can download the English (en), French (fr), and German (de) embeddings this way:
# English MUSE Wikipedia embeddings
curl -o data/wiki.en.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec
# French MUSE Wikipedia embeddings
curl -o data/wiki.fr.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
# German MUSE Wikipedia embeddings
curl -o data/wiki.de.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.de.vec
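The downloaded `.vec` files are plain text: a header line with the vocabulary size and vector dimension, followed by one word and its vector per line. A minimal sketch of a loader for that layout is shown here; `load_vec` and `max_words` are hypothetical names used for illustration and may differ from what the project script does.

```python
import io


def load_vec(lines, max_words=None):
    """Parse word vectors from text in the .vec format.

    `lines` is any iterable of text lines, such as an open file.
    Returns (embeddings, dim), where embeddings maps word -> list of floats.
    """
    lines = iter(lines)
    # Header line: "<vocabulary size> <dimension>"
    header = next(lines).split()
    dim = int(header[1])
    embeddings = {}
    for line in lines:
        if max_words is not None and len(embeddings) >= max_words:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings, dim


# Tiny in-memory sample standing in for data/wiki.en.vec
sample = io.StringIO(u"2 3\ngood 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
vectors, dim = load_vec(sample)
```

For a real file, pass a handle opened with `io.open("data/wiki.en.vec", encoding="utf-8")`. Because the three files share one vector space, vectors loaded from different languages can be compared directly.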
This project tests all language pairs, i.e. En-En, En-Fr, En-De, Fr-Fr, Fr-En, Fr-De, De-De, De-En, and De-Fr.
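The nine pairs listed above can be enumerated programmatically. The sketch below assumes the first language in each pair is the training language and the second is the test language; `run_all` and `evaluate_pair` are hypothetical names, and the dummy evaluator only exists so the example runs on its own.

```python
from itertools import product

LANGUAGES = ["en", "fr", "de"]

# Every (source, target) combination, including same-language pairs
pairs = list(product(LANGUAGES, repeat=2))


def run_all(evaluate_pair, languages=LANGUAGES):
    """Call evaluate_pair(source, target) for each pair and collect the results."""
    return {
        (src, tgt): evaluate_pair(src, tgt)
        for src, tgt in product(languages, repeat=2)
    }


# Placeholder evaluator: a real one would train on `src` reviews and
# return a score measured on `tgt` reviews.
results = run_all(lambda src, tgt: 0.0)
```

Keeping the evaluator as a parameter means the loop over pairs stays the same regardless of which classifier or metric is used.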
To evaluate the results, simply run:
python crosslingual-classification.py