Mini-project for the deep learning course based on A Structured Self-attentive Sentence Embedding by Lin et al.
The code is adapted from Freda Shi's repo.
To generate the dataset, you will need to install spaCy and run:
python tokenizer-yelp.py --input [Yelp dataset] --output [output path, will be a json file] --dict [output dictionary path, will be a json file]
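For reference, here is a minimal sketch of what such a tokenization step could look like. It is hypothetical: the real tokenizer-yelp.py may use different field names, options, and output formats. It assumes the Yelp dataset is a JSON-lines file with "text" and "stars" fields.

```python
# Hypothetical sketch of tokenizer-yelp.py; the real script may differ.
import argparse
import json

import spacy

parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True)   # raw Yelp reviews (JSON lines, assumed)
parser.add_argument("--output", required=True)  # tokenized output
parser.add_argument("--dict", required=True)    # word -> index dictionary
args = parser.parse_args()

nlp = spacy.blank("en")  # tokenizer only, no full pipeline needed
dictionary = {}

with open(args.input) as fin, open(args.output, "w") as fout:
    for line in fin:
        review = json.loads(line)
        tokens = [t.text.lower() for t in nlp(review["text"])]
        for tok in tokens:
            dictionary.setdefault(tok, len(dictionary))
        fout.write(json.dumps({"text": tokens, "label": review["stars"]}) + "\n")

with open(args.dict, "w") as f:
    json.dump(dictionary, f)
```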
A small version of the tokenized dataset is available here.
To get the GloVe vectors as PyTorch tensors, you can use torchtext (see here). For convenience, I have already done it for glove.6B.200d.txt.pt.
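A minimal sketch of that conversion with torchtext is below. The saved structure (a word-to-vector dictionary) is an assumption; adjust it to whatever format train.py actually expects.

```python
# Sketch: build glove.6B.200d.txt.pt from the pretrained GloVe vectors.
import torch
from torchtext.vocab import GloVe

glove = GloVe(name="6B", dim=200)  # downloads and caches glove.6B.200d.txt

# Save as a word -> vector dictionary (assumed on-disk format).
word_vectors = {word: glove.vectors[idx] for word, idx in glove.stoi.items()}
torch.save(word_vectors, "glove.6B.200d.txt.pt")
```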
Now, provided you have downloaded everything to Colab, training can be launched with:
python train.py data.train_data="/content/small/train_tok.json" data.val_data="/content/small/val_tok.json" data.test_data="/content/small/test_tok.json" data.dictionary="/content/small/dict_review_short.json" data.word_vector="/content/glove.6B.200d.txt.pt" data.save="/content/self-attentive-sentence-embedding/models/model-small-6B.pt"
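Before launching, you can optionally run a quick sanity check that all the files referenced in the command above are in place (paths copied from the command; adapt if you stored the data elsewhere):

```python
# Verify that the data, dictionary, and word-vector files exist on Colab.
from pathlib import Path

paths = [
    "/content/small/train_tok.json",
    "/content/small/val_tok.json",
    "/content/small/test_tok.json",
    "/content/small/dict_review_short.json",
    "/content/glove.6B.200d.txt.pt",
]
for p in paths:
    print(p, "OK" if Path(p).exists() else "MISSING")
```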