Summarization research code

This code accompanies the thesis on embedding-based extractive summarization from Blendle Research, written by Lucas de Haas. It can be used to reproduce all experimental results exactly, and it includes implementations of several summarization algorithms that were not previously available.

How to run

Set the summarization function(s) in summarizer.py, and then run main.py to output results.

Some files are not included:

The Google word2vec model is not included in this repo, but can be downloaded here; it is expected to be in models/word2vec/google/, and is required to run main.py out of the box.
The DUC-2002 and TAC-2008 datasets are not included, as access can only be granted by NIST (click on the links for more information on obtaining access).
The Opinosis dataset is included, and main.py is configured to run on this dataset by default.
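
Before running main.py, it can help to confirm that the word2vec model is where the code expects it. The following is a minimal sketch and not part of the repository: the helper names `find_word2vec_files` and `load_word2vec` are hypothetical, and the `.bin` / `.bin.gz` extensions are an assumption about how the downloaded model is stored (the text above only specifies the directory).

```python
from pathlib import Path

# Directory named in the README; everything else here is illustrative.
MODEL_DIR = Path("models/word2vec/google")


def find_word2vec_files(model_dir=MODEL_DIR):
    """Return candidate model files in model_dir, sorted by name.

    Assumption: the model is stored in word2vec binary format with a
    .bin or .bin.gz extension. Returns an empty list if the directory
    is missing or holds no such file.
    """
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return []
    return sorted(
        p for p in model_dir.iterdir()
        if p.is_file() and (p.name.endswith(".bin") or p.name.endswith(".bin.gz"))
    )


def load_word2vec(path):
    """Load binary word2vec vectors with gensim (imported lazily)."""
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(str(path), binary=True)


if __name__ == "__main__":
    candidates = find_word2vec_files()
    if not candidates:
        print("No word2vec model found in", MODEL_DIR)
    else:
        print("Found:", candidates[0])
```

Run from the repository root, this reports whether a model file is present; `load_word2vec` is only needed if you want to verify that gensim can actually read the file.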

Requirements

python >= 3.5
pythonrouge
regex
scipy
networkx
gensim
xmltodict
numpy
pattern
nltk
beautifulsoup4
scikit_learn
torch
permute
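
To see which of these requirements are missing from the current environment, a small check along the following lines may be useful. It is a sketch rather than part of the repository: the function names are hypothetical, and the mapping assumes each package is imported under its listed name, except beautifulsoup4 (imported as `bs4`) and scikit_learn (imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Requirement name -> top-level import name.
# Assumption: only beautifulsoup4 and scikit_learn differ from their import names.
REQUIREMENTS = {
    "pythonrouge": "pythonrouge",
    "regex": "regex",
    "scipy": "scipy",
    "networkx": "networkx",
    "gensim": "gensim",
    "xmltodict": "xmltodict",
    "numpy": "numpy",
    "pattern": "pattern",
    "nltk": "nltk",
    "beautifulsoup4": "bs4",
    "scikit_learn": "sklearn",
    "torch": "torch",
    "permute": "permute",
}


def missing_requirements(requirements=REQUIREMENTS):
    """Return the sorted names of requirements whose module cannot be found."""
    return sorted(
        name for name, module in requirements.items() if find_spec(module) is None
    )


def python_version_ok(version_info=sys.version_info, minimum=(3, 5)):
    """Check the interpreter against the python >= 3.5 requirement."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.5 or newer is required")
    for name in missing_requirements():
        print("Missing requirement:", name)
```

The check only looks up whether each module can be located; it does not import the packages or verify their versions.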

Folders and files

blendle_word2vec
data_extractors
datasets
enrichers
models
rnn
rouge_files/ROUGE-1.5.5
save_dir
sent_selectors
.gitattributes
.gitignore
LICENSE
README.md
evaluation.py
main.py
representations.py
rouge.py
train_svd.py
