This code accompanies the thesis on embedding-based extractive summarization from Blendle Research, written by Lucas de Haas. It can be used to exactly reproduce all experimental results. It thus contains implementations of various summarization algorithms that were previously not available.

blendle/research-summarization

Summarization research code

How to run

Set the summarization function(s) in summarizer.py, and then run main.py to output results.

Some files are not included:

  • The Google word2vec model is not included in this repo and must be downloaded separately; main.py expects it in models/word2vec/google/ and needs it to run out of the box.
  • The DUC-2002 and TAC-2008 datasets are not included, as access can only be granted by NIST (see the NIST pages for DUC and TAC for information on obtaining access).
  • The Opinosis dataset is included, and main.py is configured to run on this dataset by default.
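
Before running main.py, it can help to confirm that the word2vec model is where the code expects it. The following is a minimal sketch and not part of the repository: the helper name `find_model_files` is hypothetical, and it relies only on the models/word2vec/google/ directory named above, without assuming any particular model file name.

```python
from pathlib import Path

# Directory in which, per the notes above, main.py expects the Google word2vec model.
WORD2VEC_DIR = Path("models") / "word2vec" / "google"


def find_model_files(repo_root="."):
    """Return the files found in the expected word2vec directory.

    Hypothetical helper for illustration. An empty list means the
    directory is missing or contains no files.
    """
    model_dir = Path(str(repo_root)) / WORD2VEC_DIR
    if not model_dir.is_dir():
        return []
    return sorted(p for p in model_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    files = find_model_files()
    if files:
        print("Found word2vec files:", ", ".join(f.name for f in files))
    else:
        print("No model found in", WORD2VEC_DIR, "- download it before running main.py")
```

Run it from the repository root; an empty result suggests the model still has to be downloaded and placed in that directory.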

Requirements

  • python >= 3.5
  • pythonrouge
  • regex
  • scipy
  • networkx
  • gensim
  • xmltodict
  • numpy
  • pattern
  • nltk
  • beautifulsoup4
  • scikit_learn
  • torch
  • permute
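
The requirements above can be checked before a run with a short script. This is a sketch under stated assumptions rather than something shipped with the repository: the function names are hypothetical, and the import names are assumed from the package names (beautifulsoup4 is imported as `bs4` and scikit_learn as `sklearn`; the others are assumed to import under their listed names).

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = [
    "pythonrouge", "regex", "scipy", "networkx", "gensim", "xmltodict",
    "numpy", "pattern", "nltk", "bs4", "sklearn", "torch", "permute",
]


def python_version_ok(version_info=None):
    """Return True if the interpreter satisfies the python >= 3.5 requirement."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (3, 5)


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.5 or newer is required")
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found")
```

The check only tests whether each module can be located; it does not verify versions or that the import succeeds.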
