# Text Summarization with Pretrained Encoders
https://github.com/nlpyang/PreSumm
The original GitHub repository provides 4 pretrained models:
- Description: Their baseline abstractive model trained on the CNN/DailyMail dataset
- Name: `liu2019-transformerabs`
- Usage:

  ```python
  from repro.models.liu2019 import TransformerAbs

  model = TransformerAbs()
  summary = model.predict("document")
  ```
- Description: A BERT-based extractive model trained on the CNN/DailyMail dataset
- Name: `liu2019-bertsumext`
- Usage:

  ```python
  from repro.models.liu2019 import BertSumExt

  model = BertSumExt()
  summary = model.predict("document")
  ```
- Description: A BERT-based abstractive model trained on the CNN/DailyMail dataset
- Name: `liu2019-bertsumextabs`
- Usage:

  ```python
  from repro.models.liu2019 import BertSumExtAbs

  model = BertSumExtAbs()  # or BertSumExtAbs("bertsumextabs_cnndm.pt")
  summary = model.predict("document")
  ```
- Description: A BERT-based abstractive model trained on the XSum dataset
- Name: `liu2019-bertsumextabs`
- Usage:

  ```python
  from repro.models.liu2019 import BertSumExtAbs

  model = BertSumExtAbs("bertsumextabs_xsum.pt")
  summary = model.predict("document")
  ```
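The four entries above can be summarized in a small lookup table. This is only an illustration of how the registered names map to classes and checkpoints; note that the two `BertSumExtAbs` variants share the name `liu2019-bertsumextabs` and are distinguished by the checkpoint passed to the constructor. Checkpoint filenames for the first two models are not given above, so they are left as `None` here.

```python
# Registered repro model names, their classes, training datasets, and (where
# stated above) the checkpoint filename passed to the class constructor.
PRETRAINED_MODELS = [
    {"name": "liu2019-transformerabs", "cls": "TransformerAbs", "dataset": "CNN/DailyMail", "checkpoint": None},
    {"name": "liu2019-bertsumext", "cls": "BertSumExt", "dataset": "CNN/DailyMail", "checkpoint": None},
    {"name": "liu2019-bertsumextabs", "cls": "BertSumExtAbs", "dataset": "CNN/DailyMail", "checkpoint": "bertsumextabs_cnndm.pt"},
    {"name": "liu2019-bertsumextabs", "cls": "BertSumExtAbs", "dataset": "XSum", "checkpoint": "bertsumextabs_xsum.pt"},
]


def find_models(name):
    """Return every entry registered under a given repro model name.

    The two BertSumExtAbs checkpoints share the name
    "liu2019-bertsumextabs", so this can return more than one entry.
    """
    return [m for m in PRETRAINED_MODELS if m["name"] == name]
```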
- The pretrained models expect their input to already be preprocessed. We therefore tried to replicate the authors' preprocessing steps as closely as we could: all input documents are tokenized and sentence-split using the Stanford CoreNLP library within the Docker container.
- If you pass in a document that has already been split into sentences, the current implementation does not respect those sentence boundaries and will reprocess the document.
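The real preprocessing runs Stanford CoreNLP inside the Docker container; as a rough illustration of the tokenize-then-sentence-split shape of the expected input, here is a naive regex-based stand-in. The splitting rules below are a simplification for illustration only, not the CoreNLP behavior.

```python
import re


def preprocess(document: str) -> list[list[str]]:
    """Very rough stand-in for the CoreNLP preprocessing: split a raw
    document into sentences, then each sentence into tokens.

    The actual pipeline uses Stanford CoreNLP inside the Docker container;
    this naive version exists only to show the sentences-of-tokens shape.
    """
    # Naive sentence split: break after terminal punctuation + whitespace
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    # Naive tokenization: separate punctuation marks from words
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences if s]
```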
- Image name: `liu2019`
- Build command:

  Each of the flags indicates whether the corresponding pretrained model should be downloaded.

  ```shell
  repro setup liu2019 \
      [--transformerabs-cnndm] \
      [--bertsumext-cnndm] \
      [--bertsumextabs-cnndm] \
      [--bertsumextabs-xsum] \
      [--silent]
  ```
- Requires network: No
To run the unit tests, first download all of the pretrained models:

```shell
repro setup liu2019 \
    --transformerabs-cnndm \
    --bertsumext-cnndm \
    --bertsumextabs-cnndm \
    --bertsumextabs-xsum
pytest -s models/liu2019/tests
```
- Regression unit tests pass
  See the latest successful tests on GitHub here
- Correctness unit tests pass
  The authors provide their model outputs and instructions for processing the data from scratch. We did not attempt to perfectly reproduce their summaries.
- Model runs on full test dataset
  See here
- Predictions approximately replicate results reported in the paper
  The results for the abstractive models approximately replicate those reported in the paper, but the extractive model does not. See this experiment for details. The ROUGE scores are calculated against reference summaries which have been preprocessed in the same way as the input documents, not against the original references.

  TransformerAbs on CNN/DailyMail

  | | R1 | R2 | RL |
  |---|---|---|---|
  | Reported | 40.21 | 17.76 | 37.09 |
  | Ours | 40.38 | 17.81 | 37.10 |

  BertSumExt on CNN/DailyMail

  | | R1 | R2 | RL |
  |---|---|---|---|
  | Reported | 43.23 | 20.24 | 39.63 |
  | Ours | 41.93 | 18.98 | 38.07 |

  BertSumExtAbs on CNN/DailyMail

  | | R1 | R2 | RL |
  |---|---|---|---|
  | Reported | 42.13 | 19.60 | 39.18 |
  | Ours | 42.08 | 19.43 | 38.95 |

  BertSumExtAbs on XSum

  | | R1 | R2 | RL |
  |---|---|---|---|
  | Reported | 38.81 | 16.50 | 31.27 |
  | Ours | 38.88 | 16.41 | 31.31 |

  The abstractive models seem to be faithful reproductions of the original results, whereas the extractive model is not. It is not clear why.
- Predictions exactly replicate results reported in the paper
  See above
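To make the gap for the extractive model concrete, the per-metric differences (ours minus reported) can be computed directly from the ROUGE numbers above:

```python
# ROUGE scores copied from the tables above, as (reported, ours) pairs.
SCORES = {
    "TransformerAbs (CNN/DM)": {"R1": (40.21, 40.38), "R2": (17.76, 17.81), "RL": (37.09, 37.10)},
    "BertSumExt (CNN/DM)":     {"R1": (43.23, 41.93), "R2": (20.24, 18.98), "RL": (39.63, 38.07)},
    "BertSumExtAbs (CNN/DM)":  {"R1": (42.13, 42.08), "R2": (19.60, 19.43), "RL": (39.18, 38.95)},
    "BertSumExtAbs (XSum)":    {"R1": (38.81, 38.88), "R2": (16.50, 16.41), "RL": (31.27, 31.31)},
}


def deltas(model: str) -> dict[str, float]:
    """Ours minus reported for each ROUGE metric, rounded to 2 decimals."""
    return {m: round(ours - rep, 2) for m, (rep, ours) in SCORES[model].items()}
```

The abstractive deltas are all within roughly 0.2 ROUGE points, while BertSumExt is more than a point below the reported scores on every metric.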