# Finding a Balanced Degree of Automation for Summary Evaluation
https://github.com/ZhangShiyue/Lite2-3Pyramid

- Lite3Pyramid
  - Description: An automated Pyramid Score based on SRL
  - Name: `zhang2021-lite3pyramid`
  - Usage:
    ```python
    from repro.models.zhang2021 import Lite3Pyramid

    model = Lite3Pyramid()

    # Score candidate summaries against reference summaries
    inputs = [
        {"candidate": "The candidate summary", "references": ["The references"]}
    ]
    macro, micro = model.predict(inputs)

    # Or score against pre-extracted summary content units (STUs)
    inputs = [
        {"candidate": "The candidate summary", "units_list": [["STU 1 for reference 1", "STU 2"]]}
    ]
    macro, micro = model.predict(inputs)
    ```
    `macro` is the Lite3Pyramid scores averaged over the inputs. `micro` is the per-input scores, each averaged over the references for that input.
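The macro/micro distinction above can be sketched in plain Python. This is an illustrative sketch of the aggregation scheme only (per-reference scores averaged within each input, then across inputs); the `aggregate` helper and its list-of-lists input layout are hypothetical and are not part of repro's actual API:

```python
# Hypothetical sketch of macro/micro aggregation, assuming each reference
# yields one score per input. Not repro's actual return types.

def aggregate(per_reference_scores):
    """per_reference_scores: one list of per-reference scores for each input."""
    # micro: one score per input, averaged over that input's references
    micro = [sum(scores) / len(scores) for scores in per_reference_scores]
    # macro: the per-input scores averaged over all inputs
    macro = sum(micro) / len(micro)
    return macro, micro

macro, micro = aggregate([[0.75, 0.25], [0.5]])
# micro == [0.5, 0.5]; macro == 0.5
```

Two inputs, the first with two references and the second with one, reduce to one micro score each before the macro average is taken.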
- Image name: `danieldeutsch/zhang2021:1.0`
- Docker Hub:
- Build command:
  ```
  repro setup zhang2021 [--models <model-name>+]
  ```
  The `--models` argument specifies which pretrained NLI models will be pre-cached inside of the Docker image. See here for the available models.
- Requires network: Yes. AllenNLP sends a request for a model even if the model is available locally.
```
repro setup zhang2021
pytest models/zhang2021/tests
```
Most of the tests require a GPU for speed purposes.
- Regression unit tests pass
- Correctness unit tests pass
  - The STU extraction gives slightly different results, but calculating the scores given a summary and STUs gives the expected result.
- Model runs on full test dataset
  - Not tested
- Predictions approximately replicate results reported in the paper
- Predictions exactly replicate results reported in the paper