
Releases: pszemraj/textsum

batch processing improvements

18 Feb 22:52
82bafca


A small release with some improvements to the Summarizer class for batch-processing use cases.

Let's say you've loaded your Summarizer:

from textsum.summarize import Summarizer

model_name = "pszemraj/pegasus-x-large-book_synthsumm-bf16" # recent model
summarizer = Summarizer(model_name)

new features/improvements:

Smart __call__ method for the Summarizer class:

  • Added a smart __call__ method that automatically distinguishes between raw text input and file paths, making it easier to integrate into batch processing and .map() tasks.
# Directly passing text to be summarized
summary_text = summarizer("This is a sample text to summarize.")
print(summary_text)

# Passing a file path to be summarized
output_filepath = summarizer(
    "/path/to/textfile.extension",
    output_dir="./my-summary-stash",
)
print(output_filepath)

Enhanced Batch Processing Controls:

  • Introduced disable_progress_bar and batch_delimiter options for finer control over batch processing and output formatting.
from datasets import load_dataset

dataset = load_dataset("Trelis/tiny-shakespeare")
dataset = dataset.map(
    lambda x: {"summary": summarizer(x["text"], disable_progress_bar=True)},
    batched=False,
) # doesn't spam you with multiple progress bars!!
print(dataset)

Note: You can pass disable_progress_bar=True when instantiating the Summarizer() for cleaner inference.
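
For example, to suppress the per-call progress bars for the whole instance:

summarizer = Summarizer(model_name, disable_progress_bar=True)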

You can now set the delimiter string used to join chunk summaries via the batch_delimiter argument when running inference:

summary_output = summarizer(text, batch_delimiter="<I AM A DELIMITER>")
print(summary_output)
# "Summary of first chunk.<I AM A DELIMITER>Summary of second chunk.<I AM A DELIMITER>Summary of third chunk."

By default, the delimiter is "\n\n".

Misc

  • default parameter update: the length_penalty for inference is now 1.0 (was 0.8); see the sketch after this list
  • code cleanup across modules, mostly for readability and maintainability.
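
If you prefer the previous length_penalty, here is a hypothetical sketch of restoring it at init; this assumes extra keyword arguments passed to the constructor are forwarded as generation parameters, which these notes do not confirm:

summarizer = Summarizer(
    model_name,
    length_penalty=0.8,  # hypothetical: assumes extra kwargs are treated as generation params
)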

What's Changed

Full Changelog: v0.2.0...v0.2.1

inference optimization ⚗

08 Jul 01:10
d51c4cd

🦿 This release adds support for some features that can make inference faster:

  • support for torch.compile & Optimum ONNX¹ (see the sketch below)
  • improved the textsum-dir command: more options, streamlined behavior, and the fire package added to help with the CLI
    • the saved config JSON files are now better structured to keep track of parameters, etc.
  • some small adjustments to the Summarizer class
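
A minimal sketch of enabling these at construction time; the keyword names compile_model and optimum_onnx are assumptions and may not match the actual API:

from textsum.summarize import Summarizer

summarizer = Summarizer(
    compile_model=True,   # assumption: wraps the model with torch.compile
    optimum_onnx=False,   # assumption: set True to load an ONNX export via optimum (see the note below)
)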

Next up: the UI app will finally get an overhaul.

  1. Please note that "support for" is not equivalent to saying "I have tested every long-context model with ONNX max quantization and guarantee they will all produce accurate results". I've had some good results, but also some strange ones (with Long-T5 specifically). Test beforehand, and file an issue on the Optimum repo as needed 🙏

support for LLM.int8

31 Jan 04:11
9108f66

On GPU, you can now load the model with LLM.int8 to use less memory:

from textsum.summarize import Summarizer
summarizer = Summarizer(load_in_8bit=True) # loads the default model in LLM.int8, using roughly 1/4 of the memory

What's Changed

Full Changelog: v0.1.3...v0.1.5

minor doc & logging updates

21 Jan 23:39
419eb3b

Improves docs and logging, and makes it easier to set the inference params from JSON.
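
As an illustration only (the parameter names below are standard transformers generation settings, and the values are not the package's documented defaults), an inference-params JSON might look like:

{
  "max_length": 512,
  "min_length": 8,
  "num_beams": 4,
  "length_penalty": 0.8,
  "no_repeat_ngram_size": 3,
  "early_stopping": true,
  "repetition_penalty": 3.5
}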

What's Changed

Full Changelog: v0.1.2...v0.1.3

pip install textsum

18 Jan 23:00
7405685

Updated docs to reflect that the package is on PyPI!

pip install textsum

What's Changed

Full Changelog: v0.1.1...v0.1.2

pypi

18 Jan 22:17
83a11f5

add to pypi

Summarizer class object

18 Jan 19:47
f096278

Easy-to-use API in Python courtesy of a class object:

from textsum.summarize import Summarizer

summarizer = Summarizer() # loads default model and parameters
out_str = summarizer.summarize_string('This is a long string of text that will be summarized.')
print(out_str)

What's Changed

Full Changelog: v0.0.5...v0.1

v0.0.5

16 Jan 03:44
aa63119
Pre-release

  • Adds a CLI workflow, textsum-dir, to summarize all text files in a directory (see the sketch below)
  • The Gradio UI demo command is now textsum-ui (previously ts-ui)
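
A sketch of the CLI usage, assuming the input directory is passed as the first argument (other flags are not covered here):

textsum-dir /path/to/directory-of-text-files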

What's Changed

Full Changelog: v0.0.1...v0.0.5

MWE

20 Dec 08:36
0beb6c5
Pre-release

Minimum working functionality: "porting" the HF Space to a Python package that can be set up to run the same demo locally with the ts-ui command.
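
To launch that local demo (whether ts-ui accepts additional options is not covered by this note):

ts-ui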