support for LLM.int8

pszemraj released this 31 Jan 04:11


On GPU, you can now load the model with LLM.int8 quantization to reduce memory use:

from textsum.summarize import Summarizer
summarizer = Summarizer(load_in_8bit=True)  # loads the default model in LLM.int8, using roughly 1/4 of the memory of fp32 weights
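The "1/4 of the memory" figure can be checked with simple arithmetic, assuming the comparison baseline is fp32 weights (4 bytes per parameter) against int8 weights (1 byte per parameter). The sketch below is only an estimate of weight storage: the parameter count is a hypothetical placeholder, not the size of the default model, and it ignores activations and the small share of outlier values that LLM.int8 keeps in higher precision.

```python
# Rough estimate of weight memory by dtype; not part of textsum itself.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Approximate memory needed to store the weights alone, in GB (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Hypothetical parameter count, chosen only for illustration.
n_params = 500_000_000

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gb(n_params, dtype):.2f} GB")

# Ratio of int8 to fp32 weight storage.
ratio = weight_memory_gb(n_params, "int8") / weight_memory_gb(n_params, "fp32")
print(f"int8 / fp32 = {ratio}")
```

Actual savings on a given GPU will differ somewhat, since runtime buffers and any layers left unquantized are not counted here.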

What's Changed

Full Changelog: v0.1.3...v0.1.5

