v0.4.1

@haileyschoelkopf haileyschoelkopf released this 31 Jan 15:29
· 297 commits to main since this release
a0a2fec

Release Notes

This release contains all changes made since the release of v0.4.0, and is partially a test of our release automation, provided by @anjor.

At a high level, some of the changes include:

  • Data-parallel inference using vLLM (contributed by @baberabb)
  • A major fix to Hugging Face model generation: in v0.4.0, a bug in stop sequence handling sometimes cut generations off too early.
  • Miscellaneous documentation updates
  • A number of new tasks, and bugfixes to old tasks!
  • Support for OpenAI-like API models using local-completions or local-chat-completions (thanks to @veekaybee, @mgoin, @anjor, and others for this)!
  • Integration with tools for visualizing results, such as Zeno, with WandB coming soon!
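
To make the stop sequence item above concrete, here is a minimal sketch of what truncating a generation at a stop sequence is meant to do. It is an illustration only, not the harness's actual code: `truncate_at_stop` is a hypothetical helper, and the example strings are made up.

```python
from typing import Sequence


def truncate_at_stop(text: str, stop_sequences: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence.

    Hypothetical helper for illustration; not part of lm_eval.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            # An empty stop string would match at index 0 and erase
            # the whole generation, so skip it.
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    generation = "The answer is 42.\n\nQuestion: What is 2 + 2?"
    print(truncate_at_stop(generation, ["\n\n", "Question:"]))
```

Taking the earliest match across all stop sequences keeps everything the model produced before the first stop string and nothing after it. Cutting any earlier than that is the kind of premature truncation the fix addresses.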

We may publish minor version releases more frequently in the future, to make things easier for PyPI users!

We're very pleased by the recent uptick in interest in LM Evaluation Harness, and we hope to continue improving the library as time goes on. We're grateful to everyone who's contributed, and are excited by how many new contributors this version brings! If you have feedback for us, or would like to help develop the library, please let us know.

In the next version release, we hope to include:

  • Chat Templating + System Prompt support, for locally-run models
  • Improved Answer Extraction for many generative tasks, making them more easily run zero-shot and less dependent on model output formatting
  • General speedups and QoL fixes to the non-inference portions of LM-Evaluation-Harness, including drastically reduced startup times and faster non-inference processing steps, especially when num_fewshot is large!
  • A new TaskManager object and the deprecation of lm_eval.tasks.initialize_tasks(), to make it easier to register many tasks and configure new groups of tasks

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.4.1