This repository has been archived by the owner on Oct 2, 2024. It is now read-only.
Releases: mlcommons/modelgauge
v0.6.3
What's Changed
- Add the wildguard private annotator, with some refactoring. by @rogthefrog in #554
- HuggingFace Inference SUT by @bkorycki in #561
- Safetests use first batch of v1.0 prompts by @bkorycki in #563
Full Changelog: v0.6.2...v0.6.3
v0.6.2
v0.6.1
What's Changed
- Fix bug where bad raw annotations are cached forever
- Remove safetest base class
- Minor improvements for pipeline debugging
- Adding 'system' role to openai_client _ROLE_MAP by @shachihk-intel
- Better Together API errors
- Keep track of items that can't be processed
- Update dependencies and add notebook linter
- Remove deprecated Together models, and update tests to match
New Contributors
- @rogthefrog made their first contribution in #512
- @shachihk-intel made their first contribution in #534
Full Changelog: v0.6.0...v0.6.1
v0.6.0
What's Changed
- Together and HuggingFace SUTs can now return log probs in their responses when requested.
- New CLI option `--plugin-dir` loads local plugins at runtime.
- Increase reliability of downloading test data.
- Prepare modelgauge infra files for safety evaluator testing (new "System" chat role, minor `llama_guard_annotator` refactor).
- Documentation updates, including initial API reference.
- Introduce `Pipeline` and related classes to serve as the base for a composable set of objects that handle common bulk processing tasks like running prompts, getting annotations, and any other slow I/O-bound workloads.
- SafeTests use files from dev deployment of modellab.
- New `run-csv-items` command quickly runs batches of prompts and/or responses in a CSV file through some SUTs and/or annotators.
- Add new v1.0 SafeTest class and place-holder test `safe-dfm-1.0`. Version 0.5 tests (e.g. `safe-cae`) are not affected.
- Move Together plugin files + SafeTest into core modelgauge library.
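The composable-pipeline idea in this release can be illustrated with a minimal sketch. This is not the modelgauge `Pipeline` API: the names `Stage`, `SimplePipeline`, `fake_sut`, and `fake_annotator` are hypothetical and exist only to show how chained stages with thread pools might handle slow I/O-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


class Stage:
    """Hypothetical pipeline step: applies `fn` to every item it receives."""

    def __init__(self, fn: Callable, workers: int = 1):
        self.fn = fn
        self.workers = workers

    def run(self, items: Iterable) -> Iterator:
        # Threads suit slow I/O-bound calls; map() preserves input order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            yield from pool.map(self.fn, items)


class SimplePipeline:
    """Hypothetical chain: the output of one stage feeds the next."""

    def __init__(self, *stages: Stage):
        self.stages = stages

    def run(self, source: Iterable) -> List:
        stream = source
        for stage in self.stages:
            stream = stage.run(stream)
        return list(stream)


def fake_sut(prompt: str) -> dict:
    # Stand-in for a slow network call to a system under test.
    return {"prompt": prompt, "response": prompt.upper()}


def fake_annotator(item: dict) -> dict:
    # Stand-in for a slow annotator call.
    return {**item, "is_safe": "BAD" not in item["response"]}


results = SimplePipeline(
    Stage(fake_sut, workers=4),
    Stage(fake_annotator, workers=2),
).run(["hello", "bad idea"])
```

Each stage only knows about its own function, so a SUT stage and an annotator stage can be swapped or reordered without touching the others.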
New Contributors
- @tsunamit made their first contribution in #449
- @HuaizhengZhang made their first contribution in #489
Full Changelog: v0.5.1...v0.6.0
v0.5.1
What's Changed
- Updated docs
- SafeTest compatible with Python 3.11+
- Add new Llama Guard 2 to `LlamaGuardAnnotator`
  - Can configure `LlamaGuardAnnotator` with optional `llama_guard_version` parameter. Defaults to Llama Guard 2
  - Minor changes to prompt/category formatting for Llama Guard 1. This may affect results.
- SafeTest can also be configured to use Llama Guard 1 or 2 as its annotator. Defaults to version 2.
Full Changelog: v0.5.0...v0.5.1
v0.5.0
What's Changed
- Renamed to ModelGauge and started pushing to PyPI!
- A whole bunch of cleanups and preparation for the more public release.
- Caching now supports dicts.
- Unit tests to ensure you can install from PyPI and run in a notebook.
- Expand range of supported Python versions to 3.10 and up.
- Remove benign hazard from SafeTest.
- Start setting up ReadTheDocs.
Full Changelog: v0.3.3...v0.5.0
v0.3.3
What's Changed
- Change SafeTest to data_april04 release.
- More prompts
- Removed safe-ben
Full Changelog: v0.3.2...v0.3.3
v0.3.2
What's Changed
- `max_test_items` returns a relatively stable set of prompts
- Loading bar for plugins
- Have `list` command report prettier values for secrets
- Time out requests stuck on TogetherAI
- Updated docs
- Move `simple_test_runner` out of plugins and into core library
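The notes do not say how `max_test_items` keeps its selection stable. One common approach, shown here purely as an assumption, is to rank items by a hash of their identifier and keep the lowest-ranked ones. The function `stable_subset` and the `prompt-N` ids are hypothetical.

```python
import hashlib
from typing import List


def stable_subset(prompt_ids: List[str], max_items: int) -> List[str]:
    """Pick up to `max_items` ids; the choice depends only on the ids themselves."""

    def rank(prompt_id: str) -> str:
        # A content hash gives each id a fixed pseudo-random rank across runs.
        return hashlib.sha256(prompt_id.encode("utf-8")).hexdigest()

    return sorted(prompt_ids, key=rank)[:max_items]


ids = [f"prompt-{i}" for i in range(100)]
small = stable_subset(ids, 10)
large = stable_subset(ids, 20)
```

With this scheme a smaller limit always yields a prefix of a larger one, and the result does not depend on input order, so cached responses for already-selected prompts stay reusable.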
Full Changelog: v0.3.1...v0.3.2
v0.3.1
What's Changed
- Fix bad version specification for `together` dependency, which was causing 0.3.0 to not actually install.
- Add Deepseek model that is now available on Together.
- Stabilize the order of TestItems in SafeTest to better utilize caching.
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's Changed
- Reorganized the `run_data` folder and made several improvements to caching. This breaks backward compatibility. Old files should just be ignored, but if you run into issues, probably best to just delete your `run_data` folder.
- Updated SafeTest to 02apr2024.
- We now have all SUTs in the requested set, minus Deepseek.
- Simplified the command line to be `newhelm` once installed or `poetry run newhelm` when using the local repo.
- Annotations are now recorded per completion instead of per TestItem.
- HuggingFace sets pad token to default, which should remove warning messages.
- Added some enforcement of SUTCapabilities to help them be accurate.
- Remove all "Base" prefixes except BaseTest.
Full Changelog: v0.2.6...v0.3.0