This repository has been archived by the owner on Oct 2, 2024. It is now read-only.

mlcommons / modelgauge Public archive

Releases: mlcommons/modelgauge

v0.6.3

13 Sep 00:19

bkorycki

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.6.3 Pre-release

What's Changed

Add the wildguard private annotator, with some refactoring. by @rogthefrog in #554
HuggingFace Inference SUT by @bkorycki in #561
Safetests use first batch of v1.0 prompts by @bkorycki in #563

Full Changelog: v0.6.2...v0.6.3

Contributors

rogthefrog and bkorycki

v0.6.2

05 Sep 20:50

wpietri

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.6.2 Pre-release

What's Changed

Officially add new annotators by @bkorycki in #550

Full Changelog: v0.6.1...v0.6.2

Contributors

bkorycki

v0.6.1

05 Sep 15:05

wpietri

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.6.1 Pre-release

What's Changed

Fix bug where bad raw annotations are cached forever
Remove safetest base class
Minor improvements for pipeline debugging
Adding 'system' role to openai_client _ROLE_MAP by @shachihk-intel
Better Together API errors
Keep track of items that can't be processed
Update dependencies and add notebook linter
Remove deprecated Together models, and update tests to match
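
The caching fix in this release can be illustrated with a minimal sketch. It is not modelgauge's implementation: the `ValidatingCache` class and the `is_json_object` check are hypothetical names invented here, and the assumption is that a "bad" raw annotation is one that fails a validity check. The idea shown is to store a response only after it validates, so a malformed response is fetched again on the next call instead of being reused.

```python
import json
from typing import Callable, Dict


class ValidatingCache:
    """Tiny in-memory cache that refuses to store responses that fail validation.

    Hypothetical illustration; not part of modelgauge.
    """

    def __init__(self, is_valid: Callable[[str], bool]) -> None:
        self._store: Dict[str, str] = {}
        self._is_valid = is_valid

    def get_or_call(self, key: str, fetch: Callable[[], str]) -> str:
        if key in self._store:
            return self._store[key]
        raw = fetch()
        # Only persist responses that pass validation, so a malformed
        # response is fetched again next time instead of being reused.
        if self._is_valid(raw):
            self._store[key] = raw
        return raw


def is_json_object(raw: str) -> bool:
    """Assumed validity check: the raw annotation must parse as a JSON object."""
    try:
        return isinstance(json.loads(raw), dict)
    except json.JSONDecodeError:
        return False


# Usage: the first response is malformed and is not cached; the second is.
responses = iter(["not json", '{"is_safe": true}'])
cache = ValidatingCache(is_json_object)
first = cache.get_or_call("item-1", lambda: next(responses))
second = cache.get_or_call("item-1", lambda: next(responses))
third = cache.get_or_call("item-1", lambda: next(responses))  # served from cache
```

The caller still receives the malformed response once and has to decide what to do with it; the sketch only guarantees that it is never written to the cache.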

New Contributors

@rogthefrog made their first contribution in #512
@shachihk-intel made their first contribution in #534

Full Changelog: v0.6.0...v0.6.1

Contributors

rogthefrog and shachihk-intel

v0.6.0

06 Aug 21:12

bkorycki

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.6.0 Pre-release

What's Changed

Together and HuggingFace SUTs can now return log probs in their responses when requested.
New CLI option --plugin-dir loads local plugins at runtime.
Increase reliability of downloading test data.
Prepare modelgauge infra files for safety evaluator testing (new "System" chat role, minor llama_guard_annotator refactor).
Documentation updates, including initial API reference.
Introduce Pipeline and related classes to serve as the base for a composable set of objects that handle common bulk processing tasks like running prompts, getting annotations, and any other slow I/O-bound workloads.
SafeTests use files from dev deployment of modellab.
New run-csv-items command quickly runs batches of prompts and/or responses in a CSV file through some SUTs and/or annotators.
Add new v1.0 SafeTest class and placeholder test safe-dfm-1.0. Version 0.5 tests (e.g. safe-cae) are not affected.
Move Together plugin files + SafeTest into core modelgauge library.
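
The general pattern behind a pipeline for slow I/O-bound work can be sketched as follows. This is a generic standard-library illustration, not the API of modelgauge's Pipeline classes: `run_pipeline`, its parameters, and the result shape are all assumptions made for the example. A source feeds items into a queue, several worker threads apply a slow function, and the results and failures are collected separately so that one bad item does not stop the run.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

_DONE = object()  # sentinel telling a worker to stop


def run_pipeline(
    items: Iterable[str],
    work: Callable[[str], str],
    num_workers: int = 4,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, Exception]]]:
    """Run `work` over `items` on several threads.

    Returns (results, failures); completion order is not guaranteed.
    Hypothetical helper, not part of modelgauge.
    """
    todo: queue.Queue = queue.Queue()
    results: List[Tuple[str, str]] = []
    failures: List[Tuple[str, Exception]] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = todo.get()
            if item is _DONE:
                break
            try:
                output = work(item)
            except Exception as exc:  # record the failure and keep going
                with lock:
                    failures.append((item, exc))
            else:
                with lock:
                    results.append((item, output))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for item in items:
        todo.put(item)
    for _ in threads:
        todo.put(_DONE)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results, failures


# Usage: str.upper stands in for a slow call to a SUT or annotator.
results, failures = run_pipeline(["a", "b", "c"], str.upper, num_workers=2)
```

Threads suit this kind of workload because the time is spent waiting on network responses rather than on computation.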

New Contributors

@tsunamit made their first contribution in #449
@HuaizhengZhang made their first contribution in #489

Full Changelog: v0.5.1...v0.6.0

Contributors

HuaizhengZhang and tsunamit

v0.5.1

26 Apr 21:10

bkorycki

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.5.1 Pre-release

What's Changed

Updated docs
SafeTest compatible with Python 3.11+
Add new Llama Guard 2 to LlamaGuardAnnotator
- Can configure LlamaGuardAnnotator with optional llama_guard_version parameter. Defaults to Llama Guard 2.
- Minor changes to prompt/category formatting for Llama Guard 1. This may affect results.
SafeTest can also be configured to use Llama Guard 1 or 2 as its annotator. Defaults to version 2.

Full Changelog: v0.5.0...v0.5.1

v0.5.0

15 Apr 22:35

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.5.0 Pre-release

What's Changed

Renamed to ModelGauge and started pushing to PyPI!
A whole bunch of cleanups and preparation for the more public release.
Caching now supports dicts.
Unit tests to ensure you can install from PyPI and run in a notebook.
Expand range of supported Python versions to 3.10 and up.
Remove benign hazard from SafeTest.
Start setting up ReadTheDocs.
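
The release note on dict caching does not say how dicts are handled. As one possible reading, the sketch below shows a common way a cache can accept dict-shaped requests: serialize the dict to canonical JSON and hash it. The `cache_key` function and the request fields are hypothetical and are not taken from modelgauge.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(request: Dict[str, Any]) -> str:
    """Derive a deterministic key from a JSON-serializable dict.

    Hypothetical helper, not part of modelgauge.
    """
    # sort_keys makes the serialization independent of insertion order,
    # including for nested dicts.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Usage: the same content in a different key order yields the same key.
key_a = cache_key({"prompt": "hi", "options": {"temperature": 0.0, "max_tokens": 10}})
key_b = cache_key({"options": {"max_tokens": 10, "temperature": 0.0}, "prompt": "hi"})
```

This only works for values that `json.dumps` can serialize; other types would need a custom encoder.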

Full Changelog: v0.3.3...v0.5.0

v0.3.3

09 Apr 23:00

bkorycki

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.3.3 Pre-release

What's Changed

Change SafeTest to data_april04 release.
- More prompts
- Removed safe-ben

Full Changelog: v0.3.2...v0.3.3

v0.3.2

09 Apr 21:50

bkorycki

v0.3.2 Pre-release

What's Changed

max_test_items returns a relatively stable set of prompts
Loading bar for plugins
Have list command report prettier values for secrets
Time out requests stuck on TogetherAI
Updated docs
Move simple_test_runner out of plugins and into core library
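
The release notes do not describe how max_test_items keeps its selection stable. One common technique that gives a "relatively stable" subset is to rank items by a hash of their identifier and take the first N. The sketch below assumes each prompt has a unique string id; the `stable_subset` function is a hypothetical name and is not modelgauge's code.

```python
import hashlib
from typing import List


def stable_subset(prompt_ids: List[str], max_items: int) -> List[str]:
    """Pick up to max_items ids, ranked by a hash of each id.

    Hypothetical helper, not part of modelgauge.
    """

    def rank(prompt_id: str) -> str:
        return hashlib.sha256(prompt_id.encode("utf-8")).hexdigest()

    # The ranking depends only on each id, not on input order, so the same
    # ids are chosen on every run and a smaller limit is a prefix of a larger one.
    return sorted(prompt_ids, key=rank)[:max_items]


# Usage: the selection does not depend on the order of the input list.
ids = [f"prompt-{i}" for i in range(20)]
small = stable_subset(ids, 5)
same = stable_subset(list(reversed(ids)), 5)
```

Because each id is ranked independently, adding new prompts to the pool can displace some selected items but leaves the relative order of the rest unchanged, which keeps most cached results reusable.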

Full Changelog: v0.3.1...v0.3.2

v0.3.1

03 Apr 17:13

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.3.1 Pre-release

What's Changed

Fix bad version specification for together dependency, which was causing 0.3.0 to not actually install.
Add Deepseek model that is now available on Together.
Stabilize the order of TestItems in SafeTest to better utilize caching.

Full Changelog: v0.3.0...v0.3.1

v0.3.0

02 Apr 22:03

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

v0.3.0 Pre-release

What's Changed

Reorganized the run_data folder and made several improvements to caching. This breaks backward compatibility. Old files should simply be ignored, but if you run into issues, it is probably best to delete your run_data folder.
Updated SafeTest to 02apr2024.
We now have all SUTs in the requested set, minus Deepseek.
Simplified the command line to be newhelm once installed or poetry run newhelm when using the local repo.
Annotations are now recorded per completion instead of per TestItem.
HuggingFace sets pad token to default, which should remove warning messages.
Added some enforcement of SUTCapabilities to help them be accurate.
Remove all "Base" prefixes except BaseTest.

Full Changelog: v0.2.6...v0.3.0
