ModelGauge

ModelGauge was originally planned as an evolution of crfm-helm, intended to meet its existing use cases as well as those needed by the MLCommons AI Safety project. However, instead of using a large set of existing tests, that project developed a smaller set of custom ones. As a result, some of this code was moved into the related project MLCommons ModelBench, and this repo was archived.

Summary

ModelGauge is a library that provides a set of interfaces for Tests and Systems Under Test (SUTs) such that:

Each Test can be applied to all SUTs with the required underlying capabilities (e.g. does it take text input?)
Adding new Tests or SUTs can be done without modifications to the core libraries or support from ModelGauge authors.
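
The two properties above can be illustrated with a minimal sketch. This is not ModelGauge's actual API: the class names, capability labels, and registry functions below are all hypothetical, chosen only to show how capability matching and registration without core changes might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical capability labels (assumptions, not ModelGauge's real names).
ACCEPTS_TEXT = "accepts_text_prompt"
PRODUCES_LOGPROBS = "produces_logprobs"


@dataclass
class SUT:
    """A System Under Test: an ID, what it can do, and how to call it."""
    uid: str
    capabilities: FrozenSet[str]
    respond: Callable[[str], str]


@dataclass
class Test:
    """A Test: an ID, the capabilities it needs from a SUT, and its prompts."""
    uid: str
    required_capabilities: FrozenSet[str]
    prompts: List[str]


# Simple in-memory registries. New Tests and SUTs are added by calling the
# register functions, so the "core" code below never needs to change.
SUTS: Dict[str, SUT] = {}
TESTS: Dict[str, Test] = {}


def register_sut(sut: SUT) -> None:
    if sut.uid in SUTS:
        raise ValueError(f"SUT already registered: {sut.uid}")
    SUTS[sut.uid] = sut


def register_test(test: Test) -> None:
    if test.uid in TESTS:
        raise ValueError(f"Test already registered: {test.uid}")
    TESTS[test.uid] = test


def compatible_suts(test: Test) -> List[SUT]:
    """Return every registered SUT that has all capabilities the Test requires."""
    return [s for s in SUTS.values() if test.required_capabilities <= s.capabilities]


# Example registrations, as a third-party package might perform them.
register_sut(SUT("echo", frozenset({ACCEPTS_TEXT}), lambda prompt: prompt))
register_sut(SUT("image_only", frozenset({"accepts_image_prompt"}), lambda prompt: ""))
register_test(Test("demo_text_test", frozenset({ACCEPTS_TEXT}), ["Hello"]))

matches = [s.uid for s in compatible_suts(TESTS["demo_text_test"])]
print(matches)  # ['echo']
```

Modeling requirements as a set and checking for a subset keeps the compatibility rule in one place: a Test never names specific SUTs, and a SUT never names specific Tests.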

Currently, ModelGauge targets LLMs and single-turn prompt-response Tests, with Tests scored by automated Annotators (e.g. LlamaGuard). However, we expect to extend the library to cover more Test, SUT, and Annotation types as we move toward full release.
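
As a rough illustration of a single-turn prompt-response Test scored by an automated Annotator, the sketch below sends each prompt to a SUT once, annotates the response, and aggregates the annotations into a score. Every name here is hypothetical, and the keyword rule is only a toy stand-in for a real Annotator such as LlamaGuard; none of it reflects ModelGauge's real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Annotation:
    """Hypothetical annotation record: whether a response was judged safe."""
    is_safe: bool


def keyword_annotator(prompt: str, response: str) -> Annotation:
    # Toy stand-in for an automated Annotator: flags any response that
    # contains the word "unsafe". A real Annotator would call a model.
    return Annotation(is_safe="unsafe" not in response.lower())


def run_single_turn_test(
    prompts: List[str],
    sut: Callable[[str], str],
    annotator: Callable[[str, str], Annotation],
) -> float:
    """Send each prompt once, annotate each response, return the safe fraction."""
    if not prompts:
        raise ValueError("A Test needs at least one prompt")
    annotations = [annotator(p, sut(p)) for p in prompts]
    return sum(a.is_safe for a in annotations) / len(annotations)


def toy_sut(prompt: str) -> str:
    # Hypothetical SUT that misbehaves on one kind of prompt.
    return "UNSAFE content" if "attack" in prompt else "I can help with that."


score = run_single_turn_test(
    ["How do I bake bread?", "Plan an attack"], toy_sut, keyword_annotator
)
print(score)  # 0.5
```

Passing the SUT and the Annotator in as plain callables means either can be swapped without touching the Test loop.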

Docs

Developer Quick Start
Tutorial for how to create a Test
Tutorial for how to create a System Under Test (SUT)
How we use plugins to connect it all together.

