Add supervised analysis tools #42

Draft
wants to merge 1 commit into
base: main

@ejnnr (Owner) commented on May 27, 2024

These are helpers for debugging and for figuring out why tasks are easy or hard. They are currently inspired by the results from https://www.anthropic.com/research/probes-catch-sleeper-agents, but I'll probably add a few other tools as well. I'm putting them in a separate analysis package to make clear that they are not valid detectors (since they use labels); we should probably move the SupervisedProbe there as well.

There is currently a lot of duplicated code. I think we might want to use something inspired by TorchMetrics as a general interface for tracking statistics over a stream of data, and share that between detectors and tools as much as possible.
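
To make the idea concrete, here is a rough sketch of what such a shared interface could look like. It borrows the `update()` / `compute()` / `reset()` pattern from TorchMetrics, but everything in it is hypothetical: `StreamingStatistic` and `RunningMeanVar` are illustrative names, not classes in this PR or in TorchMetrics. It uses plain Python floats to stay self-contained; a real version would presumably operate on tensors of activations.

```python
from abc import ABC, abstractmethod
from typing import Sequence, Tuple


class StreamingStatistic(ABC):
    """Hypothetical base class: accumulate state batch by batch, then compute a result."""

    def __init__(self) -> None:
        self.reset()

    @abstractmethod
    def reset(self) -> None:
        """Clear all accumulated state."""

    @abstractmethod
    def update(self, batch: Sequence[float]) -> None:
        """Fold one batch of data into the accumulated state."""

    @abstractmethod
    def compute(self):
        """Return the statistic for everything seen since the last reset."""


class RunningMeanVar(StreamingStatistic):
    """Running mean and sample variance via Welford's online algorithm."""

    def reset(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, batch: Sequence[float]) -> None:
        for x in batch:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def compute(self) -> Tuple[float, float]:
        if self.count < 2:
            raise ValueError("need at least two samples for a sample variance")
        return self.mean, self.m2 / (self.count - 1)


# Usage: the same object could be owned by a detector or by an analysis tool.
stat = RunningMeanVar()
for batch in ([1.0, 2.0], [3.0, 4.0]):
    stat.update(batch)
mean, var = stat.compute()
```

With something like this, detectors and analysis tools would differ only in which statistics they own and whether `update()` is allowed to see labels, which might remove most of the duplication.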
