Write safety trainer #5

Open
ameliahardy opened this issue Jan 8, 2025 · 1 comment

Comments

@ameliahardy
Collaborator

The safety trainer should train the LLM against the offensive trajectories discovered via ASTPrompter.

@Jemoka
Member

Jemoka commented Jan 8, 2025

  • which *po is being used?
  • what are the baselines?
