Releases: lucidrains/PaLM-rlhf-pytorch
0.0.23
rename to ActorCritic and cleanup
0.0.22
critic model could be completely different if need be
0.0.21
only calculate the kl div loss if its weight is set to greater than 0
0.0.20
critic now has its own LoRA parameters
0.0.18
fix a bunch of things
0.0.17
make a guess, think this is what the blogpost meant
0.0.16
everything in readme runs at least
0.0.15
everything in readme runs at least
0.0.14
get everything prepped for actual ppo code
0.0.11
take care of calculating values, rewards, entropies, kl div under var…