
pg_loss1 can also be either positive or negative, right? #293

Open
Harryjun opened this issue Dec 19, 2024 · 0 comments

Comments

@Harryjun

log_ratio = (logprobs - old_logprobs) * mask
ratio = torch.exp(log_ratio.float())

Neither `logprobs` nor `old_logprobs` is necessarily the larger one, so `log_ratio` here can be either positive or negative.

pg_loss1 = -advantages * ratio
pg_loss2 = -advantages * torch.clamp(
    ratio,
    1.0 - self.args.cliprange,
    1.0 + self.args.cliprange,
)

So `pg_loss1` here can also be either positive or negative, right? Since `ratio` is always positive, the sign of `pg_loss1` is determined entirely by the sign of `advantages`: a positive advantage gives a negative loss term, and a negative advantage gives a positive one.
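To make the sign behavior concrete, here is a minimal scalar sketch of the clipped surrogate computation the snippets above come from (the function name and the input values are illustrative, not from the repository; the real code operates on tensors with a mask):

```python
import math

def pg_losses(logprob, old_logprob, advantage, cliprange=0.2):
    # ratio = exp(new_logprob - old_logprob) is always > 0, so the
    # sign of each loss term is decided by the sign of the advantage.
    ratio = math.exp(logprob - old_logprob)
    pg_loss1 = -advantage * ratio
    # clamp(ratio, 1 - eps, 1 + eps), as torch.clamp does elementwise
    clipped = max(1.0 - cliprange, min(ratio, 1.0 + cliprange))
    pg_loss2 = -advantage * clipped
    # PPO then takes the elementwise max of the two terms, i.e. the
    # more pessimistic (larger) loss, before averaging over the mask.
    return pg_loss1, pg_loss2, max(pg_loss1, pg_loss2)

# Positive advantage -> pg_loss1 < 0; negative advantage -> pg_loss1 > 0.
print(pg_losses(-1.0, -1.2, advantage=1.0))
print(pg_losses(-1.0, -1.2, advantage=-1.0))
```

Both loss terms sharing the same sign is expected: the clipping only bounds how large the ratio's contribution can be, not the sign, and the final `max(pg_loss1, pg_loss2)` is what keeps the update conservative.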
