
Formula 2 & Observation 2 #12

Open
renmengjie7 opened this issue Dec 4, 2024 · 2 comments

Comments

@renmengjie7

[Screenshot, 2024-12-04 4:31:47 PM] Shouldn't Observation 2 in the paper describe a negative correlation instead?

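> We assume $r_{s_k}$ is the PRM's output sigmoid score at the k-th step

Is $r_{s_k}$ the reward of the k-th step? Under that definition, it seems somewhat inconsistent with the design of Formula 2.
[Screenshot, 2024-12-04 4:33:45 PM]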

@Wloner0809

I'm also confused about this formula. Actually I think $1-2r_{s_k}$ should be $2r_{s_k}-1$.
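A small numeric sketch of the sign question, assuming only what the thread states: $r_{s_k}$ is a sigmoid output in $(0, 1)$. The helper names `sigmoid` and `centered_terms` are hypothetical and do not come from the paper's code. The two candidate terms are exact negatives of each other, so choosing one over the other amounts to deciding whether a high $r_{s_k}$ should push the term up or down.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def centered_terms(score: float) -> tuple[float, float]:
    """Hypothetical helper: return (1 - 2r, 2r - 1) for a PRM score r."""
    return 1.0 - 2.0 * score, 2.0 * score - 1.0


# Compare both candidate terms for a low, neutral, and high PRM score.
for logit in (-2.0, 0.0, 2.0):
    r = sigmoid(logit)
    a, b = centered_terms(r)
    print(f"r={r:.3f}  1-2r={a:+.3f}  2r-1={b:+.3f}")
```

For a low score (logit -2) the first term is positive and the second negative; for a high score the signs flip, and at $r_{s_k}=0.5$ both are zero.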

@zhoubiansining
Contributor

Our definition of $r_{s_k}$ here is consistent with the one in the appendix: it is the PRM's output score, not the reward. The reward for the step is actually $1-r_{s_k}$.
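Taking that explanation at face value, the two readings of the formula can be reconciled with one line of algebra. This assumes (not confirmed in this thread) that the term in question is a centered step reward of the form $2R-1$; $R_{s_k}$ is a hypothetical symbol introduced here for the step reward.

```latex
R_{s_k} = 1 - r_{s_k}
\quad\Longrightarrow\quad
2R_{s_k} - 1 = 2\,(1 - r_{s_k}) - 1 = 1 - 2r_{s_k}
```

Under that assumption, $1-2r_{s_k}$ written in terms of the PRM score and $2R_{s_k}-1$ written in terms of the step reward are the same quantity, so the suggested $2r_{s_k}-1$ is what one gets by reading $r_{s_k}$ as the reward rather than the score.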
