Here our definition of $r_{s_k}$ is consistent with the one in the appendix: it represents the PRM's output score, not the reward. The reward for the step is actually $1-r_{s_k}$.
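To make the distinction concrete, here is a minimal sketch of the relationship described above. It assumes only what this thread states: $r_{s_k}$ is a sigmoid score and the step reward is $1-r_{s_k}$. The function names `prm_score` and `step_reward` and the example logits are hypothetical and are not taken from the repository.

```python
import math

def prm_score(logit: float) -> float:
    """Hypothetical helper: sigmoid of a raw PRM logit, i.e. r_{s_k}."""
    return 1.0 / (1.0 + math.exp(-logit))

def step_reward(score: float) -> float:
    """Reward for a step as stated in the reply: 1 - r_{s_k}."""
    return 1.0 - score

# Made-up logits for three steps, for illustration only.
logits = [2.0, 0.0, -1.0]
scores = [prm_score(z) for z in logits]      # r_{s_k} for each step k
rewards = [step_reward(s) for s in scores]   # 1 - r_{s_k} for each step k

for k, (s, r) in enumerate(zip(scores, rewards), start=1):
    print(f"step {k}: score={s:.3f}, reward={r:.3f}")
```

Under this reading, a higher PRM score corresponds to a lower step reward, which may be the source of the apparent mismatch with Equation 2.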
We assume $r_{s_k}$ is the PRM’s output sigmoid score at the $k$-th step
Is $r_{s_k}$ the reward for the $k$-th step? Under that definition, it seems to conflict somewhat with the design of Equation 2.