Should critic's input be prompt only? #57

ginward · 2023-11-27T14:13:17Z

In the PPO implementation, the critic model appears to take both the prompt and the generated actions as input (or, if pooled is true, the generated actions only). However, if we treat the prompt as the state S_t and the prompt plus the generated actions as S_{t+T}, shouldn't the value function be V(S_t) rather than V(S_{t+T})?

In other words, when calculating the advantage, shouldn't the value function (the baseline) be the expected reward for a given prompt, averaged over the responses the policy might generate?
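
To make the question concrete, here is a minimal sketch of the two options, written in plain Python. None of it is taken from this repository: the function names, the use of a sample mean as the prompt-level baseline, and the convention that `values[t]` is the critic's estimate for the prompt plus the first `t` generated tokens are all assumptions made for illustration.

```python
from typing import List


def prompt_baseline_advantages(rewards: List[float]) -> List[float]:
    """Prompt-only baseline (hypothetical helper).

    `rewards` holds the scalar reward of several responses sampled for the
    SAME prompt. V(prompt) is approximated by their mean, so every token of
    response i shares the advantage R_i - V(prompt).
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def per_token_advantages(
    token_rewards: List[float],
    values: List[float],
    gamma: float = 1.0,
    lam: float = 1.0,
) -> List[float]:
    """Per-token critic with GAE (hypothetical helper).

    Assumes values[t] = V(prompt + first t generated tokens), so values[0]
    is the prompt-only value. The value after the last token is taken as 0.
    """
    steps = len(token_rewards)
    advantages = [0.0] * steps
    gae = 0.0
    for t in reversed(range(steps)):
        next_value = values[t + 1] if t + 1 < steps else 0.0
        delta = token_rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

Under these assumptions, with `gamma = lam = 1` the first per-token advantage reduces to the total reward minus `values[0]`, which is the prompt-only quantity asked about above. The later entries use values of longer prefixes, which is where a critic that also sees the generated actions would come in.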
