Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bug in calculation of gradient updates for BPR #716

Open
mturner-ml opened this issue May 14, 2024 · 1 comment
Open

Potential bug in calculation of gradient updates for BPR #716

mturner-ml opened this issue May 14, 2024 · 1 comment

Comments

@mturner-ml
Copy link

mturner-ml commented May 14, 2024

I was just familiarizing myself with the code for this library after reading the BPR Paper and am concerned I found a bug. According to Line 5 in Figure 4 of the paper, the model parameters should receive gradient updates according to

$$\Theta \leftarrow \Theta + \alpha ( \frac{e^{-\hat{x_{uij}}}}{1+e^{-\hat{x_{uij}}}} \cdot \hat{x_{uij}} + \lambda_{\Theta} \cdot \Theta )$$

However, in this line, z is computed as z = 1.0 / (1.0 + exp(score)), which is the sigmoid function without taking the derivative (and possibly missing a negative as well?). I was able to compare results with my own implementation of BPR on a problem (cannot share until made public in a few months), but it seemed on my dataset my implementation performed better with the proper gradient updates. Would appreciate any feedback if there is some nuance I am missing!

@w-toma
Copy link

w-toma commented Dec 28, 2024

I believe the gradient in BPR Paper contains two errors. First, the gradient of $\ln{\frac{1}{1+e^{-x}}}$ is not $\frac{-e^{-x}}{1+e^{-x}}$; it is actually $\frac{e^{-x}}{1+e^{-x}}$. Second, BPR involves maximizing the posterior probability, the model parameters should be updated using $\theta \leftarrow \theta + \alpha \frac{\partial{BPR-OPT}}{\partial{\theta}}$. Therefore, the correct parameter update rule should be: $\theta \leftarrow \theta + \alpha(\frac{e^{-x}}{1+e^{-x}} \frac{\partial{\hat{x_{uij}}}}{\partial {\theta}} - \lambda \theta)$. This implicit library program differs from both the paper and my formula, but I don't understand why it still works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants