I was just familiarizing myself with the code for this library after reading the BPR paper, and I'm concerned I may have found a bug. Line 5 in Figure 4 of the paper specifies the gradient update the model parameters should receive.
However, in this line, z is computed as `z = 1.0 / (1.0 + exp(score))`, which looks like the sigmoid function without taking the derivative (and possibly missing a negative as well?). I compared against my own implementation of BPR on a problem I'm working on (which I can't share until it is made public in a few months), and on my dataset my implementation performed better with what I believe are the proper gradient updates. I would appreciate any feedback if there is some nuance I am missing!
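For concreteness, here is a minimal sketch of the two coefficients under discussion for a single $(u, i, j)$ sample. This is not the library's actual code; the names `user`, `pos_item`, `neg_item`, and the dot-product form of `score` are assumptions standing in for $\hat{x}_{uij}$ and the factors it is computed from:

```python
import numpy as np

def bpr_coefficients(user, pos_item, neg_item):
    """Compare the scalar multiplier z used in the SGD step for one (u, i, j) triple."""
    # score stands in for x_uij = <user, pos_item> - <user, neg_item>
    score = user @ pos_item - user @ neg_item

    # Coefficient as computed in the quoted line of the library
    z_library = 1.0 / (1.0 + np.exp(score))

    # Coefficient from the paper's gradient, e^{-x} / (1 + e^{-x})
    z_paper = np.exp(-score) / (1.0 + np.exp(-score))

    return z_library, z_paper

# Example with random factors, just to print the two coefficients side by side
rng = np.random.default_rng(0)
u, i, j = rng.normal(size=(3, 8))
print(bpr_coefficients(u, i, j))
```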
I believe the gradient in the BPR paper contains two errors. First, the gradient of $\ln{\frac{1}{1+e^{-x}}}$ is not $\frac{-e^{-x}}{1+e^{-x}}$; it is actually $\frac{e^{-x}}{1+e^{-x}}$. Second, since BPR maximizes the posterior probability, the model parameters should be updated by gradient ascent, $\theta \leftarrow \theta + \alpha \frac{\partial \text{BPR-OPT}}{\partial \theta}$. Therefore, the correct parameter update rule should be $\theta \leftarrow \theta + \alpha\left(\frac{e^{-x}}{1+e^{-x}} \frac{\partial \hat{x}_{uij}}{\partial \theta} - \lambda \theta\right)$. This library's implementation differs from both the paper and my formula, but I don't understand why it still works.
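As a worked sketch of the first point above (my own algebra, not taken from the paper), write $\sigma(x) = \frac{1}{1+e^{-x}}$; then

$$
\frac{d}{dx}\ln\sigma(x) = \frac{d}{dx}\left(-\ln\left(1+e^{-x}\right)\right) = \frac{e^{-x}}{1+e^{-x}} = \sigma(-x) = \frac{1}{1+e^{x}}.
$$

Note that multiplying the numerator and denominator of $\frac{e^{-x}}{1+e^{-x}}$ by $e^{x}$ gives $\frac{1}{1+e^{x}}$, which is the same expression as the quoted `z = 1.0 / (1.0 + exp(score))` with $x$ = `score`; that identity may be the nuance in play here.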