
General Paper Questions #33

Open
GerardMaggiolino opened this issue May 11, 2021 · 0 comments
GerardMaggiolino commented May 11, 2021

Thanks for a great paper; it definitely solves the monotonicity issue of the naive approach!

I have a few questions for the authors:

1.) Is the loss described equivalent to standard binary cross entropy? It appears so, but I noticed in the notebook you have your own derivation as opposed to using the standard PyTorch loss function. Is there a specific reason, or was that just to be explicit about the loss?
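For concreteness, here's roughly what I mean: a minimal sketch (tensor names, shapes, and values are made up, not taken from your notebook) comparing the log-sigmoid form of the loss against PyTorch's built-in `binary_cross_entropy_with_logits` on the same extended binary targets, with mean reduction on both sides so the numbers are directly comparable.

```python
import torch
import torch.nn.functional as F

# 3 ordinal classes -> 2 binary tasks; logits = linear output + biases, shape (batch, K-1).
logits = torch.randn(4, 2)
levels = torch.tensor([[1., 1.],   # label 2 -> [1, 1]
                       [1., 0.],   # label 1 -> [1, 0]
                       [0., 0.],   # label 0 -> [0, 0]
                       [1., 1.]])

# Loss written out in the log-sigmoid form (using log(1 - sigmoid(x)) == logsigmoid(x) - x):
manual = -torch.mean(
    F.logsigmoid(logits) * levels + (F.logsigmoid(logits) - logits) * (1.0 - levels)
)

# Standard PyTorch BCE-with-logits on the same extended binary targets:
builtin = F.binary_cross_entropy_with_logits(logits, levels)

print(torch.allclose(manual, builtin))  # True, up to floating point
```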

2.) While training with this loss is definitely more stable than MSE regression to fixed targets, at inference, is it true that the output can be interpreted as a one-dimensional regression with thresholds derived from the biases? For example, in a 3-class problem, the final bias weights may be something like [1.01, -1.08]. In this case, a linear layer output of

  • greater than 1.08 will be class 2
  • between -1.01 and 1.08 will be class 1
  • less than -1.01 will be class 0

Just wondering if I’m interpreting this correctly for the following question.
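In code, my understanding of the inference rule is something like the following (a rough sketch; the bias values are just the illustrative numbers above, and the predicted label is the count of thresholds the scalar output exceeds, which should match counting `sigmoid(output + bias) > 0.5`):

```python
import torch

bias = torch.tensor([1.01, -1.08])   # learned (K-1) biases from the example above
thresholds = -bias                   # cut points on the 1-d score: [-1.01, 1.08]

def predict(score: torch.Tensor) -> torch.Tensor:
    # label = number of thresholds the scalar score exceeds
    return (score.unsqueeze(-1) > thresholds).sum(dim=-1)

scores = torch.tensor([-2.0, 0.0, 2.0])
print(predict(scores))  # tensor([0, 1, 2])
```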

3.) I’m finding an issue where center classes have reduced recall, which makes sense as there’s a narrower range of regression predictions for the correct label (described above). However, the pairwise AUC-ROC is still very high, as outer classes are predicted at extremes, while inner classes are mostly around their respective ranges. Did you also notice this? Were there any solutions? I’ve tried longer training runs, focal loss to prevent the outer classes from “squeezing” the hyperplanes defined by the biases together, and smoothed labels for outer classes. I’ve also tried initializing the biases further apart, and increasing the LR for the linear layer and biases.
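For reference, the "biases further apart" and "higher LR on the linear layer and biases" experiments looked roughly like this (a simplified sketch of my own head, not the repo's implementation; `in_features`, `spread`, and the learning rates are arbitrary):

```python
import torch
from torch import nn

class OrdinalHead(nn.Module):
    """CORAL-style head: one shared weight vector plus K-1 biases (illustrative only)."""
    def __init__(self, in_features: int, num_classes: int, spread: float = 2.0):
        super().__init__()
        self.fc = nn.Linear(in_features, 1, bias=False)
        # Initialize the K-1 biases spread apart; descending so the implied
        # thresholds (-bias) are increasing.
        self.bias = nn.Parameter(torch.linspace(spread, -spread, num_classes - 1))

    def forward(self, x):
        return self.fc(x) + self.bias  # (batch, K-1) logits

head = OrdinalHead(in_features=128, num_classes=3)

# Larger learning rate on the ordinal head's parameters than elsewhere:
optimizer = torch.optim.Adam([
    {"params": head.fc.parameters(), "lr": 1e-3},
    {"params": [head.bias], "lr": 1e-2},
])
```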

Thanks again, and it would be much appreciated if you're able to address these fairly specific questions 😃
