Thanks for a great paper; it definitely solves the monotonicity issue of the naive approach!
I have a few questions for the authors:
1.) Is the loss described equivalent to standard binary cross-entropy? It looks like it is, but I noticed that in the notebook you use your own derivation rather than the standard PyTorch loss function. Is there a specific reason, or was that just to make the loss explicit?
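For concreteness, here is a minimal sketch of the equivalence I'm asking about. The extended-binary-target construction (`levels`) and the manual log-sigmoid form are my paraphrase of the notebook, not the actual code, so apologies if I've misread it:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: `logits` are the K-1 per-threshold scores (shared weight plus
# a per-threshold bias) and `levels` are the extended binary targets,
# e.g. label 2 out of 4 classes -> [1., 1., 0.].
labels = torch.randint(0, 4, (8, 1))
levels = (torch.arange(3) < labels).float()
logits = torch.randn(8, 3)

# Manual form, roughly how I read the notebook's derivation
# (using log(1 - sigmoid(x)) == logsigmoid(x) - x):
manual = -(F.logsigmoid(logits) * levels
           + (F.logsigmoid(logits) - logits) * (1.0 - levels)).sum(dim=1).mean()

# Standard PyTorch BCE-with-logits on the same tensors:
standard = F.binary_cross_entropy_with_logits(
    logits, levels, reduction="none").sum(dim=1).mean()

print(torch.allclose(manual, standard))  # I'd expect True if the two are equivalent
```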
2.) While training with this loss is definitely more stable than MSE regression to fixed targets, at inference is it true that the output can be interpreted as a 1-dimensional regression with thresholds derived from the biases? For example, in a 3-class problem the final bias weights might be something like [1.01, -1.08]. In that case a linear-layer output of
>= 1.08 would be class 2
between -1.01 and 1.08 would be class 1
<= -1.01 would be class 0
Just wondering whether I'm interpreting this correctly, since the next question builds on it.
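To make this concrete, here is a small sketch of the decoding rule I have in mind; treating the negated biases as cutpoints is my assumption about the convention, not something taken from your code:

```python
import torch

# Example biases from above; the shared linear layer produces a single score per input.
biases = torch.tensor([1.01, -1.08])
scores = torch.tensor([-1.50, 0.00, 1.50])  # made-up 1-D linear-layer outputs

# If P(y > k) = sigmoid(score + bias_k), then P(y > k) > 0.5 iff score > -bias_k,
# so the implied cutpoints are the negated biases: [-1.01, 1.08].
cutpoints = -biases
predicted = (scores.unsqueeze(1) > cutpoints).sum(dim=1)
print(predicted)  # tensor([0, 1, 2]) under this reading
```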
3.) I'm finding an issue where the center classes have reduced recall, which makes sense, since the correct label occupies a narrower range of the regression output (as described above). However, the pairwise AUC-ROC is still very high: the outer classes are predicted at the extremes, while the inner classes mostly land around their respective ranges. Did you also notice this, and did you find any solutions? I've tried longer training runs, focal loss to prevent the outer classes from "squeezing" together the hyperplanes defined by the biases, and smoothed labels for the outer classes. I've also tried initializing the biases further apart and increasing the LR for the linear layer and biases (a rough sketch of these last two is below).
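In case it helps, this is roughly what I mean by those last two attempts; the module names and numbers are just illustrative stand-ins for my model, not anything from your code:

```python
import torch
import torch.nn as nn

# Toy stand-ins; only the handling of the head and threshold biases is the point here.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 1, bias=False)                        # shared 1-D score
biases = nn.Parameter(torch.linspace(-3.0, 3.0, steps=2))  # initialize the two cutpoint biases further apart

optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-3},
    {"params": list(head.parameters()) + [biases], "lr": 1e-2},  # ~10x LR for the linear layer and biases
])
```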
Thanks again, and much appreciated if you’re able to address these fairly specific questions 😃