Hi,
thanks for your impressive paper and code, but I'm confused about the kd_ce_loss:
```python
def FNKD(self, student_outputs, teacher_outputs, student_feature, teacher_feature):
    student_L2norm = torch.norm(student_feature)
    teacher_L2norm = torch.norm(teacher_feature)
    q_fn = F.log_softmax(teacher_outputs / teacher_L2norm, dim=1)
    to_kd = F.softmax(student_outputs / student_L2norm, dim=1)
    KD_ce_loss = self.ce(
        q_fn, to_kd[:, 0].long())
    return KD_ce_loss
```
Why use self.ce after already taking the softmax/log_softmax? If self.ce is nn.CrossEntropyLoss, it expects raw logits and integer class labels, not log-probabilities.
And is to_kd[:, 0] just taking the first channel of student_outputs (the background class) as the target?
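For context, here's a minimal sketch of why this looks wrong to me (assuming 4D segmentation outputs of shape [N, C, H, W] and that self.ce is nn.CrossEntropyLoss; both are my inference from the code): softmax values lie strictly in (0, 1), so the .long() cast truncates every target to 0.

```python
import torch
import torch.nn.functional as F

student_outputs = torch.randn(2, 3, 4, 4)  # hypothetical [N, C, H, W] logits
to_kd = F.softmax(student_outputs, dim=1)

# Softmax probabilities lie in (0, 1), so casting to long truncates
# them all to 0 -- i.e. every pixel's "label" becomes class 0.
targets = to_kd[:, 0].long()
print(targets.unique())  # tensor([0])
```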
Maybe the following is right?
```python
q_fn = F.log_softmax(teacher_outputs / T, dim=1)
to_kd = F.softmax(student_outputs / T, dim=1)
KD_ce_loss = -torch.mean(torch.sum(to_kd * q_fn, dim=1))
```
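(For comparison, the standard soft-target loss from Hinton et al.'s distillation paper swaps the two roles, with the teacher probabilities weighting the student log-probabilities. A minimal sketch; the function name and T=4.0 are just placeholders, not anything from this repo:)

```python
import torch
import torch.nn.functional as F

def kd_soft_target_loss(student_outputs, teacher_outputs, T=4.0):
    # Teacher soft targets weight the *student* log-probabilities.
    log_p_student = F.log_softmax(student_outputs / T, dim=1)
    p_teacher = F.softmax(teacher_outputs / T, dim=1)
    # Cross-entropy of the student against the teacher's soft targets,
    # averaged over the batch (Hinton et al. also scale by T**2 to keep
    # gradient magnitudes comparable).
    return -torch.mean(torch.sum(p_teacher * log_p_student, dim=1))
```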
best,
fj