Confusion about the kd_ce_loss #1

Open
fengjiejiejiejie opened this issue Jun 25, 2023 · 0 comments

Hi,
thanks for your impressive paper and code, but I'm confused about the kd_ce_loss:

```python
def FNKD(self, student_outputs, teacher_outputs, student_feature, teacher_feature):
    student_L2norm = torch.norm(student_feature)
    teacher_L2norm = torch.norm(teacher_feature)
    q_fn = F.log_softmax(teacher_outputs / teacher_L2norm, dim=1)
    to_kd = F.softmax(student_outputs / student_L2norm, dim=1)
    KD_ce_loss = self.ce(q_fn, to_kd[:, 0].long())
    return KD_ce_loss
```
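
For context on what that self.ce call actually receives: to_kd holds softmax probabilities in [0, 1], so to_kd[:, 0].long() truncates almost every entry to 0, and the loss (presumably nn.CrossEntropyLoss) then treats the student's background probabilities as hard class-0 indices. A quick check, with shapes I made up for illustration:

```python
import torch
import torch.nn.functional as F

student_outputs = torch.randn(4, 3)        # hypothetical [N=4, C=3] logits
to_kd = F.softmax(student_outputs, dim=1)  # rows sum to 1, entries in (0, 1)
print(to_kd[:, 0])                         # e.g. tensor([0.21, 0.55, 0.09, 0.33])
print(to_kd[:, 0].long())                  # tensor([0, 0, 0, 0]) -- truncated toward 0
```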

Why use self.ce after the softmax/log_softmax?
Is to_kd[:, 0] just taking the first channel of student_outputs (the background)?
Shouldn't it be something like the following instead?

```python
q_fn = F.log_softmax(teacher_outputs / T, dim=1)
to_kd = F.softmax(student_outputs / T, dim=1)
KD_ce_loss = -torch.mean(torch.sum(to_kd * q_fn, dim=1))
```
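
For reference, here is a self-contained sketch of that suggestion; the temperature T = 4.0 and the [N, C] logit shapes below are placeholders of mine, not values from the repo:

```python
import torch
import torch.nn.functional as F

def kd_ce_loss(student_outputs, teacher_outputs, T=4.0):
    # Soft cross-entropy between student probabilities and teacher log-probabilities:
    # -mean_n sum_c p_student(n, c) * log p_teacher(n, c)
    q_fn = F.log_softmax(teacher_outputs / T, dim=1)  # teacher log-probs
    to_kd = F.softmax(student_outputs / T, dim=1)     # student probs
    return -torch.mean(torch.sum(to_kd * q_fn, dim=1))

student_logits = torch.randn(4, 3)
teacher_logits = torch.randn(4, 3)
print(kd_ce_loss(student_logits, teacher_logits))
```

(For segmentation outputs of shape [N, C, H, W] the same code works unchanged, since the sum runs over the class dimension dim=1.)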

best,
fj
