Feature: Basic distilling. #6527

marko1616 · 2025-01-03T17:31:04Z

What does this PR do?

Fixes # (issue)

hiyouga · 2025-01-08T05:02:18Z

src/llamafactory/train/distilling/trainer.py

+        else:
+            self.processing_class: "PreTrainedTokenizer" = kwargs.get("tokenizer")
+
+        self.teacher_model = teacher_model


Does it work in a DDP setting?

I'm working on DeepSpeed support. Default DDP works.

Anyway, I'm not going to use GKDTrainer from trl>=11.0, because it can't be used for MLLMs.

It works with DeepSpeed, but requires deepspeed==0.15.4.
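For readers following the thread, here is a rough, dependency-free sketch of what a label-masked distillation loss can look like when a trainer holds a separate teacher model. It is not the code from this PR: the function names, the `IGNORE_INDEX` value of -100, and the choice of KL(teacher || student) are all assumptions made for illustration, and a real trainer would operate on tensors rather than Python lists.

```python
import math

# Assumption: label value marking positions that are excluded from the loss.
IGNORE_INDEX = -100


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one token's logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def masked_distill_loss(student_logits, teacher_logits, labels, temperature=1.0):
    """Mean KL(teacher || student) over positions whose label is not IGNORE_INDEX.

    Hypothetical helper: each argument holds one entry per token position.
    """
    total, count = 0.0, 0
    for s_row, t_row, label in zip(student_logits, teacher_logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked position (e.g. prompt or padding) adds nothing
        log_p = log_softmax(t_row, temperature)  # teacher distribution
        log_q = log_softmax(s_row, temperature)  # student distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
        count += 1
    # Return 0.0 when every position is masked to avoid dividing by zero.
    return total / count if count else 0.0
```

In this sketch the teacher logits are treated as fixed targets, so only the student would receive gradients in a tensor-based version.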

marko1616 added 3 commits January 4, 2025 00:13

Basic distilling.

a5497b4

Linter.

cb65544

Label mask.

fca671b

hiyouga self-requested a review January 8, 2025 05:01

hiyouga reviewed Jan 8, 2025

hiyouga added the pending (This problem is yet to be addressed) label Jan 8, 2025

Remove predict.

779a0de

hiyouga self-requested a review January 12, 2025 09:31