When converting an HF Qwen2 checkpoint to Megatron, your conversion script reports the differences between the two models' layers, and there are a lot of mismatches.
The root cause of the mismatch seems to be the scaled dot product attention implementations: given the same inputs and settings, the MG and HF versions produce different outputs. Not completely different, but the gap is non-negligible.
Did you run into this as well? If so, are you aware of any degraded quality in such a converted Qwen model later on?
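For reference, here is a minimal standalone sketch of the kind of comparison I mean (my own example, not the conversion script itself). It contrasts PyTorch's fused `F.scaled_dot_product_attention` with an eager `softmax(QK^T / sqrt(d)) V` reference of the sort HF Transformers uses in eager mode; the tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def eager_sdpa(q, k, v, is_causal=True):
    # Naive attention: softmax(QK^T / sqrt(d)) V, computed step by step
    # in the input dtype, with an explicit causal mask.
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / (d ** 0.5)
    if is_causal:
        t = q.size(-2)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=q.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical Qwen2-like head layout: (batch, heads, seq, head_dim)
q, k, v = (torch.randn(1, 28, 128, 128, dtype=torch.bfloat16, device=device) for _ in range(3))

out_fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # fused kernel path
out_eager = eager_sdpa(q, k, v)                                      # eager reference path

diff = (out_fused - out_eager).abs()
print(f"max abs diff:  {diff.max().item():.3e}")
print(f"mean abs diff: {diff.mean().item():.3e}")
```

As I understand it, a nonzero difference here is expected: fused kernels typically accumulate intermediates in fp32 while the eager path rounds to bf16 at every step, so per-element deviations around the dtype's resolution look like kernel-level numerics rather than a weight-conversion bug. Whether that fully accounts for the mismatches your script reports is exactly my question.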