
Questions about Mimi: loss balancing and bfloat16/mixed precision training. #178

Open · 1 task done
SarthakYadav opened this issue Jan 1, 2025 · 3 comments
Labels: question (Further information is requested)

Comments

@SarthakYadav

Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

The paper

Question

Thanks for the great work. I'm trying to reproduce Mimi and had the following questions:

  1. Does Mimi use a loss balancer such as the one used in Encodec for training? The paper points to the default Encodec configuration in AudioCraft, which uses loss balancing, so I was wondering if that's the case for Mimi as well (a rough sketch of the kind of balancer I mean is below).
  2. Was Mimi trained in bfloat16? Or did the actual training happen in full precision and the weights were exported in bfloat16?
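
For clarity, here's roughly the kind of balancer I mean: a minimal, hand-written sketch of an Encodec/AudioCraft-style gradient balancer, where each loss is rescaled so it contributes a fixed share of the gradient norm at the decoder output regardless of its raw scale. The `balanced_backward` name and signature are just for illustration, not from AudioCraft or the Mimi codebase:

```python
# Minimal sketch of an Encodec/AudioCraft-style loss balancer (illustrative only;
# not Mimi's actual training code, which is unreleased).
import torch

def balanced_backward(losses: dict[str, torch.Tensor],
                      weights: dict[str, float],
                      output: torch.Tensor,
                      total_norm: float = 1.0) -> None:
    """losses: per-loss scalars, weights: desired relative shares,
    output: the tensor the losses were computed from (e.g. the decoder output)."""
    grads, norms = {}, {}
    for name, loss in losses.items():
        # Gradient of each individual loss w.r.t. the model output only.
        (grad,) = torch.autograd.grad(loss, [output], retain_graph=True)
        grads[name] = grad
        norms[name] = grad.norm(p=2)

    weight_sum = sum(weights[n] for n in losses)
    combined = torch.zeros_like(output)
    for name in losses:
        # Rescale so each loss contributes weights[name]/weight_sum of total_norm,
        # independently of how large its raw gradient happens to be.
        share = total_norm * weights[name] / weight_sum
        combined += grads[name] * (share / (norms[name] + 1e-12))

    # Backpropagate the balanced gradient through the rest of the model.
    output.backward(combined)
```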

Thanks!

SarthakYadav added the question label (Further information is requested) on Jan 1, 2025
@akshatvishu commented Jan 8, 2025

Hey @SarthakYadav 👋, it seems the training code for Mimi has not yet been released, but they plan to do so in the near future, as mentioned in their FAQ section. However, here's what I could gather from the current resources. In their README, they state:

Finally, and similarly to EBEN, Mimi uses only an adversarial training loss, along with feature matching, showing strong improvements in terms of subjective quality despite its low bitrate.

So I personally don't think they've used a loss balancer (see the feature-matching sketch below for what the README refers to instead).

As for Q2, I think only the official team could answer, as they haven't released the training code yet!
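
Going back to Q1 for a second: the feature matching the README mentions is usually an L1 distance between the discriminator's intermediate activations for the real and the reconstructed audio. A minimal, illustrative sketch (not taken from the Mimi/Moshi code; the function name and signature are mine):

```python
# Hedged sketch of a feature-matching loss as commonly used in adversarial
# codec training (illustrative only; not from the Mimi training code).
import torch
import torch.nn.functional as F

def feature_matching_loss(real_feats: list[torch.Tensor],
                          fake_feats: list[torch.Tensor]) -> torch.Tensor:
    """real_feats/fake_feats: intermediate discriminator activations for the
    reference and reconstructed audio, one tensor per layer."""
    loss = torch.zeros(())
    for r, f in zip(real_feats, fake_feats):
        # L1 distance between feature maps; the real features are detached so
        # only the generator receives this gradient.
        loss = loss + F.l1_loss(f, r.detach())
    return loss / max(len(real_feats), 1)
```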

@LaurentMazare (Member)

2. Was Mimi trained in bfloat16? Or did the actual training happen in full precision and the weights were exported in bfloat16?

Which weights are you referring to? Looking at the model.safetensors file on our Hugging Face repo, the weights should actually be in fp32 rather than bf16 (and this should also be the case in our other repos).
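
If it helps, one way to check locally (a small sketch; the file path is a placeholder for a downloaded checkpoint):

```python
# Quick dtype check on a downloaded safetensors checkpoint (path is a placeholder).
from collections import Counter
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    dtypes = Counter(str(f.get_tensor(k).dtype) for k in f.keys())
print(dtypes)  # e.g. Counter({'torch.float32': ...}) if the weights are fp32
```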

@SarthakYadav (Author)

@LaurentMazare Thanks, the tokenizer weights are indeed fp32; it's only the Moshi weights that are in bf16.
