Infer_step selection #18
Comments
Hello, @yxlu-0102! When I ran the 8-step inference process as you did, I noticed significant noise in the high-frequency range of the reconstructed speech. How many inference steps did you use to eliminate the noise or to achieve results similar to those in the paper? I sincerely appreciate your help and hope that your advice will lead to better results. Have a good day :)
Hi, I used the nu-wave2 checkpoint and inference code provided by the author of UDM+ in their repository.
Hello, @yxlu-0102! Thank you so much for your quick and valuable response. I believe that by experimenting with the checkpoint you pointed to, I might achieve better results. 😊 Did you still set the inference steps to 8 with the new checkpoint? In my case, with 8 inference steps and the new checkpoint, there is still noise. When I increase the inference steps to over 50, the sound quality improves, but the LSD value exceeds 2, which suggests a tendency towards excessive denoising. 🥲 I'm curious to know the number of inference steps you used with the new checkpoint! Thank you once again for your response. Have a great day. :)
Hi @yxlu-0102, thank you so much for your kind and quick response. Your answer has been very helpful for my research. Have a great day. 👍👍
I have been using your open-source code to perform 16k to 48k speech reconstruction. I utilized the default 8-step inference process and tested it on the untrimmed test set using your provided checkpoint.
However, I've encountered some issues with the reconstructed speech quality. Specifically, there appears to be a significant amount of noise in the high-frequency components of the reconstructed speech. The SNR I obtained is 19.472, and the LSD is 1.212. In contrast, the results in the research paper show SNR as 24.0 and LSD as 0.92.
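For anyone comparing numbers, part of such a gap can come from how the metrics are computed rather than from the model itself. Below is a minimal sketch of SNR and LSD in NumPy. The frame size, hop length, Hann window, and log floor are assumptions, not necessarily what the paper or this repository uses, and the function names are made up for illustration.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and a reconstruction."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    n = min(len(reference), len(estimate))  # align lengths by truncation
    reference, estimate = reference[:n], estimate[:n]
    noise = reference - estimate
    eps = 1e-12  # avoids division by zero for identical signals
    return float(10.0 * np.log10((np.sum(reference ** 2) + eps)
                                 / (np.sum(noise ** 2) + eps)))


def _log_power_spectrogram(x, n_fft, hop):
    """Log10 power spectrogram with a Hann window (assumed STFT settings)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log10(power + 1e-10)  # small floor is an assumption


def lsd(reference, estimate, n_fft=2048, hop=512):
    """Log-spectral distance: RMS over frequency, then mean over frames.

    Both signals must be at least n_fft samples long.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    n = min(len(reference), len(estimate))
    diff = (_log_power_spectrogram(reference[:n], n_fft, hop)
            - _log_power_spectrogram(estimate[:n], n_fft, hop))
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))
```

Different STFT settings or a different log base will shift the LSD value, so it is worth checking that the evaluation script matches the one used for the published numbers before comparing.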
I suspect that the issue might be related to an insufficient number of inference steps. Therefore, I would like to understand how to better configure infer_steps and infer_schedule to improve the quality of the reconstructed speech. Could you please provide guidance on how to adjust these parameters to get closer to the results reported in the paper?
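One way to approach this is to sweep the step count and record the metrics for each setting. The sketch below is only a generic harness: `reconstruct` is a hypothetical callable that wraps whatever inference entry point you use (it is not this repository's API), and the step counts are just example values taken from this thread.

```python
from typing import Callable, Dict, Iterable

import numpy as np


def sweep_infer_steps(
    reconstruct: Callable[[np.ndarray, int], np.ndarray],  # hypothetical wrapper
    low_res: np.ndarray,
    reference: np.ndarray,
    metrics: Dict[str, Callable[[np.ndarray, np.ndarray], float]],
    step_counts: Iterable[int] = (8, 16, 32, 50),  # example values only
) -> Dict[int, Dict[str, float]]:
    """Run reconstruction once per step count and score each result."""
    results = {}
    for steps in step_counts:
        estimate = reconstruct(low_res, steps)
        results[steps] = {
            name: float(fn(reference, estimate)) for name, fn in metrics.items()
        }
    return results
```

Passing the SNR and LSD functions in as `metrics` gives a small table of step count versus quality, which makes it easier to see where noise stops improving and over-smoothing starts.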