Infer_step selection #18

yxlu-0102 · 2023-10-20T02:14:48Z

I have been using your open-source code for 16 kHz to 48 kHz speech reconstruction. I used the default 8-step inference process and tested it on the untrimmed test set with your provided checkpoint.

However, I've encountered some issues with the reconstructed speech quality. Specifically, there appears to be a significant amount of noise in the high-frequency components of the reconstructed speech. The SNR I obtained is 19.472, and the LSD is 1.212. In contrast, the results in the research paper show SNR as 24.0 and LSD as 0.92.

I suspect the issue is an insufficient number of inference steps. How should I configure infer_steps and infer_schedule to improve the quality of the reconstructed speech? Could you please provide guidance on adjusting these parameters to get closer to the results reported in the paper?
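
For anyone comparing their numbers against the paper, here is a minimal sketch of how SNR and LSD are commonly defined for audio super-resolution. This is not taken from the repository's evaluation code: the function names, the Hann window, and the STFT settings (`n_fft=2048`, `hop=512`) are assumptions for illustration. Different STFT settings or trimming of the test audio can shift both metrics, so it is worth checking that your evaluation matches the one used for the reported results.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    # Small constant avoids division by zero for identical signals.
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

def power_spectrogram(signal, n_fft=2048, hop=512):
    """Power spectrogram of shape (frames, n_fft // 2 + 1) using a Hann window.

    n_fft and hop are assumed values, not the repository's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS over frequency of the log-power difference,
    averaged over frames."""
    ref_log = np.log10(power_spectrogram(reference, n_fft, hop) + eps)
    est_log = np.log10(power_spectrogram(estimate, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((ref_log - est_log) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Both functions expect two time-aligned, equal-length 1-D arrays at the same sampling rate, for example the 48 kHz ground truth and the reconstructed output.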

jjunak-yun · 2024-07-10T05:30:04Z

Hello, @yxlu-0102 !

When I executed the 8-step inference process like you did, I noticed significant noise in the high-frequency range of the reconstructed speech. How many steps of the inference process did you perform to eliminate the noise or to achieve results similar to those in the paper?

I sincerely appreciate your help and hope that your advice will lead to better results.

Have a good day :)

yxlu-0102 · 2024-07-10T08:08:58Z

Hi,

I used the nu-wave2 checkpoint and inference code provided by the author of UDM+ in their repository.
The performance was much better.

jjunak-yun · 2024-07-11T01:13:34Z

Hello, @yxlu-0102 !

Thank you so much for your quick and valuable response. I believe that by experimenting with the checkpoint you provided, I might achieve better results. 😊

Did you still set the inference steps to 8 with the new checkpoint?

In my case, when I set the inference steps to 8 with the new checkpoint, there is noise. When I increase the inference steps to over 50, the sound quality improves, but the LSD value exceeds 2, indicating a tendency towards excessive denoising. 🥲

I'm curious to know the number of inference steps you used with the new checkpoint!

Thank you once again for your response. Have a great day. :)

yxlu-0102 · 2024-07-11T02:27:47Z

Hi @naknak-Yun,

I set the number of inference steps to 50, and below are the results I reproduced:

jjunak-yun · 2024-07-11T07:22:23Z

Hi @yxlu-0102 ,

Thank you so much for your kind and quick response. Your answer has been very helpful for my research. Have a great day. 👍👍
