questions on Signal-level Evaluation and DAC #45

Closed
proudpie opened this issue Nov 14, 2024 · 2 comments

Comments

@proudpie

Thank you for introducing the unified metrics proposed by Codec-Superb, which is of great value to researchers. I have two questions and would greatly appreciate your assistance with them.

(1) In your paper, there are five objective metrics: STFT, MEL, STOI, PESQ and F0 Corr. However, the script src/codec_metrics/run.sh can only test four metrics: SDR, MEL, STOI and PESQ. May I ask how to obtain the STFT and F0 Corr metrics?

(2) In your experiments, you select three configurations of the DAC: D1(16k, 6kbps), D2(24k, 24kbps), and D3(44k, 8kbps). However, upon reviewing the model configurations released by DAC, I only found four options: (44k, 8kbps), (16k, 8kbps), (24k, 8kbps), and (44k, 16kbps). This has left me somewhat perplexed. Could you please clarify this discrepancy?

@ggiggit

ggiggit commented Nov 15, 2024

You can find the STFT and F0 Corr metrics in the main branch of the repository at metrics.py.

The DAC model can encode at multiple bitrates because it is trained with quantizer dropout. You set the target bitrate by choosing the number of quantizers at inference time, for example compressed_audio = self.model.compress(audio_signal, win_duration=5, n_quantizers=1). Bitrate scales linearly with the number of quantizers, so for a model with 12 quantizers at 6kbps, using 1 quantizer corresponds to 0.5kbps.

The (16k, 8kbps) entry in the DAC release is a typo. The maximum bitrate for the 16k model is 6kbps, as mentioned in this issue, and my manual calculations confirm this. For the (24kHz, 24kbps) model, the DAC author mentioned in this issue that using all 32 quantizers results in a bitrate of 24kbps, which my manual calculations also confirm. For the (44kHz, 8kbps) model, my manual calculations give an actual bitrate of 7.740kbps.

Therefore, for D1, D2, and D3, you can use all quantizers. Note that the 44kHz configuration (D3) uses the (44k, 8kbps) version from the release.
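For reference, the manual bitrate calculation can be sketched roughly as follows. This is only an illustration: the hop lengths (320, 320, 512), the 44.1kHz sample rate, the 9 quantizers for the 44k model, and the codebook size of 1024 are assumptions taken from my reading of the released DAC configs, not values stated in this thread, and the helper names bitrate_kbps and quantizers_for_bitrate are hypothetical. With these assumed values the 44k model comes out near 7.75kbps rather than exactly 7.740kbps; the small gap may come from rounding the frame rate to 86 frames per second, but that is a guess.

```python
import math


def bitrate_kbps(sample_rate, hop_length, n_quantizers, codebook_size=1024):
    """Bitrate of a residual-VQ codec in kbps.

    Each quantizer emits one code of log2(codebook_size) bits per frame,
    and there are sample_rate / hop_length frames per second.
    """
    frame_rate = sample_rate / hop_length
    bits_per_frame = n_quantizers * math.log2(codebook_size)
    return frame_rate * bits_per_frame / 1000.0


def quantizers_for_bitrate(target_kbps, sample_rate, hop_length,
                           max_quantizers, codebook_size=1024):
    """Largest number of quantizers whose bitrate does not exceed the target."""
    per_quantizer = bitrate_kbps(sample_rate, hop_length, 1, codebook_size)
    n = int(target_kbps / per_quantizer + 1e-9)  # floor, tolerant of float noise
    return max(1, min(max_quantizers, n))


# ASSUMED configurations: (sample_rate, hop_length, total quantizers).
CONFIGS = {
    "D1 (16k)": (16000, 320, 12),
    "D2 (24k)": (24000, 320, 32),
    "D3 (44k)": (44100, 512, 9),
}

for name, (sr, hop, n_q) in CONFIGS.items():
    print(f"{name}: {bitrate_kbps(sr, hop, n_q):.3f} kbps with all {n_q} quantizers")
```

To pick n_quantizers for a lower target, something like quantizers_for_bitrate(3.0, 16000, 320, 12) returns the count to pass to compress; under the assumptions above each quantizer of the 16k model contributes 0.5kbps.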

I look forward to further discussions with you.

@proudpie
Author

Thank you so much for your help! Now I fully understand the two questions.
