Thank you for introducing the unified metrics proposed by Codec-Superb, which is of great value to researchers. I have two questions and would greatly appreciate your assistance with them.
(1) In your paper, there are five objective metrics: STFT, MEL, STOI, PESQ and F0 Corr. However, the script src/codec_metrics/run.sh can only test four metrics: SDR, MEL, STOI and PESQ. May I ask how to obtain the STFT and F0 Corr metrics?
(2) In your experiments, you select three configurations of the DAC: D1(16k, 6kbps), D2(24k, 24kbps), and D3(44k, 8kbps). However, upon reviewing the model configurations released by DAC, I only found four options: (44k, 8kbps), (16k, 8kbps), (24k, 8kbps), and (44k, 16kbps). This has left me somewhat perplexed. Could you please clarify this discrepancy?
You can find the STFT and F0 Corr metrics in the main branch of the repository at metrics.py.
The DAC model can encode at multiple bitrates because it is trained with quantizer dropout. You set the target bitrate by specifying the number of quantizers at inference time, for example compressed_audio = self.model.compress(audio_signal, win_duration=5, n_quantizers=1). For a model with 12 quantizers at 6kbps, using 1 quantizer corresponds to 0.5kbps.

The (16k, 8kbps) label in the DAC release is a typo. The maximum bitrate for 16k is 6kbps, as mentioned in this issue, and my manual calculations confirm this. For the (24kHz, 24kbps) model, the DAC author mentioned in this issue that using all 32 quantizers results in a bitrate of 24kbps, which my manual calculations also confirm. For the (44kHz, 8kbps) model, my manual calculations give an actual bitrate of 7.740kbps.

Therefore, for D1, D2, and D3, you can use all quantizers. Note that the 44kHz configuration uses the (44k, 8kbps) version from the release.
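The manual calculation mentioned above can be sketched as follows. This is only an illustration, not code from the Codec-Superb or DAC repositories: the helper name bitrate_kbps is hypothetical, and the hop lengths (320, 320, 512), the codebook size of 1024, and the 9 quantizers for the 44.1kHz model are assumptions that are not stated in this thread. Only the quantizer counts 12 and 32 come from the reply. With these assumed values the 16k and 24k results match the figures above, and the 44.1kHz result lands close to the quoted 7.740kbps.

```python
import math


def bitrate_kbps(sample_rate, hop_length, n_quantizers, codebook_size=1024):
    """Bitrate of a residual-VQ codec in kbps (hypothetical helper).

    Each quantizer emits one code of log2(codebook_size) bits per frame,
    and the encoder produces sample_rate / hop_length frames per second.
    """
    frames_per_second = sample_rate / hop_length
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * n_quantizers * bits_per_code / 1000


# Assumed configurations: (sample_rate, hop_length, total quantizers).
# Hop lengths and the 44.1 kHz quantizer count are assumptions.
ASSUMED_CONFIGS = {
    "D1 (16k)": (16000, 320, 12),
    "D2 (24k)": (24000, 320, 32),
    "D3 (44k)": (44100, 512, 9),
}

for name, (sr, hop, n_q) in ASSUMED_CONFIGS.items():
    print(f"{name}: {bitrate_kbps(sr, hop, n_q):.3f} kbps with all {n_q} quantizers")

# Quantizer dropout: using 1 of the 12 quantizers on the 16k model.
print(bitrate_kbps(16000, 320, 1))  # 0.5
```

Under the same assumptions, lowering n_quantizers scales the bitrate linearly, which is why a single quantizer on the 16k model gives one twelfth of 6kbps.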