Mixing in feature space #749

desh2608 · 2022-06-15T23:19:00Z

desh2608
Jun 15, 2022
Collaborator

When mixing audio at the feature level, each feature type defines its own mix() function. For the TorchaudioFbank feature, it is defined here and reads:

    def mix(
        features_a: np.ndarray, features_b: np.ndarray, energy_scaling_factor_b: float
    ) -> np.ndarray:
        return np.log(
            np.maximum(
                # protection against log(0); max with EPSILON is adequate since these are energies (always >= 0)
                EPSILON,
                np.exp(features_a) + energy_scaling_factor_b * np.exp(features_b),
            )
        )

Is there some analysis about how accurate this mixing is, compared with mixing the raw audio and then performing extraction? STFT is linear and the log-exp addition takes care of the log part, but wouldn't the Mel-scale filters make some difference? Does anyone know of any papers about this?
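A rough way to check this numerically is sketched below. It is a minimal NumPy sketch, not Lhotse code: `log_power_spectrum`, the frame sizes, the `EPSILON` value, and the white-noise test signals are all assumptions made for illustration, and the Mel filterbank is left out on purpose. Expanding the squared magnitude of a sum gives |A + gB|^2 = |A|^2 + g^2 |B|^2 + 2g Re(A B*), so the power spectrum of a mixture contains a cross term that feature-domain addition drops. The sketch measures how large that gap is for these particular signals.

```python
import numpy as np

EPSILON = 1e-10  # assumed value for illustration; the library defines its own constant


def log_power_spectrum(signal, n_fft=512, hop=256):
    """Log power spectrogram from Hann-windowed frames (hypothetical helper, no Mel filters)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(np.maximum(EPSILON, power))


def mix(features_a, features_b, energy_scaling_factor_b):
    """Same log-add-exp rule as the mix() function quoted above."""
    return np.log(
        np.maximum(
            EPSILON,
            np.exp(features_a) + energy_scaling_factor_b * np.exp(features_b),
        )
    )


rng = np.random.default_rng(0)
a = rng.standard_normal(16000)  # stand-in for one second of audio at 16 kHz
b = rng.standard_normal(16000)
gain = 0.5  # amplitude gain applied to b in the audio domain

# Path 1: mix the waveforms, then extract features.
audio_mixed = log_power_spectrum(a + gain * b)

# Path 2: extract features, then mix them. Energy scales with the squared amplitude gain.
feature_mixed = mix(log_power_spectrum(a), log_power_spectrum(b), gain**2)

# Per-bin gap in the log domain, caused by the dropped cross term.
mean_abs_diff = float(np.mean(np.abs(audio_mixed - feature_mixed)))

# Ratio of average linear energies; the cross term tends to cancel on average here.
energy_ratio = float(np.exp(audio_mixed).mean() / np.exp(feature_mixed).mean())

print(f"mean |log-energy difference| per bin: {mean_abs_diff:.3f}")
print(f"ratio of mean linear energies: {energy_ratio:.3f}")
```

A Mel filterbank applied to the power spectrum is a weighted sum over frequency bins, so under that assumption the same cross term would carry over into the filterbank energies rather than being introduced by the Mel filters themselves. Real speech and noise are not white and may be correlated, so the gap on actual recordings could differ from what this toy setup shows.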

pzelasko · 2022-06-15T23:25:07Z

pzelasko
Jun 15, 2022
Maintainer

I don’t recall seeing any analysis of this. I remember checking it in an ASR experiment, and the difference was very small, about 1% relative WER. Saurabh tried it on speech enhancement, and the difference was more pronounced when evaluated by EER on a speaker ID task, maybe 5% relative. A more detailed analysis would be welcome.

3 replies

pzelasko Jun 15, 2022
Maintainer

In ASR I don’t remember which one was the “winner”, but in enhancement + speaker ID, audio-domain mixing was better.

pzelasko Jun 15, 2022
Maintainer

See #19 (comment) for a spectrogram analysis from the early days of Lhotse.

desh2608 Jun 15, 2022
Collaborator Author

Thanks for the pointer! The discussions on that thread are very enlightening.
