collate_multi_channel_audio #552

m-wiesner · 2022-01-26T18:55:32Z

I think there is a problem in collate_multi_channel_audio

def collate_multi_channel_audio(cuts: CutSet) -> torch.Tensor:
    """
    Load audio samples for all the cuts and return them as a batch in a torch tensor.
    The cuts have to be of type ``MixedCut`` and their tracks will be interpreted as individual channels.
    The output shape is ``(batch, channel, time)``.
    The cuts will be padded with silence if necessary.
    """
    assert all(cut.has_recording for cut in cuts)
    assert all(isinstance(cut, MixedCut) for cut in cuts)
    cuts = maybe_pad(cuts)
    first_cut = next(iter(cuts))
    audio = torch.empty(len(cuts), len(first_cut.tracks), first_cut.num_samples)
    for idx, cut in enumerate(cuts):
        audio[idx] = torch.from_numpy(cut.load_audio())
    return audio

The output tensor is initialized here:

audio = torch.empty(len(cuts), len(first_cut.tracks), first_cut.num_samples)

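and then, inside the subsequent for loop, cut.load_audio() uses the flag mix=True by default, so it returns an array of size (1 x cut.num_samples) instead of an array of size (len(first_cut.tracks) x cut.num_samples). This means the multichannel track is mixed down by default, and the values in audio[:, 1:, :] are never set from the individual tracks. As far as I can tell, the assignment broadcasts the (1 x cut.num_samples) mix across the channel dimension, so every channel ends up holding the same mono mix-down instead of its own track.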
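The shape mismatch can be reproduced without Lhotse. The following is a minimal sketch under stated assumptions: NumPy stands in for torch, `tracks` is a made-up array playing the role of one cut's per-track audio, and the mix-down is approximated by a plain sum over tracks (the real mixing logic may differ). It shows that a mixed-down load has shape (1, samples) and that, with NumPy index assignment (and, as far as I know, PyTorch's as well), such an array is broadcast across the channel axis of the pre-allocated buffer.

```python
import numpy as np

num_tracks, num_samples = 3, 4

# Hypothetical stand-in for one cut's per-track audio, shape (tracks, samples).
tracks = np.arange(num_tracks * num_samples, dtype=np.float32).reshape(
    num_tracks, num_samples
)

# Approximation of a mixed-down load: a single channel, shape (1, samples).
mixed = tracks.sum(axis=0, keepdims=True)

# Pre-allocated batch buffer laid out as (batch, channel, time).
audio = np.empty((1, num_tracks, num_samples), dtype=np.float32)

# The (1, samples) array is broadcast across the channel axis on assignment.
audio[0] = mixed

print(mixed.shape)     # shape of the mixed-down signal
print(audio[0].shape)  # shape of the buffer slot it was written into

# True if every channel in the slot holds the same mixed-down signal.
channels_identical = bool((audio[0] == mixed).all())
print(channels_identical)
```

Either way, the per-track audio never reaches the output buffer in this sketch, which matches the core of the problem described above.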

The mix flag should probably be exposed as an option of collate_multi_channel_audio. Otherwise, the function should be changed to return a tensor of size (len(cuts), first_cut.num_samples), with the mix-down happening automatically, and the docstring should be updated to reflect this.
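The two options above could look roughly like the sketch below. It is not Lhotse code: `collate_tracks` is a hypothetical helper, NumPy arrays of shape (tracks, samples) stand in for loaded cuts, padding is plain zero-padding on the right, and mixing is simplified to a sum over tracks.

```python
from typing import List

import numpy as np


def collate_tracks(batch: List[np.ndarray], mix: bool = False) -> np.ndarray:
    """Hypothetical helper; each item in ``batch`` has shape (tracks, samples).

    mix=False returns (batch, tracks, max_samples).
    mix=True returns (batch, max_samples), summing the tracks of each item.
    Shorter items are right-padded with zeros (silence).
    """
    num_tracks = batch[0].shape[0]
    assert all(item.shape[0] == num_tracks for item in batch), "track counts differ"
    max_samples = max(item.shape[1] for item in batch)

    # Zero-initialized buffer, so padded regions are silence rather than garbage.
    out = np.zeros((len(batch), num_tracks, max_samples), dtype=np.float32)
    for idx, item in enumerate(batch):
        out[idx, :, : item.shape[1]] = item

    # Simplified mix-down: sum over the track axis.
    return out.sum(axis=1) if mix else out


batch = [np.ones((2, 3), dtype=np.float32), np.ones((2, 5), dtype=np.float32)]
multi = collate_tracks(batch)            # keeps the channel axis
mono = collate_tracks(batch, mix=True)   # channel axis mixed away
print(multi.shape, mono.shape)
```

Making the flag explicit keeps the output shape and the docstring in agreement in both modes.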


pzelasko · 2022-01-26T20:23:16Z

Good point. This function is actually not very well supported in Lhotse right now -- please refer to the discussion in #532. If you're open to doing some work to extend the multi-channel support in Lhotse, I'd love to help with that.

m-wiesner · 2022-01-27T22:40:00Z

There is another small related problem I have noticed ...

The function mix_cuts() in cut.py is supposed to return a cut of type MixedCut. The docstring says """Return a MixedCut that consists of the input Cuts mixed with each other as-is."""

In some cases, there are CutSets intended to represent multichannel audio in which a small number of recordings, for whatever reason, have only a single channel. In these cases, the function passed to functools.reduce is never applied to the first (and only) element. Currently that function is the mix() method, which, among other things, casts cuts to MixedCuts.
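The functools.reduce behavior described here is easy to check in isolation. In this small sketch, `mix` is a stand-in that only records its calls and builds a string; it is not the Lhotse function.

```python
from functools import reduce

calls = []


def mix(left, right):
    """Stand-in for the real mix(): records each call and wraps the inputs."""
    calls.append((left, right))
    return f"Mixed({left}+{right})"


# With one element, reduce returns it untouched and never calls mix().
single = reduce(mix, ["mono_a"])
calls_after_single = len(calls)

# With two elements, mix() is called once and the result is "mixed".
pair = reduce(mix, ["mono_a", "mono_b"])
calls_after_pair = len(calls)

print(single, calls_after_single)
print(pair, calls_after_pair)
```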

As a result, a single-channel MonoCut will not be cast to a MixedCut. This problem also affects mix_same_recording_channels(), which will return a CutSet containing some MonoCuts as well as MixedCuts, when I think the intention was for it to return only MixedCuts. A similar problem is present in the MixedCut truncate method, which has a special case for when there is a single channel and returns a MonoCut. That effectively casts the MixedCut to a MonoCut, which I also don't think is the desired behavior, but perhaps this was intended ... I assume similar problems may also affect other methods, but these are the only ones I have found so far.

I have fixed this by adding a static method to MixedCut

@staticmethod
def from_mono(cut: MonoCut) -> "MixedCut":
    # Wrap a single MonoCut in a one-track MixedCut.
    return MixedCut(id=cut.id, tracks=[MixTrack(cut=cut)])

and then changing mix_cuts from

return reduce(mix, cuts)
to

return MixedCut.from_mono(next(iter(cuts))) if len(cuts) == 1 else reduce(mix, cuts)
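To make the effect of this change concrete, here is a self-contained sketch. The `MonoCut`, `MixTrack`, and `MixedCut` dataclasses and the `mix` function below are simplified stand-ins, not the Lhotse classes (the real ones carry many more fields, and the real mix() does more than concatenate tracks). Only `from_mono` and the single-element branch follow the proposal above.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Union


@dataclass
class MonoCut:  # simplified stand-in
    id: str


@dataclass
class MixTrack:  # simplified stand-in
    cut: MonoCut


@dataclass
class MixedCut:  # simplified stand-in
    id: str
    tracks: List[MixTrack]

    @staticmethod
    def from_mono(cut: MonoCut) -> "MixedCut":
        # Wrap a single MonoCut in a one-track MixedCut.
        return MixedCut(id=cut.id, tracks=[MixTrack(cut=cut)])


def mix(left: Union[MonoCut, MixedCut], right: Union[MonoCut, MixedCut]) -> MixedCut:
    """Simplified stand-in for mix(): concatenates the tracks of both inputs."""

    def as_tracks(cut: Union[MonoCut, MixedCut]) -> List[MixTrack]:
        return cut.tracks if isinstance(cut, MixedCut) else [MixTrack(cut=cut)]

    return MixedCut(id=f"{left.id}+{right.id}", tracks=as_tracks(left) + as_tracks(right))


def mix_cuts(cuts: List[MonoCut]) -> MixedCut:
    # The single-element case is handled explicitly, so the result is always a MixedCut.
    if len(cuts) == 1:
        return MixedCut.from_mono(cuts[0])
    return reduce(mix, cuts)


one = mix_cuts([MonoCut("a")])
three = mix_cuts([MonoCut("a"), MonoCut("b"), MonoCut("c")])
print(type(one).__name__, len(one.tracks))
print(type(three).__name__, len(three.tracks))
```

With the explicit branch, callers can rely on the return type regardless of how many cuts were passed in.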

I also added this into the MixedCut truncate method


if len(new_tracks) == 1:
    # The truncation resulted in just a single cut; wrap it so that the
    # return type stays MixedCut.
    return MixedCut.from_mono(new_tracks[0].cut)

I can submit a pull request if this seems fine, but I think this is related more generally to the issue of how to support MultiChannel audio.

desh2608 · 2022-11-23T19:58:18Z

I think we can close this since MultiCut is now supported as its own class. Feel free to re-open if you think this is still an issue.

desh2608 closed this as completed Nov 23, 2022
