collate_multi_channel_audio #552
Comments
Good point. This function is actually not very well supported in Lhotse right now -- please refer to the discussion in #532. If you're open to doing some work to extend the multi-channel support in Lhotse, I'd love to help with that.
There is another small, related problem I have noticed. The function mix_cuts() in cut.py is supposed to return a MixedCut; its docstring says """Return a MixedCut that consists of the input Cuts mixed with each other as-is.""" However, some CutSets that are intended to represent multi-channel audio contain a small number of recordings that, for whatever reason, have only a single channel. In those cases functools.reduce never calls its function, because there is only one element and reduce returns it unchanged. The function in question is the mix() method, which, among other things, converts cuts to MixedCut. As a result, a single-channel MonoCut is never converted to a MixedCut.
This problem also affects mix_same_recording_channels(), which returns a CutSet containing some MonoCuts alongside the MixedCuts, when I think the intention was for it to return only MixedCuts. A similar problem is present in the MixedCut truncate method, which has a special case for a single channel and returns a MonoCut. That effectively casts the MixedCut to a MonoCut, which I don't think is the desired behavior, though perhaps it was intended. I assume other methods may be affected as well, but these are the only ones I have found so far.
I have fixed this by adding a static function to MixedCut
and then changing mix_cuts from
I also added this into the MixedCut truncate method
I can submit a pull request if this seems fine, but I think this is part of the more general question of how to support multi-channel audio.
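To illustrate the single-element behavior of functools.reduce described in this comment, here is a minimal sketch with toy stand-in classes. Mono, Mixed, Mixed.from_mono, mix_all and mix_all_wrapped are hypothetical names chosen for illustration; they are not Lhotse's API and not the exact patch proposed here. Only functools.reduce is a real library call.

```python
from functools import reduce


class Mono:
    """Toy stand-in for a single-channel cut (hypothetical, not Lhotse's MonoCut)."""

    def __init__(self, name):
        self.name = name

    def mix(self, other):
        # Mixing two cuts always yields a Mixed object.
        return Mixed([self, other])


class Mixed:
    """Toy stand-in for a multi-track cut (hypothetical, not Lhotse's MixedCut)."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def mix(self, other):
        other_tracks = other.tracks if isinstance(other, Mixed) else [other]
        return Mixed(self.tracks + other_tracks)

    @staticmethod
    def from_mono(cut):
        # Hypothetical helper: wrap a single cut as a one-track Mixed.
        return Mixed([cut])


def mix_all(cuts):
    # With a single element, reduce returns it unchanged and never calls mix().
    return reduce(lambda a, b: a.mix(b), cuts)


def mix_all_wrapped(cuts):
    # Wrapping the first cut up front guarantees a Mixed result for any length >= 1.
    first, *rest = cuts
    return reduce(lambda a, b: a.mix(b), rest, Mixed.from_mono(first))
```

With a one-element list, mix_all returns the Mono object itself, while mix_all_wrapped returns a Mixed with a single track. Whether a one-track mixed cut is the desired representation is a design question for the maintainers.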
I think we can close this since
I think there is a problem in collate_multi_channel_audio.
The output tensor is initialized here:
audio = torch.empty(len(cuts), len(first_cut.tracks), first_cut.num_samples)
and then, inside the subsequent for loop,
cut.load_audio() uses the flag mix=True by default, so it returns a tensor of size (1 x cut.num_samples) instead of a tensor of size (1 x len(first_cut.tracks) x cut.num_samples). This means the multi-channel track is mixed down by default, and the values in audio[:, 1:, :] are never set and can be arbitrary.
The mix flag should probably be exposed as an option of collate_multi_channel_audio. Otherwise, the function should be changed to return a tensor of size (len(cuts), first_cut.num_samples), with the mixdown happening automatically, and the docstring should be updated to reflect this.
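A rough sketch of the two options suggested above, using NumPy arrays as a stand-in for torch tensors. The function name collate_sketch, its mix parameter, and the plain-sum mixdown are assumptions made for illustration, not Lhotse's implementation. It also assumes every cut has already been padded to the same (channels, num_samples) shape.

```python
import numpy as np


def collate_sketch(tracks_per_cut, mix=False):
    """Stack per-cut audio into one batch array (hypothetical helper).

    tracks_per_cut: list of arrays, each shaped (num_channels, num_samples).
    mix=False -> output shaped (num_cuts, num_channels, num_samples).
    mix=True  -> output shaped (num_cuts, num_samples), channels summed.
    """
    num_cuts = len(tracks_per_cut)
    num_channels, num_samples = tracks_per_cut[0].shape

    if mix:
        # Mixdown path: the output has no channel axis at all.
        out = np.zeros((num_cuts, num_samples), dtype=np.float32)
        for idx, tracks in enumerate(tracks_per_cut):
            out[idx] = tracks.sum(axis=0)  # plain sum is an assumption
        return out

    # Multi-channel path: every channel slot is written explicitly.
    out = np.zeros((num_cuts, num_channels, num_samples), dtype=np.float32)
    for idx, tracks in enumerate(tracks_per_cut):
        out[idx] = tracks
    return out
```

Allocating with zeros rather than an uninitialized buffer is a defensive choice in this sketch: if a slot were ever skipped, it would hold silence instead of arbitrary memory.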