feat: Using torchaudio for IO #492

mogwai · 2020-11-09T11:15:31Z

Closes #484

mogwai · 2020-11-09T11:54:23Z

Looks like the inference needs refactoring as well

mogwai · 2020-11-09T12:12:22Z

Running the tests, I get the following error when the inference notebook is run:

    ValueError                                Traceback (most recent call last)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
    --> 345                 return method()
    346             return None
    347         else:

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/pyannote/core/feature.py in _repr_png_(self)
    238         from .notebook import repr_feature
    239 
    --> 240         return repr_feature(self)
    241 
    242     _HANDLED_TYPES = (np.ndarray, numbers.Number)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/pyannote/core/notebook.py in repr_feature(feature)
    321     plt.rcParams['figure.figsize'] = (notebook.width, 2)
    322     fig, ax = plt.subplots()
    --> 323     notebook.plot_feature(feature, ax=ax)
    324     data = print_figure(fig, 'png')
    325     plt.close(fig)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/pyannote/core/notebook.py in plot_feature(self, feature, ax, time, ylim)
    264 
    265         if ylim is None:
    --> 266             m = np.nanmin(data)
    267             M = np.nanmax(data)
    268             ylim = (m - 0.1 * (M - m), M + 0.1 * (M - m))

    <__array_function__ internals> in nanmin(*args, **kwargs)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/numpy/lib/nanfunctions.py in nanmin(a, axis, out, keepdims)
    317         # Fast, but not safe for subclasses of ndarray, or object arrays,
    318         # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
    --> 319         res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
    320         if np.isnan(res).any():
    321             warnings.warn("All-NaN slice encountered", RuntimeWarning,

    ValueError: zero-size array to reduction operation fmin which has no identity
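The last frame shows why this is a hard error and not just a warning. On a plain float ndarray, np.nanmin reduces with np.fmin, which has no identity element, so an empty array raises ValueError. An array that is entirely NaN takes a different path: it returns nan and only emits a RuntimeWarning. Here is a minimal reproduction of both cases. The try_nanmin helper is hypothetical and exists only for illustration:

```python
import warnings

import numpy as np


def try_nanmin(data):
    """Return ("ok" | "warning", value) or ("error", message) for np.nanmin(data)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            value = np.nanmin(data)
        except ValueError as exc:
            # Empty input: np.fmin.reduce has no identity to start from.
            return "error", str(exc)
    if caught:
        # All-NaN input: NumPy warns "All-NaN slice encountered" and returns nan.
        return "warning", value
    return "ok", value


print(try_nanmin(np.array([], dtype=np.float32)))
print(try_nanmin(np.array([np.nan, np.nan])))
```

The traceback above therefore points to an inference output with zero elements, not one with missing values.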

hbredin · 2020-11-09T12:15:21Z

Looks like the data output by inference contains lots of NaNs, so pyannote.core cannot display it.
Inference should be fixed first :)

hbredin

I reviewed everything but inference.

Why? It is late and I need to sleep, but I would also like to know your thoughts on changing the shape of the model output from (num_frames, num_classes) to (num_classes, num_frames), to be consistent with the new (num_channels, num_samples) input shape.
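For reference, the two output layouts under discussion differ only by a swap of the last two axes. This NumPy sketch assumes a batched output of shape (batch_size, num_frames, num_classes) and the (batch, channel, time) input convention. All sizes are arbitrary placeholders and do not come from the PR:

```python
import numpy as np

batch_size, num_channels, num_samples = 2, 1, 16000
num_frames, num_classes = 293, 4  # placeholder sizes, not taken from the PR

# Input follows the (batch, channel, time) convention.
waveforms = np.zeros((batch_size, num_channels, num_samples), dtype=np.float32)

# Current output layout: (batch, num_frames, num_classes).
output_btc = np.random.default_rng(0).random((batch_size, num_frames, num_classes))

# Proposed layout: (batch, num_classes, num_frames), mirroring (channel, time).
output_bct = np.swapaxes(output_btc, 1, 2)

print(waveforms.shape, output_btc.shape, output_bct.shape)
```

np.swapaxes returns a view, so the data is not copied. torch.Tensor.transpose(1, 2) behaves the same way in PyTorch.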

pyannote/audio/core/io.py

pyannote/audio/core/model.py

tests/io_tests.py

Co-authored-by: Hervé BREDIN <[email protected]>

pyannote/audio/core/io.py

hbredin · 2020-11-10T14:26:17Z

pyannote/audio/core/io.py

    @staticmethod
-    def normalize(waveform: np.ndarray) -> np.ndarray:
-        return waveform / (np.sqrt(np.mean(waveform ** 2)) + 1e-8)
+    def normalize(waveform: Tensor) -> Tensor:
+        """
+
+        Parameters
+        ----------
+        waveform : (channel, time) Tensor
+            Single or multichannel waveform
+
+
+        Returns
+        -------
+        waveform: (channel, time) Tensor
+        """
+
+        means = waveform.mean(dim=1)[:, None]
+        stds = waveform.std(dim=1)[:, None]
+        return (waveform - means) / (stds + 1e-8)


Actually, this is not the same. normalize is not meant to standardize (subtract the mean and scale the variance to one).

It is meant to normalize the power of the signal, so that we can later control the signal-to-noise ratio when summing two chunks.

I see. Doesn't torchaudio do that out of the box? https://github.com/pytorch/audio/blob/3b9e93372dd48649624ac2bbf660bb2e3384820e/torchaudio/backend/_soundfile_backend.py#L93

This is not the same thing. My understanding is that torchaudio divides the waveform by max(abs(waveform)) so that it lies in the [-1., 1.] interval. My normalize computes something different.
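The two scalings can be compared side by side. This small NumPy sketch takes the peak normalization described in the comment above at face value (this is the commenter's reading of torchaudio and is not verified here) and contrasts it with RMS power normalization. Neither function is a torchaudio or pyannote.audio API:

```python
import numpy as np


def peak_normalize(waveform):
    """Divide each channel by max(abs(waveform)) so samples lie in [-1, 1]."""
    return waveform / np.max(np.abs(waveform), axis=-1, keepdims=True)


def rms_normalize(waveform, eps=1e-8):
    """Divide each channel by its root-mean-square value so average power is 1."""
    return waveform / (np.sqrt(np.mean(waveform ** 2, axis=-1, keepdims=True)) + eps)


# A quiet sine plus one loud click: the click sets the peak, not the power.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
waveform = (0.1 * np.sin(2 * np.pi * 440 * t))[None, :]
waveform[0, 8000] = 0.9

peak = peak_normalize(waveform)
rms = rms_normalize(waveform)
print(np.max(np.abs(peak)), np.mean(peak ** 2))  # peak is 1.0, power well below 1
print(np.max(np.abs(rms)), np.mean(rms ** 2))    # power is ~1.0, peak well above 1
```

A single outlier sample sets the peak-normalized level but has almost no effect on the power-normalized one. Peak scaling alone therefore cannot guarantee a target signal-to-noise ratio when two chunks are summed.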

pyannote/audio/core/io.py

pyannote/audio/core/inference.py

pyannote/audio/core/io.py

hbredin · 2020-11-10T20:00:38Z

🍾 🥳 Thanks!

hbredin · 2020-11-11T11:06:16Z

FYI, there is an out-of-bounds bug in Inference caused by some kind of rounding error.
I am working on it, but I am not sure whether it was introduced by this PR.
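A common source of such bugs is converting a time in seconds to a sample or frame index with int(). int() truncates, so a floating-point product that lands just below an integer loses one index. One of the commits in this PR is titled "Using round instead of int". Here is a minimal sketch of rounding the index and clamping the crop to the samples that actually exist. The helper names time_to_index and crop_bounds are hypothetical:

```python
def time_to_index(t: float, sample_rate: int) -> int:
    """Convert a time in seconds to the nearest sample index."""
    return round(t * sample_rate)


def crop_bounds(start: float, end: float, sample_rate: int, num_samples: int):
    """Return [first, last) sample indices, clamped to [0, num_samples]."""
    first = max(0, time_to_index(start, sample_rate))
    last = min(num_samples, time_to_index(end, sample_rate))
    return first, last


# 0.29 * 100 evaluates to 28.999999999999996 in binary floating point.
print(int(0.29 * 100), time_to_index(0.29, 100))  # 28 29

# A chunk whose end slightly overshoots the file duration is clamped.
print(crop_bounds(0.5, 1.0001, sample_rate=16000, num_samples=16000))  # (8000, 16000)
```

Python's round() rounds exact halves to the nearest even integer. This is usually harmless for index conversion, but keep it in mind if a result must be reproduced exactly across implementations.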

mogwai and others added 9 commits November 2, 2020 14:30

Switch to torchaudio

baa5527

Merge branch 'develop' into feature/torchaudio-io

501980c

fix: make example_input_array follow "bct" convention

598c032

Merge branch 'develop' into feature/torchaudio-io

18ac846

Added test to check segment shapes

02ff04f

Merge branch 'develop' into feature/torchaudio-io

4d68972

Attempted to fix introspect

941c91c

Fixed introspection

a23fa1b

Removing breakpoints

aff1dc8

hbredin closed this Nov 9, 2020

hbredin deleted the branch pyannote:develop November 9, 2020 12:39

hbredin reopened this Nov 9, 2020

hbredin changed the base branch from v2 to develop November 9, 2020 12:46

hbredin added the v2 label Nov 9, 2020

Merge branch 'develop' into feature/torchaudio-io

b3f5612

mogwai added 3 commits November 9, 2020 13:21

Inference working with new torchaudio io

55804cf

Merge branch 'feature/torchaudio-io' of github.com:mogwai/pyannote-au…

2d53b1c

…dio into feature/torchaudio-io

Merge branch 'develop' into feature/torchaudio-io

b7458a4

mogwai marked this pull request as ready for review November 9, 2020 13:27

Normalizing works with multichannel

6c9e32e

mogwai requested a review from hbredin November 9, 2020 17:50

hbredin requested changes Nov 9, 2020

mogwai and others added 3 commits November 10, 2020 09:46

Update pyannote/audio/core/io.py

07d30fd

Co-authored-by: Hervé BREDIN <[email protected]>

Update pyannote/audio/core/io.py

07e25b3

Co-authored-by: Hervé BREDIN <[email protected]>

Update pyannote/audio/core/io.py

1871349

Co-authored-by: Hervé BREDIN <[email protected]>

mogwai and others added 7 commits November 10, 2020 10:32

Using round instead of int

94b57fe

Co-authored-by: Hervé BREDIN <[email protected]>

Supporting fixed duration argument

ae15171

Shape of tensor comment correction

3076e6f

Co-authored-by: Hervé BREDIN <[email protected]>

Removing redundant assignments

13123b2

Fixed argument checked

f42f30c

Merge branch 'develop' into feature/torchaudio-io

f8eb5ce

Prepare chunk introspecting correct dimension

cf5397c

mogwai commented Nov 10, 2020

pyannote/audio/core/io.py

mogwai mentioned this pull request Nov 10, 2020

Switch from soundfile to torchaudio #360

Closed

6 tasks

This was linked to issues Nov 10, 2020

Switch from soundfile to torchaudio #360

Closed

b c t vs. b t c? #481

Closed

Switch to torchaudio for IO #484

Closed

hbredin self-requested a review November 10, 2020 14:21

hbredin requested changes Nov 10, 2020

Requires grad set to false + waveform reduntant initialization

a7d09b7

hbredin reviewed Nov 10, 2020

pyannote/audio/core/io.py (outdated)

Update pyannote/audio/core/io.py

227a7d0

hbredin previously approved these changes Nov 10, 2020

Refactored normalize -> power_normalize

6c17ee3

mogwai dismissed hbredin’s stale review via 6c17ee3 November 10, 2020 17:22

Merge branch 'develop' into feature/torchaudio-io

592e988

hbredin approved these changes Nov 10, 2020

hbredin merged commit 3ffd31a into pyannote:develop Nov 10, 2020

mogwai deleted the feature/torchaudio-io branch March 23, 2021 16:52
