feat: Using torchaudio for IO #492

Merged
merged 32 commits into from
Nov 10, 2020

Conversation

Contributor

@mogwai mogwai commented Nov 9, 2020

Closes #484

@mogwai
Contributor Author

mogwai commented Nov 9, 2020

Looks like the inference needs refactoring as well

@mogwai
Contributor Author

mogwai commented Nov 9, 2020

Running the tests, I'm getting the following error when the inference notebook is run:

ValueError                                Traceback (most recent call last)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
    --> 345                 return method()
    346             return None
    347         else:

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/pyannote/core/feature.py in _repr_png_(self)
    238         from .notebook import repr_feature
    239 
    --> 240         return repr_feature(self)
    241 
    242     _HANDLED_TYPES = (np.ndarray, numbers.Number)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/pyannote/core/notebook.py in repr_feature(feature)
    321     plt.rcParams['figure.figsize'] = (notebook.width, 2)
    322     fig, ax = plt.subplots()
    --> 323     notebook.plot_feature(feature, ax=ax)
    324     data = print_figure(fig, 'png')
    325     plt.close(fig)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/pyannote/core/notebook.py in plot_feature(self, feature, ax, time, ylim)
    264 
    265         if ylim is None:
    --> 266             m = np.nanmin(data)
    267             M = np.nanmax(data)
    268             ylim = (m - 0.1 * (M - m), M + 0.1 * (M - m))

    <__array_function__ internals> in nanmin(*args, **kwargs)

    ~/miniconda3/envs/v2/lib/python3.8/site-packages/numpy/lib/nanfunctions.py in nanmin(a, axis, out, keepdims)
    317         # Fast, but not safe for subclasses of ndarray, or object arrays,
    318         # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
    --> 319         res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
    320         if np.isnan(res).any():
    321             warnings.warn("All-NaN slice encountered", RuntimeWarning,

    ValueError: zero-size array to reduction operation fmin which has no identity

@hbredin
Member

hbredin commented Nov 9, 2020

Looks like the data output by inference contains lots of NaNs so pyannote.core cannot display it.
inference should be fixed first :)
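
For reference, the failure mode in the traceback can be reproduced outside the notebook. The following is a minimal sketch assuming only NumPy; `summarize_scores` is a hypothetical helper for illustration and is not part of pyannote. It shows that `np.nanmin` raises this `ValueError` on a zero-size array, whereas an all-NaN array only produces a warning and a NaN result.

```python
import warnings

import numpy as np


def summarize_scores(data) -> dict:
    """Hypothetical helper: report why an array may not be plottable."""
    data = np.asarray(data, dtype=float)
    return {
        "size": int(data.size),
        "num_nan": int(np.isnan(data).sum()),
        "all_nan": bool(data.size > 0 and np.isnan(data).all()),
    }


# A zero-size array triggers a ValueError like the one in the traceback.
try:
    np.nanmin(np.array([]))
    empty_raises = False
except ValueError:
    empty_raises = True

# An all-NaN array does not raise: NumPy warns and returns NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    all_nan_min = np.nanmin(np.full((4, 2), np.nan))

print(summarize_scores(np.full((4, 2), np.nan)), empty_raises, all_nan_min)
```

Running such a check on the inference output before displaying it would tell apart "mostly NaN" from "empty".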

@hbredin hbredin closed this Nov 9, 2020
@hbredin hbredin deleted the branch pyannote:develop November 9, 2020 12:39
@hbredin hbredin reopened this Nov 9, 2020
@hbredin hbredin changed the base branch from v2 to develop November 9, 2020 12:46
@hbredin hbredin added the v2 label Nov 9, 2020
@mogwai mogwai marked this pull request as ready for review November 9, 2020 13:27
@mogwai mogwai requested a review from hbredin November 9, 2020 17:50
Member

@hbredin hbredin left a comment

I reviewed everything but inference.

Why? It is late and I need to sleep, but I would also like to know your thoughts on changing the shape of the model output from (num_frames, num_classes) to (num_classes, num_frames), to be consistent with the new (num_channels, num_samples) input shape.
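
To make the two layouts concrete, here is a small sketch using NumPy arrays as stand-ins for model output; the sizes and variable names are made up for illustration. Switching between the layouts is a transpose, and the same score is addressed with swapped indices.

```python
import numpy as np

num_frames, num_classes = 5, 3

# Current layout: one row per frame, one column per class.
scores_frames_first = np.arange(num_frames * num_classes, dtype=float).reshape(
    num_frames, num_classes
)

# Proposed layout: one row per class, one column per frame,
# mirroring the (num_channels, num_samples) waveform layout.
scores_classes_first = scores_frames_first.T

print(scores_frames_first.shape, scores_classes_first.shape)
```

Any code that indexes frames along the first axis would need to be updated if the layout changed.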

mogwai and others added 3 commits November 10, 2020 09:46
Comment on lines 93 to 110
     @staticmethod
-    def normalize(waveform: np.ndarray) -> np.ndarray:
-        return waveform / (np.sqrt(np.mean(waveform ** 2)) + 1e-8)
+    def normalize(waveform: Tensor) -> Tensor:
+        """
+
+        Parameters
+        ----------
+        waveform : (channel, time) Tensor
+            Single or multichannel waveform
+
+
+        Returns
+        -------
+        waveform: (channel, time) Tensor
+        """
+
+        means = waveform.mean(dim=1)[:, None]
+        stds = waveform.std(dim=1)[:, None]
+        return (waveform - means) / (stds + 1e-8)
Member

Actually, this is not the same. normalize is not meant to standardize the signal (subtracting the mean and scaling to unit variance).

It is meant as normalizing the power of the signal so that we can later control the signal-to-noise ratio when summing two chunks.
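
As an illustration of why unit power is convenient, here is a minimal NumPy sketch. `normalize_power` follows the original NumPy one-liner from the diff; `mix_at_snr` and the random test signals are hypothetical and only show how a target signal-to-noise ratio could be set once both chunks have the same power.

```python
import numpy as np


def normalize_power(waveform: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale the waveform to (approximately) unit RMS, as in the NumPy version."""
    return waveform / (np.sqrt(np.mean(waveform ** 2)) + eps)


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float):
    """Hypothetical helper: sum two chunks at a chosen SNR (in dB)."""
    signal = normalize_power(signal)
    noise = normalize_power(noise)
    alpha = 10 ** (-snr_db / 20)  # amplitude gain applied to the noise
    return signal + alpha * noise, alpha


rng = np.random.default_rng(0)
signal = 0.3 * rng.standard_normal((1, 16000))  # quiet chunk
noise = 5.0 * rng.standard_normal((1, 16000))   # loud chunk
mixture, alpha = mix_at_snr(signal, noise, snr_db=10.0)

# Measure the SNR actually obtained from the two scaled components.
measured_snr_db = 10 * np.log10(
    np.mean(normalize_power(signal) ** 2)
    / np.mean((alpha * normalize_power(noise)) ** 2)
)
print(round(float(measured_snr_db), 3))
```

Because both chunks are brought to the same power first, the gain `alpha` alone determines the resulting ratio, regardless of how loud each chunk was originally.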

Contributor Author

Member

This is not the same thing. My understanding is that torchaudio divides the waveform by max(abs(waveform)) so that it lies in the [-1., 1.] interval. My normalize computes something different.
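
The three operations discussed in this thread can be compared side by side. This is a sketch in NumPy with hypothetical function names; `peak_normalize` encodes the reviewer's stated understanding of torchaudio's behavior, which is not verified here, and `standardize` uses NumPy's default population standard deviation, whereas `Tensor.std` applies Bessel's correction by default.

```python
import numpy as np

EPS = 1e-8


def rms_normalize(w: np.ndarray) -> np.ndarray:
    """Power normalization: divide by the root mean square."""
    return w / (np.sqrt(np.mean(w ** 2)) + EPS)


def standardize(w: np.ndarray) -> np.ndarray:
    """Per-channel standardization: zero mean, unit variance."""
    means = w.mean(axis=1, keepdims=True)
    stds = w.std(axis=1, keepdims=True)
    return (w - means) / (stds + EPS)


def peak_normalize(w: np.ndarray) -> np.ndarray:
    """Peak normalization: divide by max(abs(w)) so values lie in [-1, 1]."""
    return w / (np.max(np.abs(w)) + EPS)


# One channel with a DC offset, so the three results differ visibly.
waveform = np.array([[0.5, 1.5, 0.5, 1.5]])

by_rms = rms_normalize(waveform)
by_std = standardize(waveform)
by_peak = peak_normalize(waveform)
print(by_rms, by_std, by_peak, sep="\n")
```

Only the RMS version guarantees unit power while leaving the mean of the signal untouched.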

hbredin previously approved these changes Nov 10, 2020
@hbredin hbredin merged commit 3ffd31a into pyannote:develop Nov 10, 2020
@hbredin
Member

hbredin commented Nov 10, 2020

🍾 🥳 Thanks!

@hbredin
Member

hbredin commented Nov 11, 2020

FYI, there is an out-of-bounds bug in Inference due to some kind of rounding error.
I am working on it; I am not sure whether it was introduced by this PR.
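
The comment does not say where the rounding happens, so the following is only a generic illustration of how rounding a time to a frame index can land one past the end of an array, not a diagnosis of the actual bug. `time_to_frame` and the numbers are hypothetical.

```python
def time_to_frame(t: float, step: float, num_frames: int) -> int:
    """Hypothetical helper: map a time in seconds to a valid frame index."""
    index = round(t / step)
    # Clamp so that rounding up near the end never exceeds the last frame.
    return max(0, min(index, num_frames - 1))


num_frames, step = 10, 0.1  # valid frame indices are 0..9

unclamped = round(0.96 / step)  # rounds up to 10, one past the last index
clamped = time_to_frame(0.96, step, num_frames)
print(unclamped, clamped)
```

Clamping is one possible guard; another is to use floor instead of round when converting times to indices.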

@mogwai mogwai deleted the feature/torchaudio-io branch March 23, 2021 16:52

Successfully merging this pull request may close these issues.

Switch to torchaudio for IO
b c t vs. b t c?
Switch from soundfile to torchaudio
2 participants