Vocal tract length perturbation #139
Thanks for the pull request :) I'll find time to review this.
Interesting! I will have another look at this soon, but I've left some initial comments.
- I also want to add a mono voice recording in the demo, so I can try it on that.
I understand that this transform does not support stereo/multichannel audio yet. Is that right? Would it be hard to add support for that? The other transforms in torch-audiomentations support multichannel audio.
I added VTLP to the demo script, along with a speech example. I listened to the outputs, and the effect very much resembles a band-stop filter. Is that what you intended? I've attached the sounds as a zip.
I have another question: is the technique described in this paper a different one? https://www.isca-speech.org/archive/pdfs/interspeech_2019/kim19_interspeech.pdf
class VTLP(BaseWaveformTransform):
    """
    Apply Vocal Tract Length Perturbation as defined in
    http://www.cs.toronto.edu/~hinton/absps/perturb.pdf
It would be nice if you could explain a bit more about what this transform actually does, in an "explain it like I'm five" fashion, so that the average developer (including me) can understand it.
I'm trying to learn what Vocal Tract Length Perturbation is. Reading the paper, I get the idea that it is about frequency warping.
However, I don't see frequencies getting remapped in the spectrogram gif I posted above. I also found this video online: https://www.youtube.com/watch?v=vCDnfUM6gn8
Could you try to enlighten me? Do you have some reference examples of what VTLP should sound like? Am I missing something, or is there a bug in your implementation?
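For reference, here is a minimal sketch of the piecewise-linear frequency warp as I read it from the paper linked in the docstring. It is not code from this PR: the function name `vtlp_warp_frequency` and the default boundary frequency `f_hi=4800.0` are illustrative assumptions, and it assumes `f_hi` lies below the Nyquist frequency.

```python
def vtlp_warp_frequency(freq_hz, alpha, sample_rate, f_hi=4800.0):
    """Map a frequency (Hz) to its warped position for warp factor alpha.

    Hypothetical helper for illustration only. Below the boundary the
    frequency is scaled by alpha; above it, a second linear segment is
    used so that the Nyquist frequency maps to itself.
    """
    nyquist = sample_rate / 2.0
    # Boundary between the two linear segments.
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha
    # Slope of the upper segment, chosen so the warp is continuous at the
    # boundary and leaves the Nyquist frequency fixed.
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


if __name__ == "__main__":
    # Print how a few frequencies move for a 16 kHz signal and alpha = 1.1.
    for f in (500.0, 1000.0, 4000.0, 6000.0, 8000.0):
        print(f, "->", round(vtlp_warp_frequency(f, 1.1, 16000), 1))
```

Under this reading, a warp factor of 1.1 moves a 1000 Hz component to roughly 1100 Hz while the Nyquist frequency stays put, so I would expect the spectrogram to show spectral content shifting up or down rather than a band being attenuated.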
I wasn't aware of this paper. I will read it and get back to you.
I think there may be an issue with my implementation. I will check it and comment here (probably on the weekend).
Closing for inactivity. Feel free to suggest reopening this later if the work gets picked up again.
Implemented VTLP as introduced in http://www.cs.toronto.edu/~hinton/absps/perturb.pdf. Adapted from the numpy code here: https://github.com/makcedward/nlpaug/blob/master/nlpaug/model/audio/vtlp.py
Additional notes:
Fixes #115