from mel_spectrogram to wav again #10
Comments
I have the same question. In my case, computing librosa.feature.melspectrogram and then librosa.feature.mfcc does not give the same result as Kaldi's pipeline. BTW, did you find a way to rebuild the audio? |
Hi, |
Thank you for your kind reply. I will look into it. |
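I spent a few hours on this yesterday. This is what I finally settled on, at least for now. Sorry for the delay in sharing this.

    import librosa
    import numpy as np
    import scipy.io.wavfile

    recov = librosa.feature.inverse.mel_to_audio(M=warped_masked_spectrogram,
                                                 hop_length=128, sr=sampling_rate)

and I use this function to save it:

    def save_wav(wav, path):
        # Scale to the int16 range; the 0.01 floor avoids dividing by a near-zero peak.
        wav *= 32767 / max(0.01, np.max(np.abs(wav)))
        # Note: the rate is hard-coded to 16000 here and should match sampling_rate.
        scipy.io.wavfile.write(path, 16000, wav.astype(np.int16))
|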
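The save step above can also be sketched without NumPy or SciPy. This is a minimal standard-library variant, not the code used in this thread: `save_wav_stdlib` is a hypothetical name, it assumes `samples` is a plain sequence of floats (for example `recov.tolist()`), and it takes the sample rate as a parameter instead of hard-coding 16000. Unlike the original, it does not modify its input.

```python
import struct
import wave


def save_wav_stdlib(samples, path, sample_rate=16000):
    """Peak-normalize float samples and write them as 16-bit mono PCM.

    Hypothetical helper; it mirrors the scaling used in save_wav above.
    Returns the number of samples written.
    """
    samples = list(samples)
    # Same guard as the original: never divide by a peak smaller than 0.01.
    peak = max(0.01, max((abs(s) for s in samples), default=0.0))
    scale = 32767 / peak
    # int() truncates toward zero, like astype(np.int16) on in-range values.
    pcm = [int(s * scale) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        # "<h" is little-endian signed 16-bit, which is what WAV expects.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)
```

Passing `sample_rate=sampling_rate` keeps the file header consistent with the rate used for `mel_to_audio`; a mismatch would make the saved audio play back too fast or too slow.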
Hi Roxima, |
@kimchi88 Great. Looking forward to your results. |
Confirmed! It works perfectly. The next step will be to use the augmented audio to improve ASR. Thanks for the help! |
@darisettysuneel As far as I can remember, it finishes very quickly. What took time was the augmentation, not saving the resulting audio. I'll try to report back to you with a simple benchmark. |
Hi @roxima, are there any statistics you can share? |
@roxima Hi, converting the mel_spectrogram to wav takes me more time than augmenting the wav. Do you have a better solution? Thanks. |
@darisettysuneel @Lomax314 So sorry for being late, I was as busy as a bee.
As can be seen, reconstructing the audio takes much more time than the augmentations. However, I noticed that running this script uses more than 8 GB of free space on my OS drive, so maybe there is an I/O bottleneck? While it runs, I am left with only 141 MB of free space. |
The previous one used librosa 0.7.0RC1, and this one is for the latest 0.7.0 release:
One more:
|
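For anyone who wants to reproduce this kind of comparison on their own files, a minimal timing sketch follows. `time_call` is a hypothetical helper, not part of librosa or of this repository. It reports the best wall-clock time over a few runs, so the same wrapper can be put around the augmentation call and around the reconstruction call.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times.

    Returns (best_seconds, last_result). Taking the best run reduces
    noise from other processes and from one-off warm-up costs.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result
```

With the names used earlier in this thread, a call would look like `seconds, recov = time_call(librosa.feature.inverse.mel_to_audio, M=warped_masked_spectrogram, hop_length=128, sr=sampling_rate)`. Reporting the clip length next to the timing makes results from different machines easier to compare.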
@roxima Thanks for sharing the statistics! May I know the length of the audio files used for these results? |
@darisettysuneel You're welcome. Exactly 2 s 970 ms. |
@roxima For me it takes ~1.5 minutes for 8-10 sec of audio. I need to take a look at the input data to the reconstruction function. Once again, thanks. |
@roxima Thanks a lot for your reply! The librosa function takes a long time for me, so I hope I can find another solution. Once again, thanks. |
@darisettysuneel @Lomax314 : Did you find a better method to achieve this? |
@AASHISHAG I'm sorry, but the answer is no. However, this method seems to be implemented in a function in the Kaldi repository. |
@Lomax314 : Thank you for the reply. I will have a look. If you still have the setup running, could you please help me with the
|
It takes me 10 minutes for 10 sec of audio. The machine has 88 cores and 500 GB of memory. I use the last code posted above to convert to audio. Do you have a better solution? Maybe with torchaudio? Thanks. |
@juunnn : Could you please confirm your TensorFlow and gcc versions? I am facing some dependency issues. I think it has to do with TensorFlow and gcc. This will list all the versions. |
I still have problems with the TF dependencies, which is why I use PyTorch for them. It works and does not take long to execute, but for some audio it says "output have no finite value everywhere" while converting back to audio. I don't know what to do. |
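The message quoted above suggests that some values are NaN or infinite by the time the audio is validated. That is an assumption, since the thread does not show the failing input. Below is a minimal sketch for checking and cleaning a flat sequence of floats before it is passed on. The helper names are hypothetical, and replacing bad values with 0.0 is just one possible choice, not a recommendation from this thread.

```python
import math


def count_non_finite(values):
    """Return how many entries are NaN, +inf or -inf."""
    return sum(1 for v in values if not math.isfinite(v))


def replace_non_finite(values, fill=0.0):
    """Return a new list with every NaN/inf entry replaced by `fill`."""
    return [v if math.isfinite(v) else fill for v in values]
```

A NumPy array can be checked the same way via `.ravel().tolist()`, or directly with `np.isfinite` and `np.nan_to_num`. A non-zero count on the spectrogram points at the augmentation or feature step, while a non-zero count only on the reconstructed samples points at the inversion.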
@juunnn : Could you please share the code that you wrote with the PyTorch dependencies? I have no experience with either PyTorch or TensorFlow, so it would be really helpful. I am using the code below and facing dependency issues.
|
It does indeed take a lot of time to convert from mel_spectrogram to audio. If someone comes across a faster way than the librosa built-in, please share. For a 1 minute audio with 128 mels
|
Any new updates for possibly faster implementations? |
Hi,
Do you have any suggestion about how to re-build the audio file after augmentation?