❔ Transform bark (30 Mo) output wav to mp3/webm (pydub) #288

Closed
adriens opened this issue May 13, 2023 · 9 comments

Comments

adriens commented May 13, 2023

❔ About

I'm currently trying to compress a 30 MB WAV output, but pydub keeps failing with "error: Size should be 1, 2, 3 or 4".

👣 Steps to reproduce

Below is my code sample (I built the WAV with Bark):

!pip install pydub
!pip install ffmpeg-python

from pydub import AudioSegment
import ffmpeg


audio = AudioSegment.from_file('/kaggle/working/auptitcafe.wav', format='wav')
audio = audio.set_sample_width(2)
# Export the audio to MP3 format
audio.export('auptitcafe.mp3', format='mp3')

Then I get the following error message:

error                                     Traceback (most recent call last)
Cell In[22], line 9
      5 import ffmpeg
      8 audio = AudioSegment.from_file('/kaggle/working/auptitcafe.wav', format='wav')
----> 9 audio = audio.set_sample_width(2)
     10 # Export the audio to MP3 format
     11 audio.export('auptitcafe.mp3', format='mp3')

File /opt/conda/lib/python3.10/site-packages/pydub/audio_segment.py:1008, in AudioSegment.set_sample_width(self, sample_width)
   1003     return self
   1005 frame_width = self.channels * sample_width
   1007 return self._spawn(
-> 1008     audioop.lin2lin(self._data, self.sample_width, sample_width),
   1009     overrides={'sample_width': sample_width, 'frame_width': frame_width}
   1010 )

error: Size should be 1, 2, 3 or 4
@adriens adriens changed the title ❔ Transform output wav to mp3/webm (pydub) ❔ Transform (30 Mo) output wav to mp3/webm (pydub) May 13, 2023
@adriens adriens changed the title ❔ Transform (30 Mo) output wav to mp3/webm (pydub) ❔ Transform bark (30 Mo) output wav to mp3/webm (pydub) May 13, 2023

adriens commented May 13, 2023

Is there something special about Bark's WAV output?

C0untFloyd commented May 13, 2023

I'm unfamiliar with that module, but Bark's output format is mono, so perhaps try:
audio.set_channels(1)

I guess it wouldn't hurt to also specify the sample rate (which is 24000 Hz).

adriens commented May 13, 2023

Hmm, yes, thanks. I'm giving it a try and will let you know in this issue 🙏

dnrico1 commented May 14, 2023

Are you perhaps adding a silence as per the long-form audio generation notebook? I ran into the same error yesterday when doing that, because np.zeros produced 64-bit values by default (float64, which end up as 64-bit WAV samples). The error says that audioop only accepts WAV samples of 1, 2, 3 or 4 bytes per sample, which correspond to 8-, 16-, 24- or 32-bit WAVs. Inserting a 64-bit silence in the middle messes things up.

The solution was to specify a 16-bit integer dtype for np.zeros.
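
One way to check whether a given WAV file falls outside the 1 to 4 bytes-per-sample range is to read its header directly. The sketch below is only an illustration using the standard library; the helper names `read_wav_format` and `audioop_compatible` are hypothetical and not part of pydub or Bark. It parses the `fmt ` chunk by hand because Python's `wave` module refuses non-PCM files such as floating-point WAVs.

```python
import struct


def read_wav_format(path):
    """Return (channels, sample_rate, bytes_per_sample) from a WAV 'fmt ' chunk.

    Hypothetical helper for illustration; it only reads the header.
    """
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                # tag, channels, sample rate, byte rate, block align, bits per sample
                _tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16]
                )
                return channels, rate, bits // 8
            # Skip any other chunk; chunks are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)


def audioop_compatible(bytes_per_sample):
    """True if the sample width is one the error message lists (1, 2, 3 or 4)."""
    return bytes_per_sample in (1, 2, 3, 4)
```

Calling `read_wav_format` on the problematic file and passing the third value to `audioop_compatible` should show whether the sample width is the culprit; a 64-bit file reports 8 bytes per sample.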

adriens commented May 14, 2023

Are you perhaps adding a silence as per the long-form audio generation notebook?

Yes, indeed, this is how I'm adding silences. 😅

The solution was to specify a 16-bit integer dtype for np.zeros.

Would you share some code snippets ❔ 🙏

@C0untFloyd

Not the poster above, but you would just specify it in np.zeros like this:
dtype=np.int16
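
A slightly fuller sketch of why that one argument matters, under some assumptions: the `speech` array below is a stand-in for a Bark clip and is assumed to be float32, and `SAMPLE_RATE` is set to the 24000 mentioned earlier in this thread. Without a `dtype`, `np.zeros` returns float64, and concatenating it with float32 audio promotes the whole array to 8 bytes per sample. An int16 silence leaves the result at 4 bytes per sample.

```python
import numpy as np

SAMPLE_RATE = 24_000  # Bark's output rate, as stated in this thread

# Stand-in for one generated clip (assumption: Bark returns float32 samples).
speech = np.zeros(SAMPLE_RATE, dtype=np.float32)

# Quarter-second pauses: one with NumPy's default dtype, one with int16.
default_silence = np.zeros(int(0.25 * SAMPLE_RATE))
fixed_silence = np.zeros(int(0.25 * SAMPLE_RATE), dtype=np.int16)

# np.concatenate promotes all pieces to a common dtype.
broken = np.concatenate([speech, default_silence, speech])
fixed = np.concatenate([speech, fixed_silence, speech])

print(broken.dtype, broken.itemsize)  # float64 8
print(fixed.dtype, fixed.itemsize)    # float32 4
```

Checking `array.itemsize` just before writing the WAV is a cheap way to confirm the samples are at most 4 bytes wide.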

dnrico1 commented May 16, 2023 via email

adriens commented May 16, 2023

Thanks a lot to both of you, @C0untFloyd and @dnrico1, it really did the trick, making it possible to compress the WAV file into much smaller files without audible quality loss ❣️

# Encode the silence properly (16-bit integers)
# https://github.com/suno-ai/bark/issues/288
silence = np.zeros(int(0.25 * SAMPLE_RATE), dtype=np.int16)

Format   Size
.wav     20.2 MB
.mp3     1.7 MB
.webm    1.8 MB

@adriens adriens closed this as completed May 16, 2023
adriens added a commit to adriens/bark that referenced this issue May 17, 2023
... so output wav can be easily previewed on various platforms and easily compressed to mp3/webm
suno-ai#288
adriens commented May 17, 2023
