split_audio.py

Split long speech files into chunks with a given average duration, for use with svc (so-vits-svc-fork, https://github.com/voicepaw/so-vits-svc-fork).

IMPORTANT NOTE: For this to work, you need to be able to run ffmpeg as a command from within Python with the subprocess library. The command that is executed is: ffmpeg. If it doesn't work for you, you'll have to find the path to your ffmpeg executable and modify line 24 of split_audio.py accordingly.
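If you are not sure whether Python can see ffmpeg, you can check it before running the script. The sketch below is not taken from split_audio.py: it resolves the executable on the PATH with `shutil.which` and runs `ffmpeg -version` through `subprocess`. The helper names `find_ffmpeg` and `ffmpeg_version_line`, and the example path in the docstring, are hypothetical.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def find_ffmpeg(candidates: Sequence[str] = ("ffmpeg",)) -> Optional[str]:
    """Return the first candidate that resolves to an executable, or None.

    A candidate can be a bare name looked up on PATH ("ffmpeg") or a full path
    to the executable (e.g. a hypothetical "/opt/ffmpeg/bin/ffmpeg").
    """
    for candidate in candidates:
        resolved = shutil.which(candidate)
        if resolved is not None:
            return resolved
    return None


def ffmpeg_version_line(executable: str) -> str:
    """Run `<executable> -version` via subprocess and return the first output line."""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]
```

If `find_ffmpeg()` returns None, try `find_ffmpeg(["/full/path/to/ffmpeg"])` with your own location; the resolved path is what the note above suggests substituting for the plain `ffmpeg` command.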

So, I wanted to share a tool that you might find helpful (or not...).

Consider the following file (librivox): https://ia801401.us.archive.org/25/items/beckoningfairone_2211_librivox/beckoningfairone_08_onions_128kb.mp3

The file is 30 min and 46 sec.

In order to train a voice, the samples should be shorter than ~10 sec (cf. the Notes on https://github.com/voicepaw/so-vits-svc-fork) and typically longer than one second.

WHAT DOES IT DO?

A) First, it applies loudness normalization to the audio, converts it to 44100 Hz, applies a high-pass filter (>80 Hz) and a noise gate (to minimize the noise between two sentences), and then applies a second normalization specific to speech. Note: You can skip this with the --no_processing option.
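For readers who want to reproduce step A by hand, the chain maps naturally onto ffmpeg audio filters. This is a sketch under assumptions, not the command split_audio.py actually runs: `loudnorm`, `aresample`, `highpass`, `agate` and `speechnorm` are real ffmpeg filters, but the parameter values are illustrative, `speechnorm` requires a reasonably recent ffmpeg build, and the function names are hypothetical.

```python
import subprocess
from typing import List


def build_preprocess_command(src: str, dst: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Return an ffmpeg command approximating the pre-processing chain of step A.

    Filter names are real ffmpeg filters; parameter values are illustrative guesses.
    """
    filters = [
        "loudnorm",         # EBU R128 loudness normalization
        "aresample=44100",  # resample to 44100 Hz (loudnorm may change the rate)
        "highpass=f=80",    # attenuate content below ~80 Hz
        "agate",            # noise gate: attenuate low-level noise between sentences
        "speechnorm",       # speech-oriented normalization
    ]
    return [ffmpeg, "-y", "-i", src, "-af", ",".join(filters), dst]


def preprocess(src: str, dst: str, ffmpeg: str = "ffmpeg") -> None:
    """Run the command; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_preprocess_command(src, dst, ffmpeg), check=True)
```

Keeping the command as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.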

B) Then it will:

a) Trim silences. All silences longer than 0.5 sec are trimmed down to 0.5 sec (default value). The silence duration and the threshold are adjustable, but I would advise keeping the defaults.

b) Split the input file into audio chunks with the desired average duration (the default is 5 seconds) and put them into the output folder, located in the same folder as the audio. If the folder doesn't exist, it is created; if it already exists, it is deleted and recreated. You can specify a minimum duration (default: 2 sec) and a maximum duration (default: 10 sec).
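Steps a) and b) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code of split_audio.py: the audio is a list of float samples, a 10 ms frame counts as "silent" when its RMS is more than 35 dB below the loudest frame, and the chunker receives speech segments as (start, end) pairs in seconds. The function names and the greedy "stop at the boundary closest to the target" rule are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def trim_silences(
    samples: Sequence[float],
    sample_rate: int,
    threshold_db: float = -35.0,
    max_silence: float = 0.5,
    frame_ms: float = 10.0,
) -> List[float]:
    """Step a): shorten silent stretches longer than `max_silence` seconds.

    A frame is "silent" when its RMS lies more than |threshold_db| dB below the
    loudest frame. Long silent runs are cut down to `max_silence` seconds
    (rounded to whole frames); shorter pauses are left untouched.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    peak = max(rms, default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input: nothing to compare against
    limit = peak * 10 ** (threshold_db / 20)
    keep_frames = max(1, round(max_silence * sample_rate / frame_len))

    out: List[float] = []
    silent_run = 0
    for frame, level in zip(frames, rms):
        silent_run = silent_run + 1 if level < limit else 0
        if silent_run <= keep_frames:  # drop silent frames past the allowed length
            out.extend(frame)
    return out


def group_into_chunks(
    segments: Sequence[Segment],
    desired: float = 5.0,
    min_duration: float = 2.0,
    max_duration: float = 10.0,
) -> Tuple[List[Segment], int, int]:
    """Step b): merge consecutive speech segments into chunks of about `desired` seconds.

    Cuts only fall between segments (i.e. inside pauses). Chunks outside
    [min_duration, max_duration] are dropped and counted.
    Returns (kept_chunks, n_too_short, n_too_long).
    """
    chunks: List[Segment] = []
    i, n = 0, len(segments)
    while i < n:
        start = segments[i][0]
        j = i
        # Extend the chunk until it reaches `desired` or the segments run out.
        while j + 1 < n and segments[j][1] - start < desired:
            j += 1
        # Step back one segment if stopping there lands closer to the target.
        if j > i and abs(segments[j - 1][1] - start - desired) < abs(segments[j][1] - start - desired):
            j -= 1
        chunks.append((start, segments[j][1]))
        i = j + 1

    kept = [c for c in chunks if min_duration <= c[1] - c[0] <= max_duration]
    too_short = sum(1 for c in chunks if c[1] - c[0] < min_duration)
    too_long = sum(1 for c in chunks if c[1] - c[0] > max_duration)
    return kept, too_short, too_long
```

Cutting only inside pauses keeps words intact, which is why individual chunks scatter around the requested average instead of hitting it exactly; the two counters play the same role as the rejection counts the script reports.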

Let's take an example:

I have the above-mentioned mp3 file from librivox in the TEST/ folder:

python split_audio.py --desired_duration 6 -o my_chunks TEST/beckoningfairone_08_onions_128kb.mp3

Here is the histogram of the durations for the input file:

[Figure: histogram of the durations]

NOTES:

If the audio is not in .wav format, it is converted and saved as a .wav copy next to the original file.
It assumes that all the relevant audio is contained in one file, say: some_long_audio.mp3. If your audio is scattered among different files, you can concatenate them using ffmpeg or Audacity.
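The two notes above can be sketched as follows. The .wav naming mirrors the first note (same folder, same name) but is not taken from the script, and the concatenation uses ffmpeg's concat demuxer, a real ffmpeg feature. All helper names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Sequence


def wav_copy_path(src: str) -> Path:
    """Where a .wav copy of `src` would sit: same folder, same stem (a.mp3 -> a.wav)."""
    return Path(src).with_suffix(".wav")


def concat_list_text(files: Sequence[str]) -> str:
    """Input list for ffmpeg's concat demuxer: one `file '<absolute path>'` line per file."""
    lines = []
    for f in files:
        # Inside single quotes, a literal ' is written as '\'' in concat lists.
        quoted = str(Path(f).resolve()).replace("'", "'\\''")
        lines.append(f"file '{quoted}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, dst: str, ffmpeg: str = "ffmpeg") -> List[str]:
    """Join the listed files without re-encoding; -safe 0 allows absolute paths."""
    return [ffmpeg, "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", dst]


def concatenate(files: Sequence[str], dst: str, ffmpeg: str = "ffmpeg") -> None:
    """Write the list to a temporary file and run ffmpeg on it."""
    with tempfile.TemporaryDirectory() as tmp:
        list_file = Path(tmp) / "inputs.txt"
        list_file.write_text(concat_list_text(files), encoding="utf-8")
        subprocess.run(concat_command(str(list_file), dst, ffmpeg), check=True)
```

The concat demuxer expects all inputs to share the same streams and codecs (for example, several MP3 parts of one audiobook). For mixed formats, Audacity or ffmpeg's concat filter is the easier route.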

The resulting audio chunks will be put into the my_chunks folder next to the input file. In the terminal:

python split_audio.py --desired_duration 6 -o my_chunks TEST/beckoningfairone_08_onions_128kb.mp3

Converting to .wav ...
Done.
****************************************************************
Pre-processing ...
Done.
****************************************************************
Processing ...
****************************************************************


Input file:  TEST/beckoningfairone_08_onions_128kb.mp3
----------------------------------------------------------------
Number of audio chunks produced   :    222
Total audio duration [hh:mm:ss]   : 00:22:15
Average audio chunk duration [sec]:   6.01
Durations CFI at 95% CL [sec]     :   1.82
Max audio chunk duration [sec]    :   9.36
Min audio chunk duration [sec]    :   3.26
----------------------------------------------------------------

In this case, I asked for 6-second chunks (on average). The original length of the input file was 30 min 46 sec; the resulting audio duration is 22 min 15 sec because long silences were trimmed down to 0.5 sec (the default). "Durations CFI" means, in this case, that 95% of the produced chunks have a duration between roughly 4 and 8 seconds (6.01 ± 1.82 sec). (CFI = confidence interval, CL = confidence level.) The max duration is 9.36 sec and the min is 3.26 sec.
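The summary block can be reproduced from a list of chunk durations. This sketch assumes the "CFI" half-width is 1.96 standard deviations (normal approximation), which matches the reading that about 95% of chunks fall within mean ± CFI, but it is an assumption about how split_audio.py computes it; `summarize` and `hhmmss` are hypothetical names.

```python
import statistics
from typing import Dict, Sequence


def hhmmss(seconds: float) -> str:
    """Format a duration as hh:mm:ss (fractional seconds are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def summarize(durations: Sequence[float], z: float = 1.96) -> Dict[str, object]:
    """Summary statistics similar to the script's terminal report.

    ASSUMPTION: the "CFI at 95% CL" half-width is z * sample standard deviation
    with z = 1.96; split_audio.py may compute it differently.
    """
    mean = statistics.mean(durations)
    spread = z * statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {
        "count": len(durations),
        "total": hhmmss(sum(durations)),
        "mean": round(mean, 2),
        "cfi": round(spread, 2),
        "interval": (round(mean - spread, 2), round(mean + spread, 2)),
        "max": max(durations),
        "min": min(durations),
    }
```

For example, `hhmmss(1846)` gives "00:30:46", the length of the original file.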
Note: Because minimum and maximum durations are specified (by default: 2 sec and 10 sec), a bit of audio may be dropped. If you want to keep absolutely everything, you can set the options --min_duration 0 and --max_duration 9999. Also, if you wish, you can skip the pre-processing stage (--no_processing option), but the results will be worse. If you don't keep the default threshold (-35 dB) and set your own, keep an eye on the terminal output: you might see messages telling you that some chunks have been dropped because of min_duration or max_duration. As long as this doesn't represent a significant loss, it is OK. The messages look like this:

Nb. rejected (duration < min)     :      1
Nb. rejected (duration > max)     :     10

For the options:

python split_audio.py --help
usage: split_audio.py [-h] [-o OUTPUT_FOLDER] [-m MIN_DURATION] [-l MAX_DURATION] [-d DESIRED_DURATION] [-t THRESHOLD] [-s SILENCE] [-v VERBOSE] [-k KEEP] [-n NO_PROCESSING] input_file

positional arguments:
  input_file            input wav audio file

options:
  -h, --help            show this help message and exit
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        name of output folder (default: processed)
  -m MIN_DURATION, --min_duration MIN_DURATION
                        min duration [sec] (default: 2)
  -l MAX_DURATION, --max_duration MAX_DURATION
                        max duration [sec] (default: 10)
  -d DESIRED_DURATION, --desired_duration DESIRED_DURATION
                        desired average duration [sec] (default: 7)
  -t THRESHOLD, --threshold THRESHOLD
                        silence threshold in dB below max (<0) (default: -35)
  -s SILENCE, --silence SILENCE
                        max silence duration for trimming [sec] (default: 0.5)
  --keep                don't remove temporary files
  --no_processing       skip PRE-processing
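If you want to drive the same options from your own Python code, or just check their spellings, here is a hypothetical argparse parser reconstructed from the option list above. It is not the script's own parser: the types (float for durations and threshold) are assumptions, and only the options described in the list are modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option list above (types are assumed)."""
    p = argparse.ArgumentParser(prog="split_audio.py")
    p.add_argument("input_file", help="input wav audio file")
    p.add_argument("-o", "--output_folder", default="processed",
                   help="name of output folder (default: %(default)s)")
    p.add_argument("-m", "--min_duration", type=float, default=2,
                   help="min duration [sec] (default: %(default)s)")
    p.add_argument("-l", "--max_duration", type=float, default=10,
                   help="max duration [sec] (default: %(default)s)")
    p.add_argument("-d", "--desired_duration", type=float, default=7,
                   help="desired average duration [sec] (default: %(default)s)")
    p.add_argument("-t", "--threshold", type=float, default=-35,
                   help="silence threshold in dB below max (<0) (default: %(default)s)")
    p.add_argument("-s", "--silence", type=float, default=0.5,
                   help="max silence duration for trimming [sec] (default: %(default)s)")
    p.add_argument("--keep", action="store_true", help="don't remove temporary files")
    p.add_argument("--no_processing", action="store_true", help="skip PRE-processing")
    return p
```

Parsing the example command line, `build_parser().parse_args(["--desired_duration", "6", "-o", "my_chunks", "TEST/beckoningfairone_08_onions_128kb.mp3"])` yields `desired_duration == 6.0` and `output_folder == "my_chunks"`, with the other options at their defaults. Negative thresholds such as `-t -40` are accepted because argparse treats `-40` as a number when no option looks like a negative number.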

sbersier/split_audio
