AutoSub

About
Motivation
Installation
Docker
How-to example
How it works
TO-DO
Contributing
References

About

AutoSub is a CLI application to generate subtitle files (.srt, .vtt, and .txt transcript) for any video file using Mozilla DeepSpeech. I use the DeepSpeech Python API to run inference on audio segments and pyAudioAnalysis to split the initial audio on silent segments, producing multiple small files.

⭐ Featured in DeepSpeech Examples by Mozilla

Motivation

In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them and on one such occasion, I couldn't find the subtitle file for a particular movie I had downloaded. Then the idea for AutoSub struck me and since I had worked with DeepSpeech previously, I decided to use it.

Installation

Clone the repo. All further steps should be performed from inside the AutoSub/ directory.
```
$ git clone https://github.com/abhirooptalasila/AutoSub
$ cd AutoSub
```

Create a Python virtual environment and install the required packages with pip.
```
$ python3 -m venv sub
$ source sub/bin/activate
$ pip3 install -r requirements.txt
```

Download the model and scorer files from the DeepSpeech repo. The scorer file is optional, but it greatly improves inference results.
```
# Model file (~190 MB)
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
# Scorer file (~950 MB)
$ wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
```

Create two folders, audio/ and output/, to store the audio segments and the final SRT and VTT files.
```
$ mkdir audio output
```

Install FFmpeg. If you're running Ubuntu, this should work fine.
```
$ sudo apt-get install ffmpeg
$ ffmpeg -version               # I'm running 4.1.4
```

[OPTIONAL] If you would like the subtitles to be generated faster, you can use the GPU package instead. Make sure to install the appropriate CUDA version.
```
$ source sub/bin/activate
$ pip3 install deepspeech-gpu
```

Docker

Installation using Docker is straightforward.

First, download the model files by specifying which version you want. If you have your own, skip this step and just make sure they are placed in the project directory with .pbmm and .scorer extensions.
```
$ ./getmodel.sh 0.9.3
```

Then, for a CPU build, run:
```
$ docker build -t autosub .
$ docker run --volume=`pwd`/input:/input --name autosub autosub --file /input/video.mp4
$ docker cp autosub:/output/ .
```

For a GPU build that is reusable (saving time on instantiating the program):
```
$ docker build --build-arg BASEIMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --build-arg DEPSLIST=requirements-gpu.txt -t autosub-base . && \
docker run --gpus all --name autosub-base autosub-base --dry-run || \
docker commit --change 'CMD []' autosub-base autosub-instance
```

Then run the committed image:
```
$ docker run --volume=`pwd`/input:/input --name autosub autosub-instance --file video.mp4
$ docker cp autosub:/output/ .
```

How-to example

Make sure the model and scorer files are in the root directory. They are loaded automatically.
After following the installation instructions, you can run autosub/main.py as shown below. The --file argument is the video file for which the SRT file is to be generated.
```
$ python3 autosub/main.py --file ~/movie.mp4
```
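On the automatic loading mentioned above: the README does not say how the files are located, so the following is only a sketch of one way such discovery could work, assuming the script simply looks for the first `.pbmm` and `.scorer` file in a directory. The function name `find_model_files` is hypothetical and is not taken from the AutoSub source.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_model_files(root: str) -> Tuple[Optional[Path], Optional[Path]]:
    """Return (model, scorer) paths found directly inside root.

    Either element is None when no matching file exists.
    Hypothetical helper: AutoSub's own lookup may differ.
    """
    root_dir = Path(root)
    # sorted() keeps the choice deterministic if several versions are present.
    models = sorted(root_dir.glob("*.pbmm"))
    scorers = sorted(root_dir.glob("*.scorer"))
    model = models[0] if models else None
    scorer = scorers[0] if scorers else None
    return model, scorer


if __name__ == "__main__":
    model, scorer = find_model_files(".")
    print("model:", model if model else "not found")
    print("scorer:", scorer if scorer else "not found (optional)")
```

Running this from the repository root is a quick way to check that the downloaded files are where the script expects them before starting a long transcription.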
After the script finishes, the SRT file is saved in output/.
Open the video file and add this SRT file as a subtitle, or just drag and drop it into VLC.
The optional --split-duration argument sets the maximum number of seconds any given subtitle is displayed for. The default is 5 seconds.
```
$ python3 autosub/main.py --file ~/movie.mp4 --split-duration 8
```
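To illustrate what a maximum display duration means in practice, here is a minimal sketch that groups timed words into subtitle cues. It assumes per-word start times are already available; the names `Word` and `group_words` are hypothetical, and this is not necessarily how AutoSub implements --split-duration.

```python
from typing import List, Tuple

Word = Tuple[str, float]  # (text, start time in seconds); hypothetical type


def group_words(words: List[Word], split_duration: float = 5.0) -> List[Tuple[float, str]]:
    """Group words into cues of (cue start time, cue text).

    A new cue begins once a word starts split_duration seconds or more
    after the first word of the current cue.
    """
    cues: List[Tuple[float, str]] = []
    current: List[str] = []
    cue_start = 0.0
    for text, start in words:
        # Flush the running cue when this word would stretch it past the limit.
        if current and start - cue_start >= split_duration:
            cues.append((cue_start, " ".join(current)))
            current = []
        if not current:
            cue_start = start
        current.append(text)
    if current:
        cues.append((cue_start, " ".join(current)))
    return cues


if __name__ == "__main__":
    words = [("hello", 0.0), ("world", 1.0), ("this", 5.5), ("is", 6.0), ("autosub", 11.0)]
    for start, text in group_words(words, split_duration=5.0):
        print(start, text)
```

A larger value such as 8 yields fewer, longer cues, while a smaller value yields shorter cues that change more often.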
By default, AutoSub outputs several formats. To produce only the file formats you want, use the --format argument:
```
$ python3 autosub/main.py --file ~/movie.mp4 --format srt txt
```
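The formats differ mainly in layout: SRT numbers each cue and uses a comma before the milliseconds, WebVTT starts with a `WEBVTT` header and uses a period, and a plain transcript carries no timing at all. The sketch below shows those differences; the function names and the exact shape of the .txt output are assumptions, not AutoSub's actual writer.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def render(cues: List[Cue], fmt: str = "srt") -> str:
    """Render cues as SRT, WebVTT, or a plain-text transcript."""
    if fmt == "txt":
        # Assumed transcript layout: one cue text per line, no timing.
        return "\n".join(text for _, _, text in cues) + "\n"
    lines = ["WEBVTT", ""] if fmt == "vtt" else []
    for index, (start, end, text) in enumerate(cues, start=1):
        if fmt == "srt":
            lines.append(str(index))  # SRT cues carry a sequence number
        lines.append(f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    cues = [(0.0, 2.5, "hello world")]
    print(render(cues, "srt"))
    print(render(cues, "vtt"))
```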

How it works

Mozilla DeepSpeech is an amazing open-source speech-to-text engine with support for fine-tuning on custom datasets, external language models, exporting memory-mapped models, and a lot more. You should definitely check it out for STT tasks. When you run the script, I first use FFmpeg to extract the audio from the video and save it in audio/. By default, DeepSpeech is configured to accept 16 kHz audio samples for inference, so I make FFmpeg use a 16 kHz sampling rate during extraction.
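The extraction step can be sketched roughly as follows. The 16 kHz sampling rate comes from the description above; the mono downmix (`-ac 1`), the overwrite flag, and the function names are assumptions added for illustration and may not match the exact command AutoSub issues.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that extracts the audio track at the given rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists (assumption)
        "-i", video_path,         # input video
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono (assumption)
        "-ar", str(sample_rate),  # resample, 16 kHz by default
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


if __name__ == "__main__":
    # Only print the command here; call extract_audio() to actually run FFmpeg.
    print(" ".join(build_ffmpeg_command("movie.mp4", "audio/movie.wav")))
```

Keeping the command builder separate from the subprocess call makes the flags easy to inspect or adjust without invoking FFmpeg.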

Then, I use pyAudioAnalysis for silence removal, which takes the large audio file initially extracted and splits it wherever silent regions are encountered, resulting in smaller audio segments that are much easier to process. I haven't used the whole library; instead, I've integrated parts of it in autosub/featureExtraction.py and autosub/trainAudio.py. All these audio files are stored in audio/. Then, for each audio segment, I perform DeepSpeech inference and write the inferred text to an SRT file. After all files are processed, the final SRT file is stored in output/.
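To give a feel for splitting on silence, here is a deliberately simplified sketch that uses a fixed RMS energy threshold on raw 16-bit samples. This is a stand-in technique, not pyAudioAnalysis's own silence-removal method, and the function names, threshold, and frame sizes are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms(samples: Sequence[int], frame_len: int) -> List[float]:
    """RMS energy of each consecutive, non-overlapping frame."""
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies


def voiced_segments(
    samples: Sequence[int],
    sample_rate: int = 16000,
    frame_ms: int = 20,
    threshold: float = 500.0,
    min_silence_frames: int = 10,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of non-silent regions.

    A region ends once min_silence_frames consecutive frames fall
    below the energy threshold; shorter pauses stay inside the region.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_rms(samples, frame_len)
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first frame of the region being built
    quiet = 0                    # consecutive quiet frames inside that region
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                end = idx - quiet + 1  # index of the first quiet frame
                segments.append((start * frame_len / sample_rate, end * frame_len / sample_rate))
                start, quiet = None, 0
    if start is not None:
        # Close the last region, trimming any trailing quiet frames.
        end = len(energies) - quiet
        segments.append((start * frame_len / sample_rate, end * frame_len / sample_rate))
    return segments
```

Each returned time range would then correspond to one small audio segment passed to inference, with the range start serving as the subtitle's offset in the original video.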

When I tested the script on my laptop, it took about 40 minutes to generate the SRT file for a 70-minute video file. My config is a dual-core i5 @ 2.5 GHz with 8 GB of RAM. Ideally, the whole process shouldn't take more than 60% of the duration of the original video file.
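As a quick check of those numbers, the ratio of processing time to video duration can be computed directly. The helper name `real_time_factor` is just an illustrative label.

```python
def real_time_factor(processing_minutes: float, video_minutes: float) -> float:
    """Processing time divided by media duration; below 1.0 is faster than real time."""
    return processing_minutes / video_minutes


if __name__ == "__main__":
    rtf = real_time_factor(40, 70)
    print(f"{rtf:.2f}")  # 0.57, just under the 0.60 target
    print(0.6 * 70)      # time budget in minutes for a 70-minute video at 60%
```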
