Sundry tools for maintaining a personal media library. Rabbits (hares) like to be clean and are frequently grooming. The tools here are intended to groom your media files for various purposes such as:
- Transcode to storage optimized codecs
- Cut commercials from DVR recordings
- Profanity filtering
- Be nice to low powered (compute, iops) machines
- Tested with ATSC 1.0, DVD region 1 and Blu-Ray region 1. Should work with anything ffmpeg supports, but might need
changes. Create an issue if you have problems. Include the output of
ffprobe
on the file.
This repository is a work in progress! I wrote it for my personal needs and thought others may benefit. Over time, I want to make it configurable to apply to a wider audience. Pull requests are welcome! If you want to change something like a codec or directory path, please make it configurable.
Assumptions:
- Target codecs are H.265, H.264 (video) and Opus (audio). Text subtitles are SRT or ASS. Standard DVD and Blu-ray subtitles are kept.
- Container is MKV. It supports multiple audio and subtitle streams and custom tags. This would be difficult to change.
- Dropbox is my "backup" solution. Code doesn't depend on that but some choices are made because of it. For example, temp files start with ".~" because Dropbox ignores them.
- My media is served by Plex. Code generally does not depend on Plex, but supports interacting with it. I'm open to expanding support to other media servers.
The tools are intended to be run in a Docker container because there is specific software that needs to be installed and configured. On some architectures it's difficult or may conflict with other software. So I use a Docker container.
If people want to support running outside of Docker, I'll take PRs for that, but I can't do much with support since I won't be using it this way.
I just want to run it ...
Create a config file specific to your setup in your home directory as .media-hare.ini
(notice the leading period
because POSIX people like that). You could also place in /etc/media-hare.ini
if you'd like. You only need to add
options that are different from media-hare.defaults.ini.
If you're running Plex on Docker, running is easier.
Consider running the (plex-hare)[https://github.com/double16/plex-hare] container. There are examples for using
Docker Compose. It integrates with media-hare
and does process priority tuning.
Otherwise, run something similar to the following, replacing time zone and paths as necessary.
$ docker run -d -e "TZ=America/Chicago" --device /dev/dri --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm -v /path/to/media:/media -v /path/to/media-hare.ini:/etc/media-hare.ini ghcr.io/double16/media-hare:main
Install Docker for Desktop from https://www.docker.com/products/docker-desktop/. Most of the defaults should be fine except that you likely want to increase the number of CPUs to 1 or 2 less than the maximum available. Transcoding is a compute heavy operation.
Open a shell like Terminal or Powershell and run the following, replacing time zone and paths as necessary.
$ docker run -d -e "TZ=America/Chicago" --device /dev/dri --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm -v /path/to/media:/media -v /path/to/media-hare.ini:/etc/media-hare.ini ghcr.io/double16/media-hare:main
The audio transcriber service alphacep/kaldi-
needs a service per language. It's necessary to name the service
kaldi-{lang}
for media-hare
to find it. If no service exists, the transcriber will be run and stopped as needed.
This is inefficient especially for memory usage.
The LibreOffice language tool is used for spell checking and generating subtitles from audio. Only one is needed. The
two environment variables LANGUAGE_TOOL_HOST
and LANGUAGE_TOOL_PORT
are used to configure it.
services:
kaldi-en:
image: alphacep/kaldi-en:latest
restart: unless-stopped
environment:
- "VOSK_SHOW_WORDS=true"
- "VOSK_ALTERNATIVES=0"
ports:
- "2700:2700/tcp"
langtool:
image: ghcr.io/double16/libreoffice-langtool:main
restart: unless-stopped
ports:
- "8100:8100/tcp"
media-hare:
image: ghcr.io/double16/media-hare:main
restart: unless-stopped
environment:
- "TZ=America/Chicago"
- "LANGUAGE_TOOL_HOST=langtool"
- "LANGUAGE_TOOL_PORT=8100"
You'll need to place media-hare.ini
into your workspace directory.
- Mount media folder to /path/to/media
- Mount your source folder to ~/Workspace
Performance is better using language and vosk servers running on docker. Otherwise media-hare will start them in each process, using unnecessary memory and compute.
docker run -d --name kaldi-en -e VOSK_SHOW_WORDS=true -e VOSK_ALTERNATIVES=0 -p 2700:2700 alphacep/kaldi-en:latest
docker run -d --name langtool -p 8100:8100 ghcr.io/double16/libreoffice-langtool:main
export LANGUAGE_TOOL_HOST=localhost
export KALDI_EN_HOST=localhost
export KALDI_EN_PORT=2700
export LANGUAGE_TOOL_PORT=8100
$ docker build -t media-hare:latest .
$ docker run -it --rm --entrypoint /bin/zsh --device /dev/dri --device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia-uvm -v ~/Movies:/Movies -v /path/to/media:/media -v .:/Workspace -v ~/.media-hare.ini:/etc/media-hare.ini media-hare:latest
$ podman machine init --cpus 10 --disk-size 30 -m 16384 -v ~/Movies:/Movies -v ~/Workspace:/Workspace -v /path/to/media:/media
$ podman build -t media-hare:latest .
$ podman run -it --rm --entrypoint /bin/zsh -v /Movies:/Movies -v /path/to/media:/media -v .:/Workspace /Workspace/media-hare.ini:/etc/media-hare.ini media-hare:latest
Configuration is in media-hare.ini
. This can be found in /etc/media-hare.ini
or ${HOME}/.media-hare.ini
. In your
media directories, you can also add media-hare.ini
and override configurations for content in that directory and
subdirectories.
See dvrprocess/media-hare.defaults.ini
for all available options and documentation. Edit media-hare.ini
in one of the
paths specified above to override.
Transcode videos to target codecs and other settings. Targets a specific language and keeps only one audio stream (it does preserve additional audio added by the profanity filter). Do not use if you want multiple language tracks.
Searches for media that needs to be transcoded. Supports several query parameters such as video and audio codec. Defaults to values in media-hare.ini
.
Searches for media that needs to be transcoded and calls dvr_post_process.py. In a container runs periodically with time and size limits. See docs/transcode-apply.md.
Filters profanity using subtitles. See docs/profanity_filter.md.
Searches for media that needs the profanity filter applied. In a container runs periodically with time and size limits. See docs/profanity-apply-filter.md.
Reports the changes made by the profanity filter in one or more mkv files. Use this to evaluate how your phrase lists are performing. See docs/profanity-filter-report.md.
Searches for commercials in media. Add chapters in media and EDL files for processing by other tools. See docs/comchap.md.
Cuts commercials from media. See docs/comcut.md.
Performs extra commercial skip tuning. For TV shows performs extensive tuning per season. See docs/comtune.md.
Intended to cut commercials for TV shows only when cuts look consistent and ignores outliers. See docs/smart-comcut.md.
Extracts scenes from media into separate files using an EDL file. See docs/scene-extract.md.
Searches for media that needs commercials to be cut. Limited to paths with Movies
in the name. TV shows could be a lot of results.