Unofficial No Such Thing As A Fish episode transcripts.
- Run npm install
- Run npm run dev
TODO: Add instructions for creating database
Install deps
- Run pip install -r requirements.txt
Download most recent episodes and transcribe them
- Change line 11 of whisper.py to local_files_only=False
- (Optional) Change line 5 of whisper.py (model_size = 'large-v2') to your preferred model; see the note below for details and the list of available Whisper models.
- Run npm run convert (this is idempotent and will go through all episodes)

NOTE: By default this uses the medium.en Whisper model. On an M1 Mac with 64GB of RAM this transcribes at about 1.4x speed, so an hour-long episode takes about 43 minutes to transcribe. So, as of 25 July 2023:
select sum(duration) from episodes -- 1292175

1,292,175 seconds ÷ 60 seconds/minute ÷ 60 minutes/hour ÷ 24 hours/day ≈ 15.0 days of audio
15.0 days ÷ 1.4 (transcription speed) ≈ 10.7 days to transcribe
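To redo this estimate as the back catalogue grows, the same arithmetic fits in a few lines of Python. This is a minimal sketch: the function name transcription_days is hypothetical and not part of the repo, and the inputs are the sum(duration) result and the 1.4x medium.en speed quoted above.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400 seconds in a day


def transcription_days(total_audio_seconds: float, speed: float) -> tuple[float, float]:
    """Return (days of audio, wall-clock days to transcribe it at `speed`x real time)."""
    audio_days = total_audio_seconds / SECONDS_PER_DAY
    return audio_days, audio_days / speed


# 1,292,175 s is the sum(duration) result above; 1.4x is the medium.en speed on an M1 Mac.
audio_days, wall_days = transcription_days(1_292_175, 1.4)
print(f"{audio_days:.1f} days of audio, about {wall_days:.1f} days to transcribe")
# prints: 15.0 days of audio, about 10.7 days to transcribe
```

Passing speed=3 (the figure given for small.en) brings the same catalogue down to about 5.0 days.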
The good news is that switching to the small.en or tiny.en model increases this speed dramatically, at the cost of slightly lower accuracy. small.en, for example, transcribes at about 3x speed.

The other good news is that you can kill the script (Ctrl + C) and restart it at any time; it will pick back up after the last fully transcribed episode.

NOTE: This script also downloads the audio file and album art for every episode. As of 25 July 2023 this amounts to 487 episodes, ~20GB of audio and ~130MB of images.
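The convert step can be interrupted and re-run because finished episodes are skipped. One common way to get that behaviour is sketched below in Python. It is an assumption about the approach, not the code behind npm run convert: the convert_all helper, the one-.txt-file-per-episode layout and the transcribe callback are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_all(
    episode_ids: Iterable[str],
    out_dir: Path,
    transcribe: Callable[[str], str],
) -> list[str]:
    """Transcribe every episode that has no finished transcript yet; return the ones done this run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done_this_run = []
    for episode_id in episode_ids:
        final_path = out_dir / f"{episode_id}.txt"
        if final_path.exists():
            continue  # finished on an earlier run, so skipping keeps re-runs idempotent
        partial_path = out_dir / f"{episode_id}.txt.partial"
        partial_path.write_text(transcribe(episode_id), encoding="utf-8")
        # Rename only once the text is fully written, so an interrupted run never leaves a
        # half-written .txt behind (rename is atomic on POSIX within one filesystem).
        partial_path.replace(final_path)
        done_this_run.append(episode_id)
    return done_this_run
```

The key design choice is that "finished" is signalled by the final file existing, and that file only appears after the transcript is complete; killing the process mid-episode just means that episode is redone on the next run.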
Split database into chunks
- Run npm run split:db
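The README doesn't say how npm run split:db divides the database. Splitting a file into fixed-size chunks generally looks like the Python sketch below; the split_file helper, the chunk size and the .000/.001 naming scheme are assumptions, not the script's actual behaviour.

```python
from pathlib import Path


def split_file(path: Path, chunk_size: int, out_dir: Path) -> list[Path]:
    """Write `path` into numbered chunk files of at most `chunk_size` bytes each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with path.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # reached end of file
            chunk = out_dir / f"{path.name}.{index:03d}"  # e.g. db.sqlite3.000
            chunk.write_bytes(data)
            chunks.append(chunk)
            index += 1
    return chunks
```

Reassembling is just concatenating the chunks in index order; the zero-padded suffix keeps alphabetical and numeric order the same for up to 1,000 chunks.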
(Optional) Sync database, audio, images, and fonts to (Cloudflare) R2. Needs rclone and jq installed.
- Run npm run sync
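The README doesn't show what npm run sync runs under the hood. As a rough Python sketch, assuming an rclone remote already configured for R2, the sync could be one rclone sync call per directory. The remote name r2, the bucket fish-transcripts, the local directory names and the rclone_sync_commands helper are all hypothetical, and the sketch leaves out whatever the real script uses jq for.

```python
import shlex

# Hypothetical names: the real script's directories, rclone remote and bucket are not documented here.
LOCAL_DIRS = ["db", "audio", "images", "fonts"]
REMOTE = "r2:fish-transcripts"


def rclone_sync_commands(local_dirs: list[str], remote: str, dry_run: bool = True) -> list[list[str]]:
    """Build one `rclone sync <local dir> <remote>/<dir>` command per directory."""
    commands = []
    for directory in local_dirs:
        cmd = ["rclone", "sync", directory, f"{remote}/{directory}"]
        if dry_run:
            cmd.append("--dry-run")  # report what would change without copying or deleting
        commands.append(cmd)
    return commands


for cmd in rclone_sync_commands(LOCAL_DIRS, REMOTE):
    # Printed for review; each list could be passed to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

rclone sync makes the destination match the source, including deleting remote files that no longer exist locally, which is why the sketch defaults to --dry-run.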