Summary: Official annotations were removed, about 1.4 billion videos were archived, data has been uploaded to the Internet Archive, and Invidious can display the archived annotations.
Please see here: https://old.reddit.com/r/DataHoarder/comments/aa6czg/youtube_annotation_archive/
For cloudrac3r's work, see README.md in the node folder.
Provides scripts for archiving YouTube Annotations. See the wiki for information about how it works.
Annotations on every YouTube video will be deleted forever on the 15th of January 2019. The purpose of this project is to archive as much annotation data as possible before that happens.
The current process is to scrape as many channel IDs as possible, then to scrape video IDs from those channels, then to download annotation data for those videos.
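The three stages above can be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual code: `collectVideoIds`, `archiveChannels`, and the two fetcher arguments (`listVideoIds`, `fetchAnnotations`) are hypothetical names standing in for whatever does the real scraping.

```javascript
// Hypothetical sketch of the channel -> video -> annotation process.
// `listVideoIds` and `fetchAnnotations` are placeholder callbacks (assumptions),
// not functions provided by this repository.

// Stage 2: gather the video IDs of every known channel, without duplicates.
async function collectVideoIds(channelIds, listVideoIds) {
  const seen = new Set();
  for (const channelId of channelIds) {
    for (const videoId of await listVideoIds(channelId)) {
      seen.add(videoId); // a Set drops IDs that were already collected
    }
  }
  return [...seen];
}

// Stage 3: download annotation data for each collected video ID.
async function archiveChannels(channelIds, listVideoIds, fetchAnnotations) {
  const videoIds = await collectVideoIds(channelIds, listVideoIds);
  const archive = new Map();
  for (const videoId of videoIds) {
    archive.set(videoId, await fetchAnnotations(videoId));
  }
  return archive;
}
```

Deduplicating video IDs before the download stage means a video reachable from several channels is only requested once.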
If you would like to make sure specific channels are archived before the 15th, you can use this tool.
Download the Dockerfile located in the /docker folder with:
$ wget https://github.com/omarroth/archive/raw/master/docker/Dockerfile
Then in the same directory run the following command to build the image:
$ docker build -t archive .
Use the following commands to create a container with the image and run it to begin the archiving process:
$ docker create --name=archive-worker archive:latest
$ docker container start archive-worker
# Install dependencies
$ sudo apt-get install curl python-software-properties
$ curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
$ sudo apt-get install nodejs gcc g++ make
$ git clone https://github.com/omarroth/archive
$ cd archive/node
$ npm install
$ cd worker
$ node index.js
Create a new Heroku app and point it to https://github.com/omarroth/archive on the branch "heroku", and trigger a manual deploy.
Enable automatic deploys to receive the latest updates automatically.
The webserver is just a placeholder; open the logs to see what's currently going on.
# Install dependencies
$ curl -sSL https://dist.crystal-lang.org/apt/setup.sh | sudo bash
$ sudo apt-get update
$ sudo apt-get install crystal libssl-dev libxml2-dev libyaml-dev libgmp-dev libreadline-dev librsvg2-dev
$ git clone https://github.com/omarroth/archive
$ cd archive
$ shards
$ crystal build src/worker.cr --release
$ ./worker -u https://archive.omar.yt -t 20
$ ./worker -h
-u URL, --batch-url=URL Master server URL
-t THREADS, --max-threads=THREADS
Number of threads for downloading annotations
-h, --help Show this help
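To illustrate what a download limit like `-t` controls, here is a minimal sketch of running tasks with a cap on how many are in flight at once. It is written in JavaScript for illustration and is not taken from the Crystal worker; `runWithLimit` is a hypothetical helper, and treating "threads" as concurrent requests is an assumption.

```javascript
// Hypothetical helper (not part of this repository): run `task` over every
// item with at most `limit` calls in flight at the same time.
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed item until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  // Start no more lanes than there are items, and always at least one.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, lane));
  return results; // results keep the order of `items`
}
```

A call such as `runWithLimit(videoIds, 20, download)` would then correspond loosely to `-t 20`, where `download` is whatever function fetches the annotations for one video.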
- Omar Roth - creator and maintainer
- cloudrac3r - JavaScript developer