The aim is to build a system that takes captions from openedcaptions.com (also on GitHub) by Dan Schultz and adds them to a Google Doc in real time (or close enough) until you stop it.
Opened Captions provides a simple WebSocket (socket.io) interface, but a Google Apps Script cannot consume a WebSocket. This repo therefore provides a simple Node.js HTTP server that caches the stream and returns captions via a GET request.
This repo is already configured to be deployed to a Heroku instance. The app uses local memory to cache the captions, so it doesn't require a DBMS or cache daemon. However, this also means that stopping, restarting or redeploying the server clears the caption cache.
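To make the in-memory caching idea concrete, here is a minimal sketch of a cache that stores caption chunks with their arrival time and returns everything newer than a given timestamp. It is an illustration only: `createCaptionCache`, `add` and `since` are hypothetical names, and the server in this repo may organise its cache differently.

```javascript
// Hypothetical in-memory caption cache; names are illustrative, not the repo's.
function createCaptionCache() {
  const entries = []; // each entry: { time: number (ms), text: string }

  return {
    // Store one chunk of caption text with the time it arrived.
    add(text, time = Date.now()) {
      entries.push({ time, text });
    },
    // Return everything newer than `since`, plus a `now` value to send back
    // as `since` on the next call. Entries after `now` are left for later.
    since(since = 0, now = Date.now()) {
      const captions = entries
        .filter((e) => e.time > since && e.time <= now)
        .map((e) => e.text)
        .join("");
      return { now, captions };
    },
  };
}
```

Because the array lives in the process, it disappears when the process stops, which is why a restart or redeploy clears the captions.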
Run this server on a Heroku or EC2 instance, or something similar. See below for details on running it locally and using ngrok
to expose the localhost endpoint to the Google Apps Script.
Some more details on the Opened Captions project, for context:
Opened Captions: Turning the spoken words on TV screens into streams of hackable data
The result: Opened Captions. It provides a real-time API for closed captions pulled from C-SPAN. The system makes it possible to code against what’s being said on TV right now, and by solving this one really tricky problem, it makes a broad range of applications possible. https://openedcaptions.com/
- GitHub: opened-captions-example
- GitHub: opened-captions
- C-SPAN live stream to check against captions.
Install dependencies:
npm install
To start the server:
npm start
The server starts on port 5000 and begins caching captions. You can set the port with the PORT
environment variable. You can also have the server write the streamed captions to a file by providing the file name or path in the TRANSCRIPT_FILE
environment variable. The server will also read this file into the cache on startup if the file exists.
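As a rough illustration of how these two settings could be handled in Node.js, the sketch below reads `PORT` and `TRANSCRIPT_FILE` from an environment object, loads an existing transcript on startup, and appends new text to it. The function names (`loadConfig`, `loadTranscript`, `appendTranscript`) are assumptions made for this example and are not taken from the repo.

```javascript
const fs = require("fs");

// Read settings from an environment object (pass process.env in real use).
function loadConfig(env) {
  return {
    port: Number(env.PORT) || 5000, // 5000 is the documented default
    transcriptFile: env.TRANSCRIPT_FILE || null,
  };
}

// Return the saved transcript, or an empty string if there is no file yet.
function loadTranscript(path) {
  if (!path || !fs.existsSync(path)) return "";
  return fs.readFileSync(path, "utf8");
}

// Append newly streamed caption text to the transcript file, if one is set.
function appendTranscript(path, text) {
  if (path) fs.appendFileSync(path, text);
}
```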
The single API end point takes a since
parameter which is a timestamp. The API response will include a now
property, which you should pass as since in the next API request. The timestamp holds your place in the captions, so each subsequent request returns captions that can be appended to those received in previous requests.
First request:
GET /
or
GET /?since=0
which will return something like
{
"now": 123456789,
"captions": "blah blah blah ..."
}
On the next request, include the last now
value:
GET /?since=123456789
and you'll get a new timestamp and more captions to add to the last ones.
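The request cycle above can be sketched as a small client. This is a minimal example under a few assumptions: the helper names are made up for illustration, the base URL is whatever address your server instance has, and `pollOnce` relies on a global `fetch` (available in Node 18+ and browsers).

```javascript
// Build the request URL for a given `since` value.
function buildUrl(baseUrl, since) {
  return `${baseUrl.replace(/\/+$/, "")}/?since=${encodeURIComponent(since)}`;
}

// Fold one API response into the running state: keep the new timestamp
// and append the new captions to what was already collected.
function applyResponse(state, response) {
  return {
    since: response.now,
    transcript: state.transcript + response.captions,
  };
}

// One polling step. Assumes a global fetch (Node 18+ or a browser).
async function pollOnce(baseUrl, state) {
  const res = await fetch(buildUrl(baseUrl, state.since));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return applyResponse(state, await res.json());
}
```

Start with a state of `{ since: 0, transcript: "" }` and call `pollOnce` repeatedly, feeding each returned state into the next call.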
How would I connect this to a Google Doc?
- Create a Google Doc and add a script to it: Tools -> Script editor
- Copy and paste ./google_app_script/main.gs into the Google Apps Script editor.
- Set the server URL in the script to point at an instance of this Node app.
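For orientation, a script bound to the document might look roughly like the sketch below. It is not the contents of main.gs: `SERVER_URL` is a placeholder, `appendNewCaptions` is a hypothetical name, and storing the timestamp in document properties is just one possible choice. It uses the standard Apps Script services `UrlFetchApp`, `DocumentApp` and `PropertiesService`.

```javascript
// Placeholder: replace with the URL of your own instance of the Node app.
var SERVER_URL = "https://example.com";

// Hypothetical helper, not the repo's main.gs: fetch captions newer than the
// stored timestamp and append them to the end of the document.
function appendNewCaptions() {
  var props = PropertiesService.getDocumentProperties();
  var since = props.getProperty("since") || "0";

  var response = UrlFetchApp.fetch(SERVER_URL + "/?since=" + since);
  var data = JSON.parse(response.getContentText());

  if (data.captions) {
    DocumentApp.getActiveDocument().getBody().editAsText().appendText(data.captions);
  }
  // Remember where we are so the next run only gets new text.
  props.setProperty("since", String(data.now));
}
```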
ngrok description:
Don’t constantly redeploy your in-progress work to get feedback from clients. ngrok creates a secure public URL (https://yourapp.ngrok.io) to a local webserver on your machine. Iterate quickly with immediate feedback without interrupting flow.
- To install:
  npm install -g ngrok
- To run, start ngrok forwarding:
  ngrok http 5000
- This gives you a URL like http://c8b8351d.ngrok.io/, which you can add to the Google Apps Script as described in the Google Doc setup steps.
Alternatively, you can deploy to another server instance on Heroku, EC2, etc. and use that endpoint.
Then, to get things going, set up a 1-minute time-driven trigger in Google Apps Script for updateCaptions
under Resources -> All your triggers. More info on running a Google Apps Script every minute can be found in their documentation.
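If you prefer to create the trigger from code rather than through the menu, Apps Script's `ScriptApp` service can do it. The sketch below assumes the script defines a function named updateCaptions; `createCaptionTrigger` is a hypothetical name for a helper that you would run once by hand.

```javascript
// Run once (for example from the script editor) to schedule updateCaptions
// as a time-driven trigger that fires every minute.
function createCaptionTrigger() {
  ScriptApp.newTrigger("updateCaptions")
    .timeBased()
    .everyMinutes(1)
    .create();
}
```

Running it more than once creates duplicate triggers, so check the trigger list before calling it again.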
For troubleshooting, you can check against the C-SPAN live stream or Opened Captions.
The project is currently in active development; feel free to get in touch with any questions.