A transcript crawler, search engine and explorer for SRF news and talk shows. http://srf-transcriptor.herokuapp.com/
This was implemented at the SRG SSR Hackdays 2014 and is mostly a proof of concept.
Documentation of data formats used can be found in the wiki.
npm install
npm start
This will start the server on localhost:3000, serving API endpoints for searching and receiving transcripts. Additionally, angular/dist is served statically.
API example: http://srf-transcriptor.herokuapp.com/search?q=Geri%20M%C3%BCller
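For example, once the server is running locally, the same query can be issued against it (the response format is documented in the wiki):
curl 'http://localhost:3000/search?q=Geri%20M%C3%BCller'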
The front end build is checked in so that the whole application can easily be deployed to Heroku. This could be optimized in the future.
npm install
npm start
cd angular/
npm install
bower install
grunt serve
The front end dev server relays API requests to localhost:3000 via grunt-connect-proxy, so make sure the main server is also running from the root directory.
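A typical development setup therefore uses two terminals, as sketched below (ports and paths as described above):
# terminal 1: API server on localhost:3000, run from the repository root
npm start
# terminal 2: front end dev server with API proxying, run from angular/
grunt serve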
The crawler is part of the backend.
cd backend
grunt --help
grunt add:show --id=3b016ffc-afa2-466d-a694-c48b7ffe1783
The following tasks will fetch and parse episode information and transcripts for all added shows:
grunt fetch:shows
grunt fetch:transcripts
grunt parse:transcripts
grunt parse:shows
Currently the processed data needs to be checked in for deployment.
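Because Heroku deploys whatever is in the repository, the crawled output has to be committed after running the tasks above. A minimal sketch, assuming the processed data lives under backend/data (check the backend Gruntfile for the actual location):
git add backend/data && git commit -m 'Update crawled shows and transcripts'
git push heroku master   # assumes the default 'heroku' git remote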
- Node.js
- Grunt
- Bower
- Compass
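These can usually be installed as follows (Grunt and Bower globally via npm; Compass is a Ruby gem and therefore needs Ruby with RubyGems):
npm install -g grunt-cli bower
gem install compass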
There is an experimental algorithm included that composes short video clips from text.
Usage:
node backend/clip -m 'Krieg in eine Weile her und es wird Sie eine Weile nicht sehen können, den Fall von ihm zu bekommen, ist nicht ein Problem mit ihm für eine Weile her,'
An MP4 clip will be composed and saved to backend/clips, along with a JSON file containing the source metadata.
The message above is sourced from a tweet by @lauraperrenoud and was translated to German with Google Translate.
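To inspect the result of a run, list the most recently written files in the output directory (file names are whatever the tool generates):
ls -t backend/clips | head -n 2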