The Paul-Parser is used by iUPB to extract all courses of the University of Paderborn. The code is a bit messy, but it works quite well. Feel free to fork it and make it a bit more modular and maintainable.
If you are only interested in the course data, check out our course API at dev.yippie.io
- Install Ruby 1.9
- Install Mongodb
brew install mongodb
- Start mongodb on localhost
- Install dependencies with
bundle install
- Download all courses of the current semester into mongodb with
bundle exec ruby crawler.rb
- In your mongodb, you will find a collection named raw_pages in the database paul
- Analyse all courses with
bundle exec ruby parser.rb
- Open the collection named courses and do whatever you like with the information
- To export the found data, simply run
mongoexport --db paul --collection courses > courses.json
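Once the export exists, it can be read back in Ruby. The following is a minimal sketch, not part of the Paul-Parser itself: `load_courses` and `field_names` are hypothetical helper names, and the sketch assumes the file was written by `mongoexport` with its default output of one JSON document per line. The README does not list the fields of a course document, so the example assumes none and only reports whatever keys it finds.

```ruby
require 'json'

# Hypothetical helper (not part of the Paul-Parser): reads a mongoexport
# file with one JSON document per line and returns an array of hashes.
def load_courses(path)
  courses = []
  File.foreach(path) do |line|
    line = line.strip
    next if line.empty?            # ignore blank lines
    courses << JSON.parse(line)    # each line is a complete JSON document
  end
  courses
end

# Hypothetical helper: collects the distinct top-level field names used
# across all documents, sorted alphabetically.
def field_names(courses)
  courses.map { |course| course.keys }.flatten.uniq.sort
end
```

Calling `load_courses('courses.json')` and passing the result to `field_names` is a quick way to see which attributes the parser stored before building anything on top of the data.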
Fork our repository, make and test your changes, and then open a pull request.
This is GPL v3 software.