Kerala Archive Data

Details

URL = http://www.ceo.kerala.gov.in/erollArchives.html

Script

The script does three things:

  1. Produces CSV metadata files that describe the PDFs. Each CSV has the following fields: year, leg_assembly, booth_no, eng_file_name
  2. Renames the PDFs as follows:
  • lowercase, snake_case
  • year (4 digits), constituency_number (3-digit code), and polling_station_number (3-digit code)

For example, a sample file name is 2011_001_001.pdf

  3. Downloads all the PDFs to a directory called kerala_pdfs/kerala_20XX/
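
The naming and metadata conventions above can be sketched in Python as follows. This is a minimal illustration, not the code in kerala_archives.py: the function names (`build_file_name`, `build_target_path`, `rows_to_csv`) are hypothetical, and it assumes that leg_assembly holds the constituency number, booth_no holds the polling station number, and eng_file_name holds the renamed file.

```python
import csv
import io
from pathlib import PurePosixPath

# Column order taken from the field list in this README.
FIELDS = ["year", "leg_assembly", "booth_no", "eng_file_name"]


def build_file_name(year, constituency_number, polling_station_number):
    """Return a name such as 2011_001_001.pdf (4-digit year, two 3-digit codes)."""
    return f"{year:04d}_{constituency_number:03d}_{polling_station_number:03d}.pdf"


def build_target_path(year, constituency_number, polling_station_number,
                      root="kerala_pdfs"):
    """Return the download location, e.g. kerala_pdfs/kerala_2011/2011_001_001.pdf."""
    file_name = build_file_name(year, constituency_number, polling_station_number)
    return PurePosixPath(root) / f"kerala_{year:04d}" / file_name


def rows_to_csv(entries):
    """Render (year, constituency_number, polling_station_number) tuples as CSV text.

    Assumption: leg_assembly = constituency number, booth_no = polling station
    number, eng_file_name = the renamed PDF. The real script may differ.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for year, constituency, station in entries:
        writer.writerow({
            "year": year,
            "leg_assembly": constituency,
            "booth_no": station,
            "eng_file_name": build_file_name(year, constituency, station),
        })
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_target_path(2011, 1, 1))
    print(rows_to_csv([(2011, 1, 1), (2011, 1, 2)]))
```

Zero-padding the codes keeps file names sorted in constituency and polling station order when listed alphabetically.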

Running the script

pip install -r requirements.txt
python kerala_archives.py

Corrupted PDF files

2011

2012

None

2013

None

2014

None

2015

2016