URL = http://www.ceo.kerala.gov.in/erollArchives.html
The script does three things:
- Produces CSV metadata files describing the PDFs. Each CSV has the following fields:
  year, leg_assembly, booth_no, eng_file_name
- Renames the PDFs as follows:
  - lowercase, snake_case
  - year (4 digits), constituency_number (3-digit code), and polling_station_number (3-digit code), joined by underscores
  so a sample file name is 2011_001_001.pdf
- Downloads all the PDFs to a directory called
  kerala_pdfs/kerala_20XX/ (where 20XX is the year)
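
The naming and layout rules above can be sketched in a few lines of Python. This is only an illustration, not the code in kerala_archives.py: the helper names (`pdf_name`, `target_path`, `metadata_row`) are hypothetical, and it assumes that `leg_assembly` holds the constituency number, `booth_no` holds the polling station number, and `eng_file_name` holds the renamed file name.

```python
from pathlib import Path

# Column order taken from the field list above.
FIELDS = ["year", "leg_assembly", "booth_no", "eng_file_name"]


def pdf_name(year, constituency_no, station_no):
    """Build a name like 2011_001_001.pdf (4-digit year, two 3-digit codes)."""
    return f"{int(year):04d}_{int(constituency_no):03d}_{int(station_no):03d}.pdf"


def target_path(year, constituency_no, station_no, root="kerala_pdfs"):
    """Place the file under kerala_pdfs/kerala_<year>/."""
    return Path(root) / f"kerala_{int(year)}" / pdf_name(year, constituency_no, station_no)


def metadata_row(year, constituency_no, station_no):
    """One CSV row; the meaning assigned to each field is an assumption."""
    return {
        "year": int(year),
        "leg_assembly": int(constituency_no),
        "booth_no": int(station_no),
        "eng_file_name": pdf_name(year, constituency_no, station_no),
    }


if __name__ == "__main__":
    print(target_path(2016, 114, 57).as_posix())
    print(metadata_row(2016, 114, 57))
```

Rows built this way could be written with `csv.DictWriter(fh, fieldnames=FIELDS)` from the standard library, which keeps the column order fixed.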
pip install -r requirements.txt
python kerala_archives.py
- 2016_114_057.pdf
- 2016_114_054.pdf
- 2016_100_067.pdf
- 2016_114_053.pdf
- 2016_096_095.pdf
- 2016_100_061.pdf
- 2016_096_096.pdf
- 2016_100_064.pdf
- 2016_114_050.pdf
- 2016_114_055.pdf
- 2016_114_049.pdf
- 2016_100_062.pdf
- 2016_114_052.pdf
- 2016_090_181.pdf
- 2016_096_093.pdf
- 2016_114_048.pdf
- 2016_096_091.pdf
- 2016_100_068.pdf
- 2016_114_051.pdf
- 2016_100_063.pdf
- 2016_100_065.pdf
- 2016_114_056.pdf
- 2016_096_092.pdf
- 2016_100_066.pdf
- 2016_096_094.pdf