The final pdfs are posted on Google Cloud Storage: https://storage.googleapis.com/in-electoral-rolls/dadra_pdfs.tar.gz
The requester pays the charges associated with downloading the data. For more information about that, see: https://cloud.google.com/storage/docs/requester-pays
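As a rough sketch of what a requester-pays download could look like from Python: this assumes the google-cloud-storage package is installed, that credentials are already configured, and that you have a Google Cloud project to bill. The helper names and the billing project id are hypothetical and not part of this repository.

```python
from urllib.parse import urlparse

ARCHIVE_URL = "https://storage.googleapis.com/in-electoral-rolls/dadra_pdfs.tar.gz"


def split_gcs_url(url):
    """Split a storage.googleapis.com URL into (bucket, object_name)."""
    bucket, _, object_name = urlparse(url).path.lstrip("/").partition("/")
    return bucket, object_name


def download_archive(billing_project, destination="dadra_pdfs.tar.gz"):
    """Download the archive, billing the transfer to billing_project (hypothetical helper)."""
    # Imported here so the URL helper above works without the package installed.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_url(ARCHIVE_URL)
    client = storage.Client(project=billing_project)
    # user_project names the project that is charged for the request.
    bucket = client.bucket(bucket_name, user_project=billing_project)
    bucket.blob(object_name).download_to_filename(destination)
    return destination
```

Calling download_archive("my-billing-project") with your own project id in place of the placeholder would save the archive to the current directory.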
URL = http://ceodnh.nic.in/Electoral2017.aspx
Year = Final Electoral Roll for 2017
The script does three things:
- Produces dadra.csv that contains metadata about the pdfs. The CSV has the following fields: language, main_or_supplementary, part_no, file_name
- Downloads all the pdfs to a directory called dadra_pdfs/
- Renames files as follows:
  - English language rolls have the prefix eng and Gujarati language rolls have the prefix guj.
  - The main rolls have the word main in them and supplementary rolls have supp.
  - And the last segment is the 3 digit part_no. So a sample name = eng_main_001.pdf
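The naming rule and the CSV layout described above can be sketched as follows. This is an illustration, not the code in dadra.py: the function names are hypothetical, and it assumes the language and main_or_supplementary columns hold the same short codes (eng/guj, main/supp) that appear in the file names.

```python
import csv

# Column order as listed in the description of dadra.csv.
FIELDS = ["language", "main_or_supplementary", "part_no", "file_name"]


def roll_file_name(language, roll_type, part_no):
    """Build a name such as eng_main_001.pdf from its three segments."""
    if language not in ("eng", "guj"):
        raise ValueError(f"unexpected language: {language!r}")
    if roll_type not in ("main", "supp"):
        raise ValueError(f"unexpected roll type: {roll_type!r}")
    # The part number is zero-padded to three digits.
    return f"{language}_{roll_type}_{int(part_no):03d}.pdf"


def write_metadata(rows, handle):
    """Write (language, roll_type, part_no) tuples as CSV rows with a header."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for language, roll_type, part_no in rows:
        writer.writerow({
            "language": language,
            "main_or_supplementary": roll_type,
            "part_no": part_no,
            "file_name": roll_file_name(language, roll_type, part_no),
        })
```

When writing to a real file, open it with newline="" as the csv module recommends, for example open("dadra.csv", "w", newline="").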
pip install -r requirements.txt
python dadra.py
lang | type | number of files |
---|---|---|
eng | main | 266 |
eng | supp | 255 |
guj | main | 266 |
guj | supp | 252 |
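Counts like the ones in this table can be recomputed from dadra.csv. The sketch below uses only the standard library and assumes the CSV has the language and main_or_supplementary columns listed earlier; the function name is hypothetical, and the original table may have been produced some other way.

```python
import csv
from collections import Counter


def count_rolls(handle):
    """Count rows per (language, main_or_supplementary) pair in an open CSV file."""
    reader = csv.DictReader(handle)
    return Counter(
        (row["language"], row["main_or_supplementary"]) for row in reader
    )
```

For example, opening dadra.csv with open("dadra.csv", newline="") and passing the handle to count_rolls returns a Counter keyed by (language, type) pairs.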
Some supplementary files are missing: requests for them return a 404 error (File or directory not found).
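One way a downloader could tolerate such 404 responses is to record the missing URLs and carry on. The sketch below uses urllib from the standard library and hypothetical function names; it is not necessarily how dadra.py handles the problem. The opener parameter exists only so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_pdf(url, opener=urllib.request.urlopen):
    """Return the bytes at url, or None if the server answers 404."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # missing roll: the caller records it and moves on
        raise  # any other HTTP error is unexpected, so let it propagate


def download_all(urls, opener=urllib.request.urlopen):
    """Return (downloaded, missing): content keyed by URL, and the 404 URLs."""
    downloaded, missing = {}, []
    for url in urls:
        content = fetch_pdf(url, opener)
        if content is None:
            missing.append(url)
        else:
            downloaded[url] = content
    return downloaded, missing
```

Keeping the list of missing URLs makes it easy to retry them later or to report which parts have no supplementary roll.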
Draft roll for 2018 is also available.