PDF_OCR_Python: a project from Freelancer.com. The development plan document is also available. The project runs OCR on PDFs to extract text from scanned documents (Tesseract works fairly well), then parses the raw text using regular expressions.
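
The two steps above (OCR, then regex parsing) might look roughly like the following minimal sketch. It is not the project's actual code. It assumes the `pdf2image` and `pytesseract` packages (plus the Poppler and Tesseract binaries) for the OCR step; this README names only Tesseract, so those wrappers are an assumption. The field names and patterns in `FIELD_PATTERNS` are hypothetical and only illustrate the parsing approach.

```python
import re
from typing import Dict, Optional


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Rasterize each page of a scanned PDF and return the OCR text."""
    # Imported lazily so the regex helpers below work without these
    # third-party packages installed (assumed, not named in the README).
    from pdf2image import convert_from_path  # needs Poppler on the system
    import pytesseract  # needs the Tesseract binary on the system

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    return "\n".join(
        pytesseract.image_to_string(page, lang=lang) for page in pages
    )


# Hypothetical fields: replace with whatever the scanned documents contain.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
        re.IGNORECASE,
    ),
    "date": re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b"),
    "total": re.compile(
        r"Total\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def parse_fields(raw_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_text)
        result[name] = match.group(1) if match else None
    return result
```

A call such as `parse_fields(ocr_pdf("scan.pdf"))` (with a hypothetical file name) would chain the two steps. OCR output is noisy, so the patterns tolerate optional punctuation and variable whitespace, and missing fields come back as `None` rather than raising an error.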