
Create pdf2txt.py #2

Open
wants to merge 1 commit into base: master


AzharMithani

Wrote a script as per the requirement.

liadmagen self-requested a review October 12, 2018 09:34
liadmagen added the enhancement (New feature or request) and hacktoberfest (🍁 https://hacktoberfest.digitalocean.com/) labels Oct 12, 2018
@liadmagen
Member

The packages (textract, PyPDF2) should at least be added to the Pipenv configuration (the Pipfile), so that they are installed for new users.

Textract is sometimes hard to install (at least on Windows), because it pulls in third-party packages for OCR. Using the underlying packages directly (e.g. PDFMiner) may ease that pain.

Member

liadmagen left a comment


Very nice, thank you for your contribution!
In order to make it work, it needs to actually take the files from the data/raw folder.

Consider turning it into a set of functions that take the source folder path and the destination path as parameters, recursively walk the source folder, convert each PDF to text, and save the result into the destination folder.
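A minimal sketch of the structure suggested above, in Python (the language of pdf2txt.py). It assumes pdfminer.six as the extractor, following the earlier suggestion to use PDFMiner instead of textract; the function names (convert_folder, convert_pdf, extract_with_pdfminer) and the extract parameter are hypothetical and not taken from the PR. The extractor is passed in as a parameter so the folder walk can be exercised without any PDF library installed.

```python
"""Hypothetical sketch of the folder-to-folder PDF conversion suggested in the review."""
from pathlib import Path
from typing import Callable, List


def extract_with_pdfminer(pdf_path: Path) -> str:
    """Extract the text of one PDF with pdfminer.six (assumed to be installed)."""
    # Imported lazily so the folder-walking code works even without pdfminer.six.
    from pdfminer.high_level import extract_text
    return extract_text(str(pdf_path))


def convert_pdf(pdf_path: Path, src_root: Path, dst_root: Path,
                extract: Callable[[Path], str]) -> Path:
    """Convert a single PDF and write the text to the mirrored path under dst_root."""
    # Keep the sub-folder layout of src_root inside dst_root, swapping .pdf for .txt.
    relative = pdf_path.relative_to(src_root).with_suffix(".txt")
    out_path = dst_root / relative
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(extract(pdf_path), encoding="utf-8")
    return out_path


def convert_folder(src: str, dst: str,
                   extract: Callable[[Path], str] = extract_with_pdfminer) -> List[Path]:
    """Recursively convert every *.pdf under src into a .txt file under dst."""
    src_root, dst_root = Path(src), Path(dst)
    written: List[Path] = []
    # rglob walks all sub-folders; sorted() makes the processing order deterministic.
    for pdf_path in sorted(src_root.rglob("*.pdf")):
        written.append(convert_pdf(pdf_path, src_root, dst_root, extract))
    return written
```

For the layout the review describes, this would be called as convert_folder("data/raw", "data/processed"); the data/processed destination is an assumption, since the review does not name one. If this route were taken, pdfminer.six would also need to be added to the Pipfile.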

liadmagen added the invalid (This doesn't seem right) label and removed the enhancement (New feature or request) label Oct 13, 2018
@AzharMithani
Author

AzharMithani commented Oct 14, 2018 via email

Labels
hacktoberfest 🍁 https://hacktoberfest.digitalocean.com/ invalid This doesn't seem right
Projects
None yet

2 participants