
Error on doc.process() #14

Open
rsteca opened this issue Sep 2, 2016 · 2 comments

rsteca commented Sep 2, 2016

When doing:

import doc2text
doc = doc2text.Document()
doc.read('something.pdf')
doc.process()

I get:

Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 23
dst is not a numpy array, neither a scalar
Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 197
dst is not a numpy array, neither a scalar
Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 77
dst is not a numpy array, neither a scalar

And then, when I do:

doc.extract_text()

I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-5-57184997370d> in <module>()
----> 1 doc.extract_text()

/usr/local/lib/python2.7/dist-packages/doc2text/__init__.pyc in extract_text(self)
     89             for page in self.processed_pages:
     90                 new = page
---> 91                 text = new.extract_text()
     92                 self.page_content.append(text)
     93         else:

/usr/local/lib/python2.7/dist-packages/doc2text/page.pyc in extract_text(self)
     36     def extract_text(self):
     37         temp_path = 'text_temp.png'
---> 38         cv2.imwrite(temp_path, self.image)
     39         self.text = pytesseract.image_to_string(Image.open(temp_path))
     40         os.remove(temp_path)

AttributeError: Page instance has no attribute 'image'

remi-pr commented Sep 5, 2016

If I am not mistaken, this is due to using OpenCV 2.x rather than 3.0. With 2.x, cv2.resize() interprets the third positional argument as a destination array (dst) rather than as the interpolation method, as intended. One fix, included in pull request #13, is to pass it as a named argument (cv2.resize(foo, bar, interpolation=cv2.INTER_AREA) on line 77). This should make the code compatible with both OpenCV versions.
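To see why the named argument matters, here is a minimal sketch that needs no OpenCV install. It uses a hypothetical stand-in, fake_resize, that mirrors the argument order documented for the Python binding, cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]), and simply reports how each argument gets bound. The two constants match the values of cv2.INTER_LINEAR and cv2.INTER_AREA.

```python
# Hypothetical stand-in mirroring the documented argument order of
# cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]).
INTER_LINEAR = 1  # same value as cv2.INTER_LINEAR (the default)
INTER_AREA = 3    # same value as cv2.INTER_AREA


def fake_resize(src, dsize, dst=None, fx=0, fy=0, interpolation=INTER_LINEAR):
    """Report how each argument was bound instead of resizing anything."""
    return {"dst": dst, "interpolation": interpolation}


# Positional call: INTER_AREA lands in the dst slot and interpolation keeps
# its default. An int in the dst slot is what the OpenCV 2.x error rejects.
positional = fake_resize("img", (100, 100), INTER_AREA)

# Keyword call: INTER_AREA is bound to interpolation, as intended.
keyword = fake_resize("img", (100, 100), interpolation=INTER_AREA)

print(positional)  # {'dst': 3, 'interpolation': 1}
print(keyword)     # {'dst': None, 'interpolation': 3}
```

Because the keyword form does not depend on where interpolation sits in the parameter list, it binds the same way however the optional arguments before it are treated.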

The second error you are getting is probably a consequence of the first (self.image is not produced because the downscale_image call failed).
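A rough sketch of how this kind of cascade can happen, using a hypothetical, simplified Page class (not doc2text's actual code): if process() prints and swallows the exception, self.image is never assigned, and the failure only surfaces later in extract_text(). The guard in extract_text() is one assumed way to replace the confusing AttributeError with an error that points back at the real cause.

```python
class Page:
    """Hypothetical, simplified page object; not doc2text's real implementation."""

    def __init__(self, orig):
        self.orig = orig  # the raw page image as read from the document

    def downscale_image(self):
        # Simulates the OpenCV 2.x cv2.resize() failure reported in this issue.
        raise TypeError("dst is not a numpy array, neither a scalar")

    def process(self):
        try:
            self.image = self.downscale_image()
        except Exception as e:
            # The error is printed and swallowed, so self.image is never set.
            print("Error:", e)

    def extract_text(self):
        # Fail with a message that names the real cause instead of letting
        # the missing attribute surface as an AttributeError.
        if not hasattr(self, "image"):
            raise RuntimeError("no processed image; process() failed earlier")
        return ""  # OCR (e.g. pytesseract) would run on self.image here


page = Page(orig="raw page")
page.process()  # prints: Error: dst is not a numpy array, neither a scalar
try:
    page.extract_text()
except RuntimeError as err:
    print("extract_text failed:", err)
```

Re-raising inside process() would surface the problem even earlier; which behavior fits best is a design choice for the library.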

jlsutherland (Owner) commented

@remi-pr is correct. Merging #13 should fix part of the issue and make errors easier to locate from their stack traces.
