UnboundLocalError: local variable 'pipe' referenced before assignment
#256
Comments
I need to extract many PDFs and I have the same problem ... |
I have the same problem. Do you have a solution for it? |
Have you tried to run as sudo? |
@SatyaRamGV can you try textract |
This error occurs with 1.6.1. I think it is solved in 1.6.2, but v1.6.2 is not available as a PyPI package, so you should install from the git repo. |
I'm closing this issue due to inactivity. If you still encounter the issue with the latest version of textract, feel free to leave a comment with additional information and I'll reopen the issue. |
Same error in textract
|
@ewerkema Thanks for the Stackoverflow link. I have no experience with Docker, but I did find this issue which might be related. Can you comment if this is the same issue? Textract relies on the external command line tool |
@jpweytjens It was actually a memory problem of the Docker container. Due to insufficient memory the operation of |
I think I know where this comes from: this bit of code in ShellParser:
...coupled with forking issues on Unix: https://stackoverflow.com/questions/5306075/python-memory-allocation-error-using-subprocess-popen Since the out-of-memory error is an OSError, it gets caught in the This could be alleviated by adding a bare |
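To illustrate the failure mode described in this comment, here is a minimal sketch. It is not textract's actual code: `run_buggy`, `run_fixed`, the injectable `popen` parameter, and `out_of_memory_popen` are hypothetical names made up for the example, and re-raising the unhandled error is only one possible remedy.

```python
import errno
import subprocess


def run_buggy(args, popen=subprocess.Popen):
    """Hypothetical pattern: only 'command not found' is handled."""
    try:
        pipe = popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as e:
        if e.errno == errno.ENOENT:
            raise RuntimeError("command not found: " + args[0])
        # Any other OSError (for example ENOMEM when forking fails) is
        # swallowed here, so `pipe` is never assigned.
    # Raises UnboundLocalError if Popen failed with a non-ENOENT error.
    return pipe.communicate()


def run_fixed(args, popen=subprocess.Popen):
    """Same pattern, but unhandled OSErrors are re-raised."""
    try:
        pipe = popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as e:
        if e.errno == errno.ENOENT:
            raise RuntimeError("command not found: " + args[0])
        raise  # surface the real cause instead of hiding it
    return pipe.communicate()


def out_of_memory_popen(*args, **kwargs):
    """Hypothetical stand-in for Popen failing with an out-of-memory error."""
    raise OSError(errno.ENOMEM, "Cannot allocate memory")
```

Calling `run_buggy(["pdf2txt.py", "x.pdf"], popen=out_of_memory_popen)` ends in an `UnboundLocalError`, while `run_fixed` with the same arguments raises the original `OSError`, which points at the real problem.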
@SatyaRamGV I tried textract==1.6.1, textract==1.6.2, and textract==1.6.3, and all of these versions throw this error. I'm on Windows 10 and have enough memory to perform this task, but I still get the same error.
Traceback (most recent call last):
  File "", line 1, in
  File "C:..\venv\lib\site-packages\textract\parsers\__init__.py", line 77, in process
  File "C:..\venv\lib\site-packages\textract\parsers\utils.py", line 46, in process
  File "C:..\venv\lib\site-packages\textract\parsers\pdf_parser.py", line 31, in extract
  File "C:..\venv\lib\site-packages\textract\parsers\pdf_parser.py", line 48, in extract_pdfminer
  File "C:..\venv\lib\site-packages\textract\parsers\utils.py", line 96, in run
UnboundLocalError: local variable 'pipe' referenced before assignment |
I had this problem when trying to read .doc files because I didn't have antiword properly installed. If you are on Windows 10 and are trying to read .doc files, you need antiword from here: https://www.softpedia.com/get/Office-tools/Other-Office-Tools/Antiword.shtml |
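Since several reports in this thread come down to an external tool not being installed, a quick check like the following may help before calling textract. It is only a sketch: `missing_tools` is a hypothetical helper, the tool names are the ones mentioned in this thread, and on Windows the result of `shutil.which` also depends on `PATHEXT`.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names taken from this thread; adjust for the file types you parse.
    print(missing_tools(["pdf2txt.py", "antiword"]))
```

An empty list means both commands were found; anything listed is worth installing or adding to PATH before retrying.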
I have exactly the same error as |
@PPGHPP I've made some changes that should make the actual error clearer, they're not deployed yet though. Can you try installing from master? It should be a command like |
Hi,
Thank you for the information. I ran pip install as you suggested.
The only error was: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts."
It also reported: "Successfully installed pdfminer.six-20191110"
Now I'm able to use it like text=textract.process("tacl_a_00344.pdf"), and the result looks OK.
Thanks again!
BR PirkkoP
On Tue, 17 Aug 2021 at 15:29, traverseda wrote:
> @PPGHPP I've made some changes that should make the actual error clearer, they're not deployed yet though. Can you try installing from master?
> It should be a command like pip install git+https://github.com/deanmalmgren/textract.git, although I'm not sure how you installed it on windows.
|
Hi again,
One thing I noticed. With textract, a sentence comes out like this: "We define improvement as the quantity\r\nmax{0, fa \xe2\x88\x92 fb }, where b is our current..."
But OCR-based pytesseract renders it as "We define improvement as the quantity\r\nmax{0, fa — fy}. where b is our current ..."
This is from p. 764 of the attachment.
BR PirkkoP
|
I think that probably has something to do with chardet. The next release should help. |
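For what it's worth, the `\xe2\x88\x92` in the textract output quoted earlier is the UTF-8 encoding of the minus sign (U+2212), so that part of the text may simply be undecoded bytes rather than a wrong character. A small sketch, assuming the value you have is a `bytes` object and that it is UTF-8 encoded:

```python
# Byte string copied from the textract output quoted in this thread.
raw = b"max{0, fa \xe2\x88\x92 fb }"

# Decoding as UTF-8 turns the three bytes into a single minus sign (U+2212).
text = raw.decode("utf-8")

print(text)
```

If the decoded text looks right, the difference from the pytesseract result is in the OCR output rather than in textract.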
text = textract.process(file, method='pdfminer')
Error:
UnboundLocalError                         Traceback (most recent call last)
in ()
----> 1 text = textract.process(file, method='pdfminer')

~/.local/lib/python3.6/site-packages/textract/parsers/__init__.py in process(filename, encoding, extension, **kwargs)
     75
     76     parser = filetype_module.Parser()
---> 77     return parser.process(filename, encoding, **kwargs)
     78
     79

~/.local/lib/python3.6/site-packages/textract/parsers/utils.py in process(self, filename, encoding, **kwargs)
     44         # output encoding
     45         # http://nedbatchelder.com/text/unipain/unipain.html#35
---> 46         byte_string = self.extract(filename, **kwargs)
     47         unicode_string = self.decode(byte_string)
     48         return self.encode(unicode_string, encoding)

~/.local/lib/python3.6/site-packages/textract/parsers/pdf_parser.py in extract(self, filename, method, **kwargs)
     29
     30         elif method == 'pdfminer':
---> 31             return self.extract_pdfminer(filename, **kwargs)
     32         elif method == 'tesseract':
     33             return self.extract_tesseract(filename, **kwargs)

~/.local/lib/python3.6/site-packages/textract/parsers/pdf_parser.py in extract_pdfminer(self, filename, **kwargs)
     46     def extract_pdfminer(self, filename, **kwargs):
     47         """Extract text from pdfs using pdfminer."""
---> 48         stdout, _ = self.run(['pdf2txt.py', filename])
     49         return stdout
     50

~/.local/lib/python3.6/site-packages/textract/parsers/utils.py in run(self, args)
     94         # pipe.wait() ends up hanging on large files. using
     95         # pipe.communicate appears to avoid this issue
---> 96         stdout, stderr = pipe.communicate()
     97
     98         # if pipe is busted, raise an error (unlike Fabric)

UnboundLocalError: local variable 'pipe' referenced before assignment
Originally posted by @SatyaRamGV in https://github.com/deanmalmgren/textract/issue_comments#issuecomment-439043876