1.0.21 causes previously-consumable PDFs to fail now with RangeError #191
Comments
Hi @rdunlop, thank you for opening this issue. I totally understand your concern, and I debated this change myself for this very reason. This isn't about a performance optimization; I would much rather be able to read malformed PDF files than run faster. However, as I explained in #185, the change is required to accommodate properly authored PDF files, which are allowed to contain PDF-like markers in their stream data (i.e., a PDF explaining how PDF data looks might contain the PDF ...). The choice was either to continue failing on valid PDF files or to patch in a way that limited support for malformed PDF files. I guess there's a way to support both variations; I just didn't see it at the time (I see it now, though it might carry a performance penalty). I'm short on time, but if you want to submit a PR that prefers valid PDF files and supports some sort of handling for malformed PDF files, that would be great. Cheers, |
This issue happened to me as well. The PR seems to fix it, @boazsegev. |
Have there been any updates on this ticket or #205, in particular on whether it will be merged? @boazsegev |
Thanks for the PR. Is there any way to get this fix merged, @boazsegev? |
Sorry to bump that PR, but we are experiencing the same bug in production! |
Still alive in 1.0.26 |
I suspect that the input PDF I'm dealing with is invalid, but I wanted to mention that it worked in 1.0.20 and no longer works in 1.0.21.
The PDF appears to have an invalid stream defined near the end of my file (relevant part here::
(pretty printed):
As you can see, the Length is 2200, but there are not 2200 bytes left in the file, and thus the
@scanner.pos += out.last[:Length].to_i - 2
([here](https://github.com/boazsegev/combine_pdf/blob/b966e703fd897ff50832d3823e74791099b82ca3/lib/combine_pdf/parser.rb#L364)) causes a RangeError.
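To illustrate the failure mode, here is a minimal sketch of how a parser could prefer the declared Length and fall back to a keyword search only when that Length cannot fit in the remaining input. The helper name `skip_stream_body` and its return symbols are hypothetical; this is not combine_pdf's code or the fix proposed in any PR, and it ignores the `- 2` adjustment used in the original line.

```ruby
require 'strscan'

# Hypothetical helper (not part of combine_pdf): move the scanner past a
# stream body. The declared Length is trusted whenever it fits, so valid
# PDFs whose stream data contains PDF-like markers are still read correctly.
def skip_stream_body(scanner, declared_length)
  remaining = scanner.string.bytesize - scanner.pos
  if declared_length.between?(0, remaining)
    scanner.pos += declared_length        # valid case: trust Length
    :declared_length
  elsif scanner.skip_until(/endstream/)
    # Malformed case: Length overruns the file, so search for the keyword
    # and leave the scanner positioned at the start of "endstream".
    scanner.pos -= 'endstream'.bytesize
    :keyword_search
  else
    scanner.terminate                     # no keyword either: give up cleanly
    :truncated
  end
end

# Reproduce the RangeError: StringScanner refuses a position past the end.
scanner = StringScanner.new("stream\nabc\nendstream")
scanner.skip(/stream\n/)
begin
  scanner.pos += 2200
rescue RangeError => e
  puts "unguarded advance failed: #{e.class}"
end

# The guarded version recovers by locating the keyword instead.
puts skip_stream_body(scanner, 2200)
```

The fallback only runs when the declared Length is impossible, so well-formed files take the same path as before and only malformed files pay for the extra search.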
I am opening this ticket because, although I'm 90% sure that this is an invalid PDF, I wanted to say out loud that the change introduced in 1.0.21 is (to me) a regression in capability. I recognize that #184 is a related issue.
For now, I've resolved my issue by reverting to 1.0.20. Not ideal, but sufficient for my purposes for now.