Detect 1x1 pixel output and re-rasterize with fallback DPI #46
This extends PR #42.
If pdftoppm encounters an image that is too large, it aborts with the error `Bogus memory allocation size` and outputs an empty 1x1 pixel JPG. Unfortunately, the PyPI library used doesn't detect this failure, so it treats the page as successfully rasterized.

This PR extends the processing that happens after the initial rasterization. If any 1x1 pixel images are detected at that point, the affected pages are re-rasterized using a fallback DPI (specified as `PDF_RASTERIZER_FALLBACK_DPI`, default 200).

When re-rasterizing, a different output file format is specified, because `convert_from_path` returns every image on disk that matches the specified file pattern, not only those that have just been generated. For simplicity, each page is re-rasterized one at a time; if this is too slow, it can be done in bulk. Doing it in bulk can use multiple threads, which can mean the pages are returned out of order.
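
The detect-and-retry step can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `is_blank_pixel`, `rerasterize_failed_pages`, and the `rasterize_page` callback are hypothetical names, and images are assumed to expose a PIL-style `.size` tuple of `(width, height)`.

```python
from typing import Any, Callable, List

# Assumed to mirror the PDF_RASTERIZER_FALLBACK_DPI default described above.
FALLBACK_DPI = 200


def is_blank_pixel(image: Any) -> bool:
    """Return True if the image is the 1x1 placeholder left by a failed rasterization."""
    return tuple(image.size) == (1, 1)


def rerasterize_failed_pages(
    images: List[Any],
    rasterize_page: Callable[[int, int], Any],
    fallback_dpi: int = FALLBACK_DPI,
) -> List[Any]:
    """Replace every 1x1 page with a fresh rasterization at the fallback DPI.

    `rasterize_page(page_number, dpi)` is a hypothetical callback that
    rasterizes a single 1-indexed page and returns the resulting image.
    Pages are handled one at a time, so the original order is preserved.
    """
    result = []
    for page_number, image in enumerate(images, start=1):
        if is_blank_pixel(image):
            image = rasterize_page(page_number, fallback_dpi)
        result.append(image)
    return result
```

Assuming the library in question is pdf2image, `rasterize_page` could wrap a call such as `convert_from_path(pdf_path, dpi=dpi, first_page=n, last_page=n, fmt="png")` and take its single result. Using a format that differs from the initial run is what keeps the earlier files from matching the output pattern.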