Detect 1x1 pixel output and re-rasterized with fallback DPI #46

donaldgray · 2023-12-11T17:30:04Z

This extends PR #42.

If pdftoppm detects an image that is too large, it aborts with the error `Bogus memory allocation size` and outputs an empty 1x1 pixel JPEG. Unfortunately, the PyPI library we use doesn't detect this error, so it treats the page as successfully rasterized.
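
Below is a minimal sketch of the detection step. It assumes the first pass returns PIL-style images, meaning objects with a `size` attribute of `(width, height)`. `find_placeholder_pages` is a hypothetical helper, not the code in this PR:

```python
from typing import Iterable, List, Protocol, Tuple


class HasSize(Protocol):
    """Anything exposing a PIL-style ``size`` attribute of (width, height)."""

    size: Tuple[int, int]


def find_placeholder_pages(images: Iterable[HasSize]) -> List[int]:
    """Return the 1-based page numbers whose rasterized image is only 1x1 pixel.

    A 1x1 image is what pdftoppm leaves behind after aborting with
    "Bogus memory allocation size", so those pages were never really rendered.
    """
    return [
        page_number
        for page_number, image in enumerate(images, start=1)
        if tuple(image.size) == (1, 1)
    ]
```

Checking the pixel size, rather than looking for an exception, is the only option here, because the library reports success for these pages.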

This PR extends the processing that runs after the initial rasterization. If any 1x1 pixel images are detected at that point, those pages are re-rasterized using a fallback DPI (default 200, set via `PDF_RASTERIZER_FALLBACK_DPI`).
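
The fallback flow might look roughly like the sketch below. It assumes `PDF_RASTERIZER_FALLBACK_DPI` is read from an environment variable, and it treats `render_page(page_number, dpi)` as a hypothetical callback that rasterizes a single page. Only the 1x1 pages are replaced:

```python
import os
from typing import Any, Callable, List, Mapping, Optional, Sequence

DEFAULT_FALLBACK_DPI = 200


def get_fallback_dpi(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the fallback DPI, defaulting to 200.

    Assumption: PDF_RASTERIZER_FALLBACK_DPI is supplied as an environment variable.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PDF_RASTERIZER_FALLBACK_DPI", "").strip()
    return int(raw) if raw else DEFAULT_FALLBACK_DPI


def rerasterize_placeholders(
    images: Sequence[Any],
    render_page: Callable[[int, int], Any],
    fallback_dpi: int,
) -> List[Any]:
    """Return a copy of ``images`` with every 1x1 page re-rendered at ``fallback_dpi``.

    ``render_page`` (hypothetical) rasterizes one 1-based page at the given DPI.
    Pages that rendered normally on the first pass are kept as-is.
    """
    result = list(images)
    for index, image in enumerate(images):
        if tuple(image.size) == (1, 1):
            result[index] = render_page(index + 1, fallback_dpi)
    return result
```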

When re-rasterizing, a different output file format is specified, because a call to `convert_from_path` returns every image on disk that matches the specified file pattern, not just the images that call generated. For simplicity, pages are re-rasterized one at a time; if this proves too slow, they could be done in bulk instead. Note that a bulk conversion can use multiple threads, which may return the pages out of order.
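
Here is a sketch of the one-page-at-a-time approach. It assumes the library is pdf2image, whose `convert_from_path` accepts `dpi`, `first_page`, `last_page`, `output_folder`, `fmt` and `output_file`. `rerasterize_pages` and the `fallback_p{page}_` prefix are illustrative assumptions. `convert` can be injected so the loop can be exercised without Poppler installed:

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def rerasterize_pages(
    pdf_path: str,
    page_numbers: Iterable[int],
    output_folder: str,
    dpi: int = 200,
    fmt: str = "png",
    convert: Optional[Callable[..., List[Any]]] = None,
) -> Dict[int, Any]:
    """Re-rasterize each 1-based page on its own and return {page: image}.

    ``fmt`` should differ from the first pass's format (e.g. "png" vs "jpeg").
    convert_from_path loads every file in ``output_folder`` matching the prefix
    and extension, so reusing the old format could pick up the old 1x1 files.
    """
    if convert is None:
        # Imported lazily so the module loads even where pdf2image is absent.
        from pdf2image import convert_from_path as convert

    results: Dict[int, Any] = {}
    for page in page_numbers:
        images = convert(
            pdf_path,
            dpi=dpi,
            first_page=page,
            last_page=page,
            output_folder=output_folder,
            fmt=fmt,
            output_file=f"fallback_p{page}_",
        )
        if not images:
            raise RuntimeError(f"page {page} produced no output")
        results[page] = images[0]
    return results
```

The trailing `_` in the prefix prevents page 1's prefix from also matching page 10's output file. Converting one page per call keeps the mapping from page number to image unambiguous, at the cost of spawning one pdftoppm process per page.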

Detect 1x1 pixel output and re-rasterized with fallback DPI

729d57c

donaldgray requested a review from fmcc December 12, 2023 11:32

fmcc approved these changes Dec 12, 2023

donaldgray merged commit e031c34 into main Dec 12, 2023
1 check passed

donaldgray deleted the feature/handle_large_images branch December 12, 2023 11:59
