Detect 1x1 pixel output and re-rasterize with fallback DPI #46

Merged
merged 1 commit into from
Dec 12, 2023


donaldgray
Member

This extends PR #42.

If pdftoppm detects an image that is too large, it aborts with the error "Bogus memory allocation size" and outputs an empty 1x1 pixel jpg. Unfortunately, the PyPI library used doesn't detect this failure, so it treats the page as successfully rasterized.

This PR extends the processing that happens after the initial rasterization. If any 1x1 pixel images are detected at that point, the process attempts to re-rasterize those pages using a fallback DPI (defaults to 200, specified as PDF_RASTERIZER_FALLBACK_DPI).
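A minimal sketch of the detection step, not the code from this PR. It assumes PDF_RASTERIZER_FALLBACK_DPI is read from the environment and that page sizes are available as (width, height) tuples. The helper names `get_fallback_dpi` and `find_failed_pages` are hypothetical.

```python
import os
from typing import List, Sequence, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def get_fallback_dpi(default: int = 200) -> int:
    """Return the fallback DPI.

    Assumption: the setting is an environment variable; the PR only
    names it PDF_RASTERIZER_FALLBACK_DPI and gives 200 as the default.
    """
    return int(os.environ.get("PDF_RASTERIZER_FALLBACK_DPI", default))


def find_failed_pages(sizes: Sequence[Size]) -> List[int]:
    """Return 1-based page numbers whose rasterized output is 1x1 pixels."""
    return [page for page, size in enumerate(sizes, start=1) if size == (1, 1)]
```

With Pillow images, the sizes could be collected as `[img.size for img in images]` before calling `find_failed_pages`.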

Re-rasterizing specifies a different output file format, because the results of a call to convert_from_path are any images on disk that match the specified file pattern, not just those that have just been generated. For simplicity, each page is re-rasterized one at a time; if this proves too slow, it can be done in bulk. Bulk processing can use multiple threads, which means the pages may be returned out of order.
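The one-page-at-a-time approach could look roughly like the sketch below, which calls pdf2image's `convert_from_path` once per failed page. The function name `rerasterize_pages`, the `fallback-<page>-` file prefix, and the injectable `convert` argument are assumptions for illustration, not taken from the PR.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def rerasterize_pages(
    pdf_path: str,
    pages: Iterable[int],
    output_folder: str,
    fallback_dpi: int = 200,
    convert: Optional[Callable[..., List[Any]]] = None,
) -> Dict[int, List[Any]]:
    """Re-rasterize the given 1-based pages one at a time at fallback_dpi."""
    if convert is None:
        # Imported lazily so the sketch can be exercised with a stand-in.
        from pdf2image import convert_from_path as convert

    results: Dict[int, List[Any]] = {}
    for page in sorted(pages):
        # A single-page range keeps the output unambiguous and in order.
        results[page] = convert(
            pdf_path,
            dpi=fallback_dpi,
            first_page=page,
            last_page=page,
            fmt="jpg",
            output_folder=output_folder,
            # Hypothetical prefix: a distinct pattern so files left on disk
            # by the first pass are not picked up as results.
            output_file=f"fallback-{page}-",
            thread_count=1,
        )
    return results
```

Converting a single page per call trades speed for predictable ordering. A bulk call with a higher `thread_count` would be faster but would need the results matched back to page numbers.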

@donaldgray donaldgray requested a review from fmcc December 12, 2023 11:32
@donaldgray donaldgray merged commit e031c34 into main Dec 12, 2023
1 check passed
@donaldgray donaldgray deleted the feature/handle_large_images branch December 12, 2023 11:59
