Images contained in objects of type "/Pattern" are not retrieved #2613
Comments
Thanks for the report. When determining the images associated with a page, pypdf indeed does not consider nested XObjects for image extraction. |
pypdf can look into sub-XObjects; however, here you are looking for an object that is part of a pattern, which in my view is not the usual way to do things.
I will also try to propose an easier way to extract an image. |
With the new PR, extraction will be easier:
|
Wouldn't it be better for the function that is supposed to extract all images of a page to actually extract all images of the page? The PDF standard says that images can be stored inside Patterns, so we should expect to find images in them. |
I agree that images can be stored in patterns, but the approach used here is not common. A pattern is normally expected to provide an image repeated across a surface. |
We could add a bool parameter (recurse, deepSearch, or whatever) to the _page.images method. When set to False, the standard methods _page._get_ids_image and _page._get_image would be called, keeping image retrieval in its simplest form: the inline images and the image dictionaries of the page. When set to True, we could call the standard methods and, on top of their results, return images found in "special" cases like Patterns. This way it stays efficient for the current usage. |
We can propose a PR |
Well, well, well: _page.images isn't a method but a property, so passing a parameter to it isn't an option... |
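One way to reconcile the flag idea with the fact that a property takes no arguments is sketched below. This is only an illustration under stated assumptions: `PageSketch`, `get_images`, and `deep` are hypothetical names, the images are represented by plain strings, and nothing here is pypdf's actual API.

```python
from typing import List


class PageSketch:
    """Hypothetical stand-in for a page object; not pypdf's PageObject."""

    def __init__(self, direct: List[str], in_patterns: List[str]) -> None:
        # Names of images referenced directly by the page.
        self._direct = direct
        # Names of images only reachable through "special" cases such as Patterns.
        self._in_patterns = in_patterns

    def get_images(self, deep: bool = False) -> List[str]:
        """Return image names; pattern images are included only when deep=True."""
        found = list(self._direct)
        if deep:
            found.extend(self._in_patterns)
        return found

    @property
    def images(self) -> List[str]:
        # The property keeps the cheap behaviour by delegating to the method.
        return self.get_images(deep=False)


page = PageSketch(direct=["/Im0"], in_patterns=["/P1/Im1"])
print(page.images)                 # cheap path only
print(page.get_images(deep=True))  # cheap path plus pattern images
```

With this shape, existing callers of the property see no change, while callers who want the deeper search opt in explicitly through the method.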
Explanation
Hello,
First of all, thanks for your work; it's a very helpful library.
I am not able to extract images from a PDF generated with OnlyOffice:
B2.pdf
After looking into the PDF structure, it seems that the image on this PDF page is contained inside a Tiling Pattern object, which can't be handled by "_page._get_ids_image" or "_page._get_image".
I've taken a look at the PDF standard, and it specifies that Tiling Patterns can be made of images, so it's not an OnlyOffice issue.
I haven't read the standard's section on Patterns completely yet, but once I have, I'd like to propose a way to at least retrieve images from them, so that getting the images of a page also considers Patterns.
What do you think?
Have a nice day!
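To make the structure concrete, here is a minimal sketch of a resource walk that also descends into tiling patterns. It assumes plain Python dicts standing in for PDF dictionaries; `iter_images` is a hypothetical helper, not a pypdf function, and real pypdf objects would additionally need their indirect references resolved.

```python
from typing import Any, Dict, Iterator, Tuple

Path = Tuple[str, ...]


def iter_images(resources: Dict[str, Any], prefix: Path = ()) -> Iterator[Path]:
    """Yield the key path of every image XObject reachable from `resources`."""
    for name, obj in resources.get("/XObject", {}).items():
        subtype = obj.get("/Subtype")
        if subtype == "/Image":
            yield prefix + ("/XObject", name)
        elif subtype == "/Form":
            # Form XObjects carry their own resource dictionary.
            yield from iter_images(obj.get("/Resources", {}), prefix + ("/XObject", name))
    for name, obj in resources.get("/Pattern", {}).items():
        # PatternType 1 is a tiling pattern; it has its own /Resources.
        if obj.get("/PatternType") == 1:
            yield from iter_images(obj.get("/Resources", {}), prefix + ("/Pattern", name))


# Toy resources mimicking the reported case: the only image sits in a pattern.
page_resources = {
    "/Pattern": {
        "/P1": {
            "/PatternType": 1,
            "/Resources": {"/XObject": {"/X1": {"/Subtype": "/Image"}}},
        }
    }
}
print(list(iter_images(page_resources)))
```

A lookup limited to the page's own /XObject entries finds nothing in this toy input, which matches the behaviour described above.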