rectangular (image) selection #52

rhaynes74 · 2022-01-04T13:32:13Z

Hi folks, thanks for your efforts with this tool. I was wondering if there are plans to add rectangular (image) selection to the tool?
The workflow that I imagine would be to select figures / tables / formulas using the often available rectangular selection, and then have those selections saved as images (.png, .jpg, etc.) and included as links in the markdown file.
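As a rough illustration of the last step (linking a saved crop from the markdown output), here is a minimal Python sketch. The `image_link` helper, the `images/` directory and the `<stem>-p<page>-<n>.png` naming scheme are all hypothetical; nothing like this exists in pdfannots today.

```python
from pathlib import PurePosixPath


def image_link(pdf_name: str, page_number: int, index: int,
               caption: str = "", img_dir: str = "images") -> str:
    """Build a markdown image link for a cropped selection (hypothetical naming)."""
    stem = PurePosixPath(pdf_name).stem
    path = PurePosixPath(img_dir) / f"{stem}-p{page_number}-{index}.png"
    # Spaces would break a plain markdown link target, so percent-encode them.
    target = str(path).replace(" ", "%20")
    return f"![{caption}]({target})"


print(image_link("paper.pdf", 3, 1, "Figure 2"))
```

Using the page number and a per-page counter in the file name keeps crops from different selections on the same page from overwriting each other.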

0xabu · 2022-01-04T17:50:50Z

Thanks for the suggestion. This sounds like a reasonable idea for a feature, but it's also not something I'm likely to work on soon as it's not directly relevant to my use-case... I'd be happy to review PRs.

liang-0131 · 2022-01-18T09:53:48Z

I need that too

thiswillbeyourgithub · 2022-04-26T12:13:05Z

Badly needed here too :)
as well as #39, to import my Okular collection into Logseq

thiswillbeyourgithub · 2022-05-14T15:26:28Z

@0xabu would you by any chance have a recommended way to extract an image rendering of a PDF region given its precise bounding box? I am willing to try implementing this in the coming months.

0xabu · 2022-05-14T17:45:00Z

@thiswillbeyourgithub not really, sorry. pdfminer already has the ability to extract images as bitmaps (see calls to render_image in https://github.com/pdfminer/pdfminer.six/blob/master/pdfminer/converter.py), but I'm not sure about capturing an arbitrary section of the page.
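For the narrower case where the selection covers an embedded raster image, a sketch along these lines might work with pdfminer.six alone: walk the layout of one page, keep the `LTImage` objects whose bounding box overlaps the selection rectangle, and export them with `pdfminer.image.ImageWriter`. This does not render vector figures or formulas, which is exactly the "arbitrary section of the page" problem above. The function names are hypothetical, and it assumes the selection rectangle is in the same page coordinates pdfminer reports (i.e. an unrotated page whose MediaBox starts at (0, 0)).

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two boxes share some area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def export_images_in_rect(pdf_path: str, page_index: int,
                          rect: BBox, out_dir: str) -> List[str]:
    """Export embedded images on one page that overlap `rect` (hypothetical helper)."""
    # Imported here so the pure helper above works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.image import ImageWriter
    from pdfminer.layout import LTContainer, LTImage

    writer = ImageWriter(out_dir)
    names: List[str] = []

    def walk(item) -> None:
        if isinstance(item, LTImage):
            if overlaps(item.bbox, rect):
                names.append(writer.export_image(item))
        elif isinstance(item, LTContainer):
            # Images usually sit inside LTFigure containers, so recurse.
            for child in item:
                walk(child)

    # page_numbers is zero-based and limits parsing to the pages we need.
    for page in extract_pages(pdf_path, page_numbers=[page_index]):
        walk(page)
    return names
```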

thiswillbeyourgithub · 2022-05-15T12:05:01Z

After looking into it a bit, it appears that pdfminer is quite complicated to get into. I want to spend as little time on it as possible when I get to it, so:

If I were to simply:

1. turn each PDF page into an image in a temporary directory (https://www.geeksforgeeks.org/convert-pdf-to-image-using-python/)
2. extract the relevant section of the image for the right pages, given the bounding box of the PDF highlight (https://stackoverflow.com/questions/6496394/how-can-i-select-a-part-of-a-image-using-python)
3. delete the temporary directory at the end, of course
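Step 2 hinges on one conversion whichever rendering library is used: an annotation's rectangle is in PDF user space (points, 1/72 inch, origin at the bottom-left), while a rendered page bitmap is in pixels with the origin at the top-left. A minimal sketch, assuming an unrotated page whose MediaBox starts at (0, 0); the function name is hypothetical:

```python
import math
from typing import Tuple


def pdf_rect_to_pixels(rect: Tuple[float, float, float, float],
                       page_height_pt: float,
                       dpi: float) -> Tuple[int, int, int, int]:
    """Map a PDF-space rectangle to a (left, top, right, bottom) pixel box."""
    x0, y0, x1, y1 = rect
    scale = dpi / 72.0  # pixels per PDF point
    # Round outward so the crop never clips the edge of the selection.
    left = math.floor(min(x0, x1) * scale)
    right = math.ceil(max(x0, x1) * scale)
    # Flip the y axis: PDF y grows upwards, image rows grow downwards.
    top = math.floor((page_height_pt - max(y0, y1)) * scale)
    bottom = math.ceil((page_height_pt - min(y0, y1)) * scale)
    return left, top, right, bottom


print(pdf_rect_to_pixels((100, 600, 300, 700), 792, 144))
```

The returned tuple is in the (left, top, right, bottom) order that both PIL's `Image.crop` and wand's `Image.crop` accept. Rendering only the pages that actually carry a rectangular selection covers the "avoid converting useless pages" optimization.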

Would you find this a satisfactory PR, or does it seem too hacky to you? Of course, with a bit of optimization to avoid converting unneeded pages, etc.

Another possibility would be to open an issue in the pdfminer github and ask them their opinion.

0xabu · 2022-05-15T16:14:12Z

I'm not excited about that approach, sorry -- it would add both pdf2image and PIL as dependencies (and from what I can see pdf2image itself just shells out to poppler utils). pdfminer has a gitter chat, maybe you could ask for advice there, or look around at some of the other apps that build on top of pdfminer for inspiration?

thiswillbeyourgithub · 2022-05-24T22:06:03Z

Thank you for the quick answer. Can you take a look at this comment on the pdfminer GitHub? Would using py-pdf-parser be an acceptable PR by your standards or not?

0xabu · 2022-05-25T04:59:42Z

I looked at py-pdf-parser; if you look here, it appears to rely on wand (a Python wrapper for ImageMagick) to convert PDF pages to bitmaps. The rest of py-pdf-parser is irrelevant for us.

I think I would be OK with adding wand as an optional dependency: if it is not present, the image export functionality doesn't work, but you don't need it to keep using pdfannots as today.
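A minimal sketch of what "optional dependency" could look like in practice: the import is attempted once, a flag records whether it worked, and only the export path checks that flag. `export_region` and `HAVE_WAND` are hypothetical names, not existing pdfannots code. Note that ImageMagick itself typically needs Ghostscript installed to read PDFs.

```python
from typing import Tuple

# wand is optional. Importing it raises ImportError both when the package is
# absent and when the ImageMagick library it wraps cannot be found; either
# way, image export is disabled and everything else keeps working.
try:
    from wand.image import Image as WandImage
except ImportError:
    WandImage = None

HAVE_WAND = WandImage is not None


def export_region(pdf_path: str, page_index: int,
                  box_px: Tuple[int, int, int, int],
                  out_path: str, dpi: int = 150) -> None:
    """Render one page via wand/ImageMagick and save a cropped region.

    box_px is (left, top, right, bottom) in pixels at the given dpi.
    """
    if not HAVE_WAND:
        raise RuntimeError("image export needs the optional 'wand' package")
    left, top, right, bottom = box_px
    # The "[n]" suffix asks ImageMagick to read only page n (zero-based).
    with WandImage(filename=f"{pdf_path}[{page_index}]", resolution=dpi) as img:
        img.crop(left, top, right, bottom)
        img.save(filename=out_path)
```

A command-line option for image export could then check `HAVE_WAND` up front and print a clear error, rather than failing halfway through a run.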

thiswillbeyourgithub · 2022-05-25T14:40:14Z

Alright. Thank you very much. I intend to do this in the summer probably.

0xabu added the enhancement label Jan 4, 2022

thiswillbeyourgithub mentioned this issue May 16, 2022

Feature request: ability to extract a portion of a page as image pdfminer/pdfminer.six#759 (Closed)
