Extracting (some) Annotations from PDFs

A short script that uses the pypdf library to extract a list of annotations from a PDF.

Currently only works for the following types of annotations:

Highlighted Notes:

"Caret" annotations (strikethrough, but with a text suggestion):

Strikethrough annotations (just says "remove this part"):

Ignores hyperlinks in the text, such as those generated for citations in LaTeX-built academic article PDFs.
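A minimal sketch of how this kind of filtering could look with pypdf, not necessarily how extract_annotations.py does it. It assumes the three annotation types above use the standard PDF subtype names /Highlight, /Caret and /StrikeOut, and that hyperlinks appear as /Link annotations. The names LABELS, describe and iter_annotations are hypothetical and only serve the illustration.

```python
from typing import Iterator, Mapping, Optional

# Annotation subtypes handled in this sketch (standard PDF subtype names).
# Anything else, including /Link hyperlinks, is skipped.
LABELS = {
    "/Highlight": "Highlight",
    "/Caret": "Caret",
    "/StrikeOut": "Strikethrough",
}


def describe(annot: Mapping) -> Optional[str]:
    """Return a one-line summary, or None for unsupported subtypes."""
    subtype = str(annot.get("/Subtype", ""))
    label = LABELS.get(subtype)
    if label is None:
        return None
    # /Contents holds the note or suggested text, if the reviewer typed any.
    contents = str(annot.get("/Contents") or "").strip()
    return f"{label}: {contents}" if contents else label


def iter_annotations(pdf_path: str) -> Iterator[str]:
    """Yield 'page N - summary' lines for every supported annotation."""
    # Imported lazily so describe() can be tried without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    for page_number, page in enumerate(reader.pages, start=1):
        if "/Annots" not in page:
            continue
        for ref in page["/Annots"]:
            summary = describe(ref.get_object())
            if summary is not None:
                yield f"page {page_number} - {summary}"
```

Calling `iter_annotations("pdffile.pdf")` and printing each line would give a plain list of action items, one per annotation.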

Probably useful for grad students who need to make sure they address every comment or annotation their advisor or a reviewer makes on a PDF they shared (e.g., an article or a response letter). I used this to generate a list of action items that I can check my work against to see whether I missed a review point or comment.

Note that annotations are saved as vector items in the page coordinate frame rather than attached to specific parts of the text, which is how an author typically remembers them. In other words, you cannot tell which word was struck out; you only get its location on the page, and that is not very useful because it is expressed as page coordinates that are not intuitive to read.
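One way to make the location slightly more readable is to convert an annotation's /Rect entry into a rough "percent from the left, percent from the top" position. The sketch below is only an illustration, not part of the script: rough_position is a hypothetical helper, and it assumes the page's coordinate origin is the bottom-left corner at (0, 0), which is common but not guaranteed. With pypdf, the page size could be read from page.mediabox.width and page.mediabox.height.

```python
from typing import Sequence, Tuple


def rough_position(
    rect: Sequence[float], page_width: float, page_height: float
) -> Tuple[int, int]:
    """Return (percent from left edge, percent from top edge) of the rect's center.

    rect is an annotation's /Rect: [lower-left x, lower-left y,
    upper-right x, upper-right y] in page coordinates.
    """
    llx, lly, urx, ury = (float(v) for v in rect)
    center_x = (llx + urx) / 2
    center_y = (lly + ury) / 2
    from_left = round(100 * center_x / page_width)
    # PDF y grows upward, so flip it to measure from the top of the page.
    from_top = round(100 * (1 - center_y / page_height))
    return from_left, from_top
```

For example, `rough_position(annot["/Rect"], float(page.mediabox.width), float(page.mediabox.height))` gives a pair that can be appended to each output line as a hint for where to look on the page.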

Installation

Install pypdf with pip:

pip install pypdf

Usage

python extract_annotations.py pdffile.pdf > example_output.txt

References

Inspired by the following discussions:

and the following page from the pypdf documentation:

https://pypdf.readthedocs.io/en/stable/user/reading-pdf-annotations.html?highlight=annots#text

sonebu/pdf-annotation-extraction

Folders and files

Latest commit

History

Repository files navigation

Extracting (some) Annotations from PDFs

Installation

Usage

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages