Add pymupdf backend #61

Open
wants to merge 68 commits into
base: master

Commits
456f9fa
Add keyboard annotation function
dalanicolai Jun 13, 2021
9661fac
Fix process-connection-type and remove defadvice
dalanicolai Dec 7, 2021
7f4efae
Add initial rudimentary version vimura-tq-server
dalanicolai Dec 7, 2021
9e2b9f8
Add initial active region marking (using Pillow package)
dalanicolai Dec 9, 2021
0f73fb0
Add toggle server/backend (temporary version)
dalanicolai Dec 9, 2021
6adf50c
Fix annotations and implement proof of concept line annot
dalanicolai Dec 14, 2021
3a97d9d
Fix annotations, render instantly
dalanicolai Dec 14, 2021
d9ac165
Fix selecttion functionality, fix parse filenames (and add logging)
dalanicolai Dec 17, 2021
cb21752
Server log stdout to file
dalanicolai Dec 17, 2021
2c23d26
Add section about pymupdf-tq branch to README
dalanicolai Dec 18, 2021
7c9716f
Fix 'getannots' (for activation of pdf-annot-minor-mode)
dalanicolai Dec 18, 2021
0d77284
Add comment about selection per word to README
dalanicolai Dec 18, 2021
ae26531
Use virtual file object for Pillow
dalanicolai Dec 18, 2021
0154872
Implement remaining 'simple' markup annotations
dalanicolai Dec 18, 2021
b1d078c
Make tq print server Exceptions to *pdf-info-log* buffer
dalanicolai Dec 18, 2021
413dd50
Implement basic (limited to single annot type) pagelinks function
dalanicolai Dec 19, 2021
2b9440b
Almost complete getannots function
dalanicolai Dec 19, 2021
1c15650
Implement line annotation
dalanicolai Dec 19, 2021
5880139
Add comments and small enhancements
dalanicolai Dec 19, 2021
e03beab
Use virtual file instead of temp disk file for renderpage
dalanicolai Dec 20, 2021
618f1d2
Fix annots (make them persistent)
dalanicolai Dec 20, 2021
5a5b704
Fix adding line annots, create them when immediately when drawing
dalanicolai Dec 20, 2021
1b53050
Small fixes (pagesise sometimes needs to load doc)
dalanicolai Dec 20, 2021
b4fa455
Change line to arrow annot (should eventually be made configurable)
dalanicolai Dec 20, 2021
381789b
Revert "Use virtual file instead of temp disk file for renderpage"
dalanicolai Dec 20, 2021
ad09a55
README fix link to wiki
dalanicolai Dec 21, 2021
4aaf56f
README add requirements
dalanicolai Dec 21, 2021
0b48f41
Update README about faulty 'selection' behavior in some documents
dalanicolai Jan 2, 2022
ef82170
Add keyboard annotation function
dalanicolai Jun 13, 2021
5a2d398
Fix process-connection-type and remove defadvice
dalanicolai Dec 7, 2021
7688aff
Add initial rudimentary version vimura-tq-server
dalanicolai Dec 7, 2021
c2b68db
Add initial active region marking (using Pillow package)
dalanicolai Dec 9, 2021
df135f2
Add toggle server/backend (temporary version)
dalanicolai Dec 9, 2021
05835bf
Fix annotations and implement proof of concept line annot
dalanicolai Dec 14, 2021
8cac994
Fix annotations, render instantly
dalanicolai Dec 14, 2021
e2865f5
Fix selecttion functionality, fix parse filenames (and add logging)
dalanicolai Dec 17, 2021
002b522
Server log stdout to file
dalanicolai Dec 17, 2021
27cdfc8
Clean up and organize README and add section about pymupdf-tq branch
dalanicolai Dec 18, 2021
fb19459
Fix 'getannots' (for activation of pdf-annot-minor-mode)
dalanicolai Dec 18, 2021
d591598
Add comment about selection per word to README
dalanicolai Dec 18, 2021
fd8ffe9
Use virtual file object for Pillow
dalanicolai Dec 18, 2021
315f059
Implement remaining 'simple' markup annotations
dalanicolai Dec 18, 2021
86f4652
Make tq print server Exceptions to *pdf-info-log* buffer
dalanicolai Dec 18, 2021
05a24ee
Implement basic (limited to single annot type) pagelinks function
dalanicolai Dec 19, 2021
32fef71
Almost complete getannots function
dalanicolai Dec 19, 2021
b4b469d
Implement line annotation
dalanicolai Dec 19, 2021
bd70dfe
Add comments and small enhancements
dalanicolai Dec 19, 2021
e9d3ba8
Use virtual file instead of temp disk file for renderpage
dalanicolai Dec 20, 2021
eb916c4
Fix annots (make them persistent)
dalanicolai Dec 20, 2021
676f109
Fix adding line annots, create them when immediately when drawing
dalanicolai Dec 20, 2021
b52cc0a
Small fixes (pagesise sometimes needs to load doc)
dalanicolai Dec 20, 2021
d650bec
Change line to arrow annot (should eventually be made configurable)
dalanicolai Dec 20, 2021
b1599b1
Revert "Use virtual file instead of temp disk file for renderpage"
dalanicolai Dec 20, 2021
3d87c70
README fix link to wiki
dalanicolai Dec 21, 2021
636936a
README add requirements
dalanicolai Dec 21, 2021
ab1cbd7
First partial working version of tq-interpreter
dalanicolai Dec 22, 2021
7992a69
Fix complete working version
dalanicolai Dec 25, 2021
59996cc
Implement scripting functionality
dalanicolai Dec 25, 2021
489d2d1
Implement outline feature
dalanicolai Dec 29, 2021
ca1fb78
Tiny spelling correction
dalanicolai Dec 29, 2021
23e341d
Add display metadata feature
dalanicolai Dec 29, 2021
6966896
Add backend toggle mechanism
dalanicolai Jan 2, 2022
3dd62e0
Add search feature
dalanicolai Jan 1, 2022
31e84dd
Add (fix) epub support
dalanicolai Jan 1, 2022
0412dd1
Merge branch 'tq-interpreter' into pymupdf-tq
dalanicolai Jan 2, 2022
fb58bc8
Script: send line with 'no-properties'
dalanicolai Jan 2, 2022
d895ed8
Add search_string function
dalanicolai Jan 2, 2022
24c372b
Remove server and instead make it pip installable
dalanicolai May 2, 2022
153 changes: 153 additions & 0 deletions README.org
@@ -9,6 +9,158 @@
[[https://stable.melpa.org/#/pdf-tools][http://stable.melpa.org/packages/pdf-tools-badge.svg]]
[[https://melpa.org/#/pdf-tools][http://melpa.org/packages/pdf-tools-badge.svg]] [[https://ci.appveyor.com/project/vedang/pdf-tools][https://ci.appveyor.com/api/projects/status/yqic2san0wi7o5v8/branch/master?svg=true]]

** About this pymupdf-tq branch
In this branch, an alternative pdf-tools server is being developed. It is
based on [[https://pymupdf.readthedocs.io/en/latest/][PyMuPDF]] and is accordingly written in Python. The goal of this
branch is to 'lower the bar' for maintaining and extending pdf-tools.
Extending the original server demands a very good knowledge of the C
programming language; extending this server requires only a limited
knowledge of Python. Combined with a moderate knowledge of Emacs Lisp for
extending the client, this makes it quite straightforward to implement line
and arrow annotation features. With slightly more effort, an ink/freehand
annotation feature could be implemented as well. Enthusiastic hackers could
add PDF form support and, as some users have requested, keyboard
navigation/selection support (implementing a cursor).

With the help of the excellent [[https://pymupdf.readthedocs.io/en/latest/][PyMuPDF documentation]], writing the Python
code for these features should be quite straightforward. Writing the
elisp/client code could be tricky, but enthusiastic hackers can always count
on help from the community. At the very least, the author of the initial code
for this backend (and of the [[https://elpa.gnu.org/devel/sketch-mode.html][sketch-mode]] package and several other Emacs
packages) generously offers his full assistance.

If you would like to hack on pdf-tools or this backend, you can find some
development tips and comments in [[../../wiki/PyMuPDF-backend][the wiki section]].

*** Acknowledgements
PDF Tools has been, more or less, a one-man project of very high quality,
contributed by Andreas Politz. We would like to express our special thanks to
him for this outstanding package.

The server in this branch is based on PyMuPDF, a very complete, powerful and
outstandingly documented Python library. Special thanks therefore also go to
the developers of PyMuPDF (and of its core library [[https://mupdf.com/][MuPDF]], developed by
Artifex Software, Inc.), and in particular to Jorj McKie for his generous
support.

*** Requirements
- Python version 3.6
- [[https://pypi.org/project/vimura-server/][vimura-server 1.3]] (install it with =pip install --user vimura-server=;
  including =--user= is important because this branch assumes the server is
  installed as a user site-package)

*** Usage
Install this branch using a Quelpa recipe. In Spacemacs, I use the following
recipe to install pdf-tools from this branch of the repository:
#+begin_src emacs-lisp :tangle yes
(pdf-tools :location (recipe
                      :fetcher github
                      :repo "dalanicolai/pdf-tools"
                      :branch "pymupdf-tq"
                      :files ("lisp/*.el"
                              "README"
                              ("build" "Makefile")
                              ("build" "server")
                              (:exclude "lisp/tablist.el" "lisp/tablist-filter.el"))))
#+end_src
You could also simply install the ~vimura.py~ file directly in your pdf-tools
directory, but I am not sure whether all functions will then work correctly
(in particular the search functionality, as I have made a small, possibly
temporary, change to ~pdf-isearch.el~).

Install the [[https://pypi.org/project/PyMuPDF/][PyMuPDF]], [[https://pypi.org/project/Pillow/][Pillow]] and [[https://pypi.org/project/pytz/][pytz]] Python packages in your preferred Python
environment (and make sure that Emacs uses the interpreter from that
environment).
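If you are unsure whether the interpreter Emacs uses can see these packages, a
small check like the following sketch may help. It is not part of this
branch: the helper name ~missing_packages~ is made up for illustration, and it
assumes PyMuPDF is importable as ~fitz~ (its traditional module name) and
Pillow as ~PIL~. Run it with the interpreter you want Emacs to use.

#+begin_src python
import importlib.util
import sys

# Distribution name -> importable module name (assumption: PyMuPDF is
# imported as "fitz", Pillow as "PIL").
REQUIRED = {"PyMuPDF": "fitz", "Pillow": "PIL", "pytz": "pytz"}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose top-level module cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    # Show which interpreter ran this check; Emacs should use the same one.
    print("interpreter:", sys.executable)
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
#+end_src

~find_spec~ only locates the modules without importing them, so the check is
cheap and has no side effects.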

Finally, after loading pdf-tools, select the vimura server via ~M-x
pdf-tools-toggle-server~.

*** Arrow annotations
Use =C-M-down-mouse-1= (Ctrl+Alt+left mouse button) to draw arrow
annotations.
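An arrow annotation is essentially a line with a start point and an end point.
Assuming the client sends the dragged positions as a relative edge list (x0 y0
x1 y1, as fractions of the page width and height, which is how pdf-tools
generally passes coordinates), the server has to scale them to PDF points
before creating the annotation, for example with PyMuPDF's
~Page.add_line_annot~. The sketch below covers only that scaling step plus a
guard against zero-length arrows from plain clicks. It assumes a top-left
origin in both coordinate systems; the function names and the ~min_length~
threshold are hypothetical.

#+begin_src python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def relative_to_points(edges: Sequence[float], page_width: float,
                       page_height: float) -> Tuple[Point, Point]:
    """Scale a relative edge list (x0, y0, x1, y1), each in 0..1, to PDF points.

    Assumes the origin is the top-left corner of the page in both systems.
    """
    x0, y0, x1, y1 = edges
    start = (x0 * page_width, y0 * page_height)
    end = (x1 * page_width, y1 * page_height)
    return start, end


def is_drawable(start: Point, end: Point, min_length: float = 1.0) -> bool:
    """Reject (near) zero-length arrows, e.g. from a click without dragging."""
    return math.hypot(end[0] - start[0], end[1] - start[1]) >= min_length


if __name__ == "__main__":
    start, end = relative_to_points((0.25, 0.5, 0.75, 0.5), 600, 800)
    print(start, end)               # (150.0, 400.0) (450.0, 400.0)
    print(is_drawable(start, end))  # True
#+end_src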

*** Comment
The PyMuPDF backend currently selects/activates text only per word (instead
of per character, e.g. for highlighting). Selection per character would be
perfectly possible but requires a little extra Python code. Also, in some
documents selection does not work entirely as expected; this might be fixable
(a possible fix has been suggested [[https://github.com/pymupdf/PyMuPDF/discussions/1451#discussioncomment-1814271][here]]).
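Per-word selection means that any word touched by the selected region is taken
as a whole. The plain-Python sketch below illustrates this on word tuples in
the format returned by PyMuPDF's ~Page.get_text("words")~, i.e. ~(x0, y0, x1,
y1, text, block_no, line_no, word_no)~. The function names are hypothetical
and not taken from vimura-server; per-character selection would need a
finer-grained extraction instead (for instance PyMuPDF's ~"rawdict"~ output,
which includes character positions).

#+begin_src python
from typing import List, Sequence, Tuple

# One entry of PyMuPDF's page.get_text("words") output:
# (x0, y0, x1, y1, text, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """Return True if rectangles A and B overlap (coordinates in PDF points)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_words(words: Sequence[Word], region: Rect) -> List[Word]:
    """Return the words touched by REGION, in reading order.

    Whole words are selected even if REGION covers only part of them.
    """
    hits = [w for w in words if intersects(w[:4], region)]
    # Sort by block, line and word number to restore reading order.
    return sorted(hits, key=lambda w: (w[5], w[6], w[7]))


def union_rect(words: Sequence[Word]) -> Rect:
    """Smallest rectangle containing all WORDS."""
    return (min(w[0] for w in words), min(w[1] for w in words),
            max(w[2] for w in words), max(w[3] for w in words))


if __name__ == "__main__":
    words = [
        (10, 10, 40, 20, "Hello", 0, 0, 0),
        (45, 10, 80, 20, "world", 0, 0, 1),
        (10, 25, 50, 35, "again", 0, 1, 0),
    ]
    selected = select_words(words, (30, 12, 50, 18))
    print([w[4] for w in selected])  # ['Hello', 'world']
    print(union_rect(selected))      # (10, 10, 80, 20)
#+end_src

The region only partly covers "Hello" and "world", yet both words are selected
in full, which is exactly the per-word behaviour described above.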

** About this package
PDF Tools is, among other things, a replacement of DocView for PDF
files. The key difference is that pages are not pre-rendered by
e.g. ghostscript and stored in the file-system, but rather created
on-demand and stored in memory.

This rendering is performed by a special library named, for
whatever reason, poppler, running inside a server program. This
program is called ~epdfinfo~ and its job is to successively
read requests from Emacs and produce the proper results, i.e. the
PNG image of a PDF page.
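To make the request/response idea concrete, here is a minimal Python sketch of
such a server loop: read one request per line, split it into a command and its
arguments, dispatch to a handler and write back a framed reply. The wire
format is an assumption for illustration only (colon-separated fields, a
literal colon escaped as ~\:~, replies framed as ~OK~ or ~ERR~ followed by a
line containing a single ~.~); it is not a specification of the ~epdfinfo~
protocol, and the function names are hypothetical.

#+begin_src python
import io
from typing import Callable, Dict, List, TextIO


def split_request(line: str) -> List[str]:
    """Split LINE on unescaped colons; a backslash escapes the next character."""
    fields: List[str] = []
    current: List[str] = []
    chars = iter(line.rstrip("\n"))
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character (e.g. ':' or '\\') verbatim.
            current.append(next(chars, ""))
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def serve(inp: TextIO, out: TextIO, handlers: Dict[str, Callable[..., str]]) -> None:
    """Answer one request per input line until INP is exhausted."""
    for line in inp:
        command, *args = split_request(line)
        handler = handlers.get(command)
        try:
            if handler is None:
                raise ValueError(f"unknown command: {command}")
            out.write(f"OK\n{handler(*args)}\n.\n")
        except Exception as exc:  # report errors instead of crashing the server
            out.write(f"ERR\n{exc}\n.\n")
        out.flush()


if __name__ == "__main__":
    requests = io.StringIO("echo:a\\:b:c\nnope\n")
    replies = io.StringIO()
    serve(requests, replies, {"echo": lambda *args: "|".join(args)})
    print(replies.getvalue(), end="")
#+end_src

Catching exceptions per request keeps one bad request from killing the
long-running process that Emacs talks to.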

Actually, displaying PDF files is just one part of PDF Tools.
Since poppler can provide us with all kinds of information about a
document and is also able to modify it, there is a lot more we can
do with it. [[http://www.dailymotion.com/video/x2bc1is_pdf-tools-tourdeforce_tech?forcedQuality%3Dhd720][Watch]]

Please also read about [[#known-problems][known problems]].

** Features
+ View :: View PDF documents in a buffer with DocView-like
bindings.
+ Isearch :: Interactively search PDF documents like any other
buffer, either for a string or a PCRE.
+ Occur :: List lines matching a string or regexp in one or more
PDF documents.
+ Follow ::
Click on highlighted links, moving to some part of a different
page, some external file, a website or any other URI. Links may
also be followed by keyboard commands.
+ Annotations :: Display and list text and markup annotations (like
underline), edit their contents and attributes
(e.g. color), move them around, delete them or
create new ones and then save the modifications
back to the PDF file.
+ Attachments :: Save files attached to the PDF-file or list them
in a dired buffer.
+ Outline :: Use imenu or a special buffer to examine and navigate
the PDF's outline.
+ SyncTeX :: Jump from a position on a page directly to the TeX
source and vice versa.
+ Virtual ::
Use a collection of documents as if it were one, big single PDF.

+ Misc ::
- Display PDF's metadata.
- Mark a region and kill the text from the PDF.
- Keep track of visited pages via a history.
- Apply a color filter for reading in low light conditions.
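The Virtual feature comes down to mapping a page number of the combined
document to a file and a page within that file. The Python sketch below shows
one way to do this with cumulative page offsets and a binary search
(~bisect~). It assumes the page count of each file is known up front; the
class name is hypothetical, and the code is not taken from pdf-tools' own
implementation.

#+begin_src python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence, Tuple


class VirtualDocument:
    """Treat several documents as one, mapping global pages to (file, page).

    PAGE_COUNTS[i] is the number of pages of FILES[i]; pages are 1-based.
    """

    def __init__(self, files: Sequence[str], page_counts: Sequence[int]):
        if len(files) != len(page_counts):
            raise ValueError("need one page count per file")
        self.files = list(files)
        # offsets[i] = pages before file i, e.g. [0, 3, 8] for counts [3, 5, 2]
        self.offsets = [0] + list(accumulate(page_counts))[:-1]
        self.total = sum(page_counts)

    def locate(self, page: int) -> Tuple[str, int]:
        """Return (file, local page) for the 1-based global PAGE."""
        if not 1 <= page <= self.total:
            raise IndexError(f"page {page} out of range 1..{self.total}")
        # Last file whose offset is <= page - 1 (skips files with 0 pages).
        i = bisect_right(self.offsets, page - 1) - 1
        return self.files[i], page - self.offsets[i]


if __name__ == "__main__":
    doc = VirtualDocument(["a.pdf", "b.pdf", "c.pdf"], [3, 5, 2])
    print(doc.locate(4))   # ('b.pdf', 1)
    print(doc.locate(10))  # ('c.pdf', 2)
#+end_src

The binary search keeps lookups logarithmic in the number of files, which
matters little for a handful of documents but costs nothing extra.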

** Installation
The package may be installed via MELPA and it will try to build the
server part when it is activated the first time. Though the next
section regarding build-prerequisites is still relevant, the rest
of the installation instructions assume a build from within a git
repository. (The MELPA package has a different directory
structure.)

*** Server prerequisites
You'll need GNU Emacs \ge 24.3 and some form of a GNU/Linux OS.
Other operating systems are currently not supported (patches
welcome). The following instructions assume a Debian-based
system. (The prerequisites may be installed automatically on this
kind of system; see [[#compilation][Compilation]].)

First make sure a suitable build-system is installed. We need at
least a C/C++ compiler (both ~gcc~ and ~g++~), ~make~, ~automake~
and ~autoconf~.
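To see which of these tools are already available, you could use a quick
check such as the following Python sketch, built on the standard library's
~shutil.which~ (the helper name ~missing_tools~ is hypothetical; a shell loop
over ~command -v~ would do the same).

#+begin_src python
import shutil
from typing import Iterable, List

# The build tools listed above.
BUILD_TOOLS = ["gcc", "g++", "make", "automake", "autoconf"]


def missing_tools(tools: Iterable[str] = BUILD_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", " ".join(missing) if missing else "none")
#+end_src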

Next we need to install a few libraries PDF Tools depends on, some
of which are probably already on your system.

The ~pdf-tools~ Wiki is maintained at https://pdftools.wiki. Head to the site if you find it easier to navigate a website for reading a manual. All the topics on the site are listed at https://pdftools.wiki/impulse.

* About PDF Tools
@@ -49,6 +201,7 @@ You'll need GNU Emacs \ge 24.3 and some form of a GNU/Linux OS. Other operating
First make sure a suitable build-system is installed. We need at least a C/C++ compiler (both ~gcc~ and ~g++~), ~make~, ~automake~ and ~autoconf~.

Next we need to install a few libraries ~pdf-tools~ depends on, some of which are probably already on your system.

#+begin_src sh
$ sudo apt install libpng-dev zlib1g-dev libpoppler-glib-dev libpoppler-private-dev
#+end_src
117 changes: 97 additions & 20 deletions lisp/pdf-annot.el
@@ -80,7 +80,8 @@ Implement and describe basic org example."
     (highlight (color . "yellow"))
     (squiggly (color . "orange"))
     (strike-out(color . "red"))
-    (underline (color . "blue")))
+    (underline (color . "blue"))
+    (line (color . "black")))
   "An alist of initial properties for new annotations.
 
 The alist contains a sub-alist for each of the currently available
@@ -135,6 +136,12 @@ to \"Joe\"."
       (cons :tag "Squiggly Annotations" (const squiggly) ,markup-properties)
       (cons :tag "Strike-out Annotations" (const strike-out) ,markup-properties))))
 
+(defcustom pdf-annot-keyboard-annot-default-type "highlight"
+  "Default annotation type for keyboard annotation."
+  :type '(string)
+  :options '("highlight" "squiggly" "strike-out" "underline")
+  :group 'pdf-annot)
+
 (defcustom pdf-annot-print-annotation-functions
   '(pdf-annot-print-annotation-latex-maybe)
   "A alist of functions for printing annotations, e.g. for the tooltip.
@@ -440,17 +447,18 @@ PAGES defaults to all pages, TYPES to all types and BUFFER to the
 current buffer."
 
   (pdf-util-assert-pdf-buffer buffer)
-  (unless buffer
-    (setq buffer (current-buffer)))
-  (unless (listp types)
-    (setq types (list types)))
-  (with-current-buffer buffer
-    (let (result)
-      (dolist (a (pdf-info-getannots pages))
-        (when (or (null types)
-                  (memq (pdf-annot-get a 'type) types))
-          (push (pdf-annot-create a) result)))
-      result)))
+  (when (string= (file-name-extension buffer-file-name) "pdf")
+    (unless buffer
+      (setq buffer (current-buffer)))
+    (unless (listp types)
+      (setq types (list types)))
+    (with-current-buffer buffer
+      (let (result)
+        (dolist (a (pdf-info-getannots pages))
+          (when (or (null types)
+                    (memq (pdf-annot-get a 'type) types))
+            (push (pdf-annot-create a) result)))
+        result))))
 
 (defun pdf-annot-getannot (id &optional buffer)
   "Return the annotation object for annotation ID.
@@ -1055,14 +1063,14 @@ Return the new annotation."
     (error "Edges argument should be a single edge-list for text annotations"))
   (let* ((a (apply #'pdf-info-addannot
                    page
-                   (if (eq type 'text)
-                       (car edges)
-                     (apply #'pdf-util-edges-union
-                            (apply #'append
-                                   (mapcar
-                                    (lambda (e)
-                                      (pdf-info-getselection page e))
-                                    edges))))
+                   (pcase type
+                     ((or 'text 'line) (car edges))
+                     (_ (apply #'pdf-util-edges-union
+                               (apply #'append
+                                      (mapcar
+                                       (lambda (e)
+                                         (pdf-info-getselection page e))
+                                       edges)))))
                    type
                    nil
                    (if (not (eq type 'text)) edges)))
@@ -1229,6 +1237,75 @@ properties. See also `pdf-annot-add-markup-annotation'."
   (interactive (list (pdf-view-active-region t)))
   (pdf-annot-add-markup-annotation list-of-edges 'highlight color property-alist))
 
+(defun pdf-annot-add-line-markup-annotation (list-of-edges
+                                             &optional color property-alist)
+  "Add a new line annotation in the selected window.
+
+See also `pdf-annot-add-markup-annotation'."
+  (interactive (list (pdf-view-active-region t)))
+  (pdf-annot-add-markup-annotation list-of-edges 'line color property-alist))
+
+(defun pdf-annot-keyboard-annot-format-collection (search-results)
+  "Transform SEARCH-RESULTS into a collection for `completing-read'.
+The collection is used by the `pdf-annot-keyboard-annotate'
+function."
+  (mapcar (lambda (x)
+            (let ((y (cdr x)))
+              (cons (cdar y)
+                    (cdr y))))
+          search-results))
+
+(defun pdf-annot-keyboard-annotate (&optional arg)
+  "Create a markup annotation using the keyboard.
+
+Prompt for a start pattern and an end pattern delimiting the text
+region to annotate.  The start pattern can be the beginning part
+of a word or multiple words; the end pattern can be the ending
+part of a word or multiple words.  By default, create an
+annotation of type `pdf-annot-keyboard-annot-default-type'.
+With the universal argument \\[universal-argument], additionally
+prompt for the annotation type.
+
+Unfortunately, in some documents the edges (i.e. the size of the
+region) are not translated correctly."
+  (interactive "P")
+  (pdf-tools-assert-pdf-buffer)
+  (let* ((from (pdf-annot-keyboard-annot-format-collection
+                (pdf-info-search-string (read-string "From: ")
+                                        (pdf-view-current-page))))
+         (to (let ((patt (read-string "To: ")))
+               (unless (string= patt "")
+                 (pdf-annot-keyboard-annot-format-collection
+                  (pdf-info-search-string patt
+                                          (pdf-view-current-page))))))
+         (start-coords (if (= (length from) 1)
+                           (cadr (cadar from))
+                         (cadar (alist-get
+                                 (completing-read "Select correct START context: " from)
+                                 from nil nil 'equal))))
+         (end-coords (when to
+                       (if (= (length to) 1)
+                           (cadr (cadar to))
+                         (cadar (alist-get
+                                 (completing-read "Select correct END context: " to)
+                                 to nil nil 'equal)))))
+         (edges (if to
+                    (append (cl-subseq start-coords 0 2) (cl-subseq end-coords 2 4))
+                  start-coords)))
+    (pcase (if arg
+               (read-answer "Create annotation of markup type? "
+                            '(("highlight" ?h "create a highlight annotation")
+                              ("squiggly" ?s "create a squiggly annotation")
+                              ("strike-out" ?o "create a strike-out annotation")
+                              ("underline" ?u "create an underline annotation")
+                              ("help" ?? "show help")
+                              ("quit" ?q "exit")))
+             pdf-annot-keyboard-annot-default-type)
+      ("highlight" (pdf-annot-add-highlight-markup-annotation edges))
+      ("squiggly" (pdf-annot-add-squiggly-markup-annotation edges))
+      ("strike-out" (pdf-annot-add-strikeout-markup-annotation edges))
+      ("underline" (pdf-annot-add-underline-markup-annotation edges)))))
+
 (defun pdf-annot-read-color (&optional prompt)
   "Read and return a color using PROMPT.
 