Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pymupdf backend #61

Open
wants to merge 68 commits into
base: master
Choose a base branch
from
Open

Conversation

dalanicolai
Copy link
Contributor

I am opening this PR to inform you about my work on a PyMuPDF backend (via a server in the file vimura.py). Because this server is written in python it makes pdf-tools much more hackable(, extendable and maintainable). Using the vimura server it becomes much easier to implement new features like line/arrow and ink/freehand annotations and if desired functionality to support pdf forms. (A very basic line annotation feature has already been implemented. Also, search functionality has been implemented, which appears to be about twice as fast as when using the epdfinfo backend).

Because it 'is powered by' the excellent and very well documented PyMuPDF library, any person that knows a little python can write code for the server.

As it is quite some work to implement the full set of pdf-tools features myself (and it has already cost me quite some time to investigate how it all works), I thought I'll create this PR and see if some enthusiastic hackers are interested to implement more features/enhance the python code (I am no great python programmer myself, and unfortunately can not afford to work a lot on this project).

Let's see what happens, anyway the basic server/code is now available...

Unless you are fine with 'managing' the development, the development can probably better take place at my fork. But I would be perfectly happy if development took place here also.

Currently, the process-connection-type is set explicitly, which for some reason
breaks the tq communication (no value gets printed to the tq buffer).

While trying to find the cause of the issue mentioned above (which took me at
least 5 hours), I had also commented out the defadvice. As everything seems to
work okay, I will leave it commented out for now.
This commit, which should have been separated in two commits, fixes the
annotation functionality.

Additionally it implements a working test of line annotations, it can be used by
using `pdf-view-mouse-set-region-rectangle`. Although it DOES NOT YEST SHOW
ACTIVATION OF THE AREA, it does work when calling
`pdf-annot-add-line-markup-annotation` after it.
@dalanicolai
Copy link
Contributor Author

dalanicolai commented Dec 17, 2021

I notice now that my keyboard annotation of #29, accidentally, is also in this PR. But I will just leave it like that as I am no git guru and it does not really do any harm (especially as I do not expect this code to get merged any time soon, the PR is really just to inform you/users about it).

Remaining in the set of currently supported annotations. Support for more types
shouldn't be too difficult (see the already implemented rudimentary line
annotation feature)
@vedang
Copy link
Owner

vedang commented Dec 19, 2021

Hi @dalanicolai ,

This is very cool! I will review the code and test it myself sometime this week, excited to see what this can unlock.

What do you think about maintaining pymupdf as a second backend that ships with pdf-tools until acceptable feature parity? That is, I'm thinking that I will merge these changes into master but will keep epdfinfo as the default backend. This will allow interested folks to test / hack the pymupdf backend without impacting the experience of other users. It will also allow for smaller PRs that are easier to review, instead of you regularly pushing more and more code to this PR.

I have kept aside some time from today to the end of the year to clean up pdf-tools and cut a new release. Once this is complete, I will rebase and review this change and will get back to you on this PR.

@dalanicolai
Copy link
Contributor Author

Thanks for the enthusiastic reply!

Your proposal is more or less what I had in mind. Of course, I agree that we should keep the very stable epdfinfo server the default server. There is a simple pdf-tools-toggle-server command to toggle between the backends.

I do think that the PyMuPDF backend will very soon (probably today) surpass the epdfinfo server w.r.t. annotation features. But I am not sure if anybody will find it worth it to 'port/rewrite' some advanced features like opening encrypted documents, or opening pdf's from a remote location, so I don't expect the PyMuPDF server to ever be a good replacement for the epdfinfo server (it is more a great 'extension'. While it also makes transparent how the pdf-tools server and communication with it works).

Indeed we can simply ship the single python file with pdf-tools, and inform readers that they could use the 'vimura' server for extended annotation support (and possibly pdf forms), by installing the required python packages. We could also create a simple elisp command to install the required python packages, but that is of low priority as users can easy enough do it manually.

I wish you good luck with the new release and good times at this ending of the year...

This commit implements correctly working line annotations. The functionality is
still a little cumbersome, lines can be 'previewed' using `C-M-down-mouse-1`,
and subsequently be created using `M-x pdf-annot-add-line-markup-annotation`.

The second step should get merged into the first step.
@dalanicolai
Copy link
Contributor Author

dalanicolai commented Dec 20, 2021

SECOND EDIT
Finally I have learned that /tmp actually uses volatile memory. I did not know/realize this. So I will revert things, and you can forget about this comment...

EDIT
I was a bit distracted by thinking about how to make the vimura server support both options. Actually to make both servers compatible is easy enough. So I have fixed the compatibility again.

However, the information about the method of communication is still very relevant info I would say.
END EDIT

I have temporarily 'broken' the compatibility between the two servers. The reason for this is that currently pdf-tools uses temp files to communicate image data, meaning it writes to disk for every page rendered, and writes multiple times to disk when selecting (text) regions (a new file for every new 'tracked' size of the active area marker).

This looks rather ugly to me, so I have implemented communication via base64 encode strings. It might be that it minimally decreases the responsiveness, but it is hard to tell as the difference is too tiny to really notice the difference.

Anyway, from commit 36c8e7b the rendering for the epdfinfo server does not work on 'this branch'. In the end, it is not so much work to fix the compatibility again (but it does require a little work). Also, maybe some users still prefer to use temp files (disk writes), so maybe we should make both options available (but again this requires some work).

This commit implements image data communication via base64 encoded
string for the pymupdf server (strongly reducing disk writes).

Currently pdf-tools 'communicates' image data via a tempfile, meaning every time
a page gets rendered, which happens very frequently while selecting
regions (drawing the active region).
@vedang vedang added this to the v1.1.0 milestone Jan 6, 2022
@oscarfv
Copy link

oscarfv commented Jan 8, 2022

FWIW: adding Python as a dependency is a no-no here. I hope pdf-tools keeps providing and maintaining the "old" server indefinitely.

@minad
Copy link

minad commented Apr 27, 2022

Does this mean that pdf-tools is going to switch to a python backend in the long term, because the stated goal of this PR is to lower the maintainance barrier? Or will there always be parity between the backends?

@dalanicolai
Copy link
Contributor Author

dalanicolai commented Apr 27, 2022

I do think that the PyMuPDF backend will very soon (probably today) surpass the epdfinfo server w.r.t. annotation features. But I am not sure if anybody will find it worth it to 'port/rewrite' some advanced features like opening encrypted documents, or opening pdf's from a remote location, so I don't expect the PyMuPDF server to ever be a good replacement for the epdfinfo server (it is more a great 'extension'. While it also makes transparent how the pdf-tools server and communication with it works).

Indeed we can simply ship the single python file with pdf-tools, and inform readers that they could use the 'vimura' server for extended annotation support (and possibly pdf forms), by installing the required python packages. We could also create a simple elisp command to install the required python packages, but that is of low priority as users can easy enough do it manually.

Although I don't really know C, the 'epdfinfo' server looks like high quality code to me, it works perfectly fine, and also it is faster than the python server.

The reason for writing the python server is that I know that some users are interested in the ability to add line and free text annotations (which is/are already implemented in this PR). I am quite sure that generally, if users would like to extend pdf-tools with even more features, there are more users who know how to 'hack' on the python server, than there are users that know how to hack on the C server, more features. It would be perfectly possible to turn pdf-tools into a full featured pdf editor with pdf form support etc.

But again, for most users the c-server is fine enough, and my personal (and also Vedang's, and presumably most people's) preference is to keep 'epdinfo' the default server and provide the python server as an alternative for when more features are desired.

@minad
Copy link

minad commented Apr 27, 2022

This is just my personal opinion and I am not involved with this project, except that I am a happy user. But I see a problem if multiple backends with different capabilities are added. This will likely not decrease the maintenance effort but rather increase it. Also if you have multiple backends with different feature sets one of the backends will likely become a second class backend since it will see less and less maintainance, or even worse we will end up with two backends which excel for different use cases, e.g., a more performant epdfinfo backend and a more featureful python backend. The other alternative is that all the enhanced features from the python backend are ported over to the C backend in the long term but then there is no reason to keep the python backend. Overall I don't understand the motivation besides that the Python backend is more hackable and since you are more familiar with it. As such it could function as a prototype implementation to experiment with new features, but this doesn't seem like a justified reason for the addition, in particular if as you say, most users are happy with the existing backend. Given that pdf-tools is a widely used and mature project I would prefer if the development continues in smooth and steady pace, fixing and improving things one by one instead of throwing out or replacing large and significant components. Maybe @vedang can comment on the planned direction of the project?

@dalanicolai
Copy link
Contributor Author

I agree with most of what you say (e.g. it would be great if someone would extend the C server, although I do not expect anybody willing to do that soon). The only point where I do not fully agree is that if most users are happy with the current backend, there would be no reason to add extra features for users who would like to have them.

For me personally, the arguments for merging this is that there might be more assistance in the maintenance, and that I would expect more feedback/reviews on the code here (from which I would learn).

An argument for keeping it in a separate repo, would be that I do not have to 'watch' the pdf-tools repo for feedback/bugs on the python server. But, the drawback of having a separate repo is that I would expect fewer (or no) people being interested to review/improve the code (so that I don't learn, and the code would not improve in quality).

Also, the backend could not get added in the master branch, and be kept only in some 'experimental' branch, but I guess it is quite to be expected that very few users would even try to use such experimental branch.

So, I guess the choice is up to @vedang, if he would like to have the 'prototype' server within the codebase or not. For me, it would be completely fine to provide it as a separate 'extension' package, so that there is zero extra maintenance here. This PR is mainly to inform you and get your opinions/perspective, and indeed discussing things before merging it is the right thing to do.

Anyway, indeed, if the python backend gets merged, then it would probably be wise to consider it and explicitly name it a prototype/experimental backend, explaining that it is the 'second class' but more hackable and more featured (w.r.t. annotations) backend.

@minad
Copy link

minad commented Apr 28, 2022

I understand your motivation and I agree that it would help if you want more people to look at your python backend. But let's be clear: This will not reduce maintainance for the pdf-tools project itself. Essentially what you are proposing a fork or a significant deviation from the original development direction of the pdf-tools project.

I am not opposed to a switch in language of the backend per se, e.g., a switch from C to Rust or Zig or something more hackable (and safer, but still fast) in the long term. But then I still argue that there are many more C hackers around and the Emacs code base is written largely in C. Therefore I think your argument that there are not enough C contributors around is not true. It could be a good opportunity to learn C if there is an actual goal to change something. Python and C go also hand in hand, so there is a lot of synergy.

A language switch to Python on the other hand feels wrong. Python is much slower and comes with a whole bunch of dependencies. pdf-tools already has a slow part, namely the Elisp side. Using Python is like pushing in an additional unnecessary slow layer between the native pdf library and Emacs. I think for Emacs the common denominator are Elisp and C and not Python.

A bit off-topic, but I am always critical if proposals are made to extend Emacs in different languages (just because of familiarity), e.g., JS, Lua or Python. A lot will be lost if one would do that. Emacs profits from being coherently written mostly in Elisp. For example I don't have the Python stack and libraries you are using here necessarily around. Elisp may not feel as good, if one is developer of mainly Python, JS or Rust - it would feel more natural to go with my favorite language to also extend Emacs. But it would not be the common denominator, that is Elisp! As another example, yesterday I stumbled over this comment by Jonas: melpa/melpa#7999 (comment), where Jonas wondered about Python-like indentation in certain Emacs packages. Indentation is a minor issue of course, but it still doesn't feel right to try to treat Elisp as something else than it is.

Since pdf-tools is already used by a large user base, I hope that the project continues to be maintained in a direction which follows the original goals of the project. I think many people appreciate that pdf-tools is quite fast and lightweight with regards to its dependency footprint. This is something I also follow in my projects. Of course this doesn't mean that everyone has to agree with me on that. As you say, there is a lack of features in pdf-tools and there is a lot of room for improvement and feature development which would certainly be facilitated by a switch to a python backend. I agree with you that it would be nice to have more features!

I think it would even make more sense then to throw out the native backend completely and to go all in with the python backend. This will be a saner development direction since you don't have to juggle with incompatibilities of the backends, more complicated testing and so on. But to bring me back to my first point - such a project will not pdf-tools in its current form anymore. It will be a different project. My question is then why not develop this separately, let's say in a pymupdf-tools project as a completely separate fork? Otherwise maybe we are going to see a fork anyways which follows more closely in the steps of the original project.

I hope my comment does not come over as overly negative. I know I am sounding very critical like some concrete head. I appreciate the work you are doing, thinking about how the project could be improved. I am also not telling you to change anything. That's the beautiful thing about free software, you can hack on what you like. If @vedang and you decide that a Python backend is the best way forward this is totally okay. Other users may then decide that they rather continue to use the old implementation or fork it and be happy with a reduced feature set, which would also be okay.

@dalanicolai
Copy link
Contributor Author

I get your points, and mostly agree with them.

I appreciate your comments, and from a pdf-tools perspective, I could agree that it makes sense to keep this in a separate repo. Also, regarding the possibility of pdf-tools getting part of Emacs, it might be preferable to 'leave' the python backend out.

What we could do, is to offer the server as a pip installable package, and only add some configuration options for using the python backend (similar to how doc-view offers configuration options to set which 'external' backend to use, which the user has to install manually).

@minad
Copy link

minad commented May 2, 2022

I should remark that moving out the server only does not address the problematic consequences I mentioned before. I don't consider the mere presence of python code a problem, I consider the idea to have multiple backends a problem. We will still have two servers with two different feature sets, supporting both will still be difficult for maintenance. Furthermore due to the different feature sets, one server will still be first class and the other one will get more and more neglected over time. The other possibility is that features are ported back and forth between the backends, but if that happens the whole separate backend business is pointless.

To make my point very explicit - I think the best approach is to make this a completely separate project (both python and elisp) or decide on a single backend going forward (either native or python).

@dalanicolai
Copy link
Contributor Author

dalanicolai commented May 2, 2022

Well, here we disagree. There is nothing to maintain for the extra server, except for the different command to start it up. Then further, the communication protocol/api is identical to that of the C server. It is comparable to having different language servers available for LSP; the LSP client is implemented once in Emacs, then all servers that would like to use it should adhere to the protocol. The only thing required to add an extra server is adding a configuration file. If one of the servers does not support some feature, well, then that feature is not available for that server.

Personally, I think it is great to simply be able to do M-x pdf-toggle-server and then use a different server. Otherwise, 99.9% of the pdf-tools code works for both servers. By removing the python server, it is clear that the C server is the default and the only 'officially' maintained server. But that is no reason to not add the simple toggle command once for supporting (the) alternative servers.
B.t.w. as far as I have observed, except for some minor modifications to a total of about 10 lines, there has not been any work on the C server for the last 5 years. And furthermore, in the 10 years of the server its existence, more or less only 1 other contributor, has only once contributed a function of about 20 lines to the server code.

The major part of the code added in this PR is for implementing line and arrow annotations in the lisp side of the code. If someone simply adds those features to the C server, then it is automatically available also to the C server.

Furthermore, I accidentally included a keyboard annotation feature in the code here. Maybe I should create separate PR's for adding the support for some extra server, and for extending the 'client' with extra annotation features. And maybe even create a third one for the keyboard annotation feature. But I was a bit 'overworked' already...

@minad
Copy link

minad commented May 2, 2022

It is comparable to having different language servers available for LSP; the LSP client is implemented once in Emacs, then all servers that would like to use it should adhere to the protocol.

No, it is not comparable. Lsp-servers support different languages, while in this case you are just adding a different implementation since you prefer to hack on a python backend in contrast to hacking on a C backend. It is understandable that you prefer hacking on python, but there is NO advantage in adding multiple backends per se.

The major part of the code added in this PR is for implementing line and arrow annotations in the lisp side of the code. If someone simply adds those features to the C server, then it is automatically available also to the C server.

Yes, this is something I am not happy about. Basically you are moving this project in a direction where it only supports all features if users chose a different server. You are also creating maintainance effort for client side features which are not available in the official server.

B.t.w. as far as I have observed, except for some minor modifications to a total of about 10 lines, there has not been any work on the C server for the last 5 years. And furthermore, in the 10 years of the server its existence, more or less only 1 other contributor, has only once contributed a function of about 20 lines to the server code.

Maybe because it works well enough? Did you try adding the features you want to that server?

@titaniumbones
Copy link
Contributor

titaniumbones commented May 2, 2022 via email

@minad
Copy link

minad commented May 2, 2022

@titaniumbones Obviously I don't have a good answer to that. Someone has to join the efforts and work on new features. It astonishes me that C knowledge is supposedly an issue. I could certainly implement features in C, but as you are probably aware I maintain multiple Elisp projects which already eat up much of my free software contribution time. I am generally critical of the idea of rewriting something because of lack of familiarity with a language. Furthermore C is not Cobol or PL/I. In such cases I would understand why the backend has to be replaced.

I consider Python a particularly bad choice since it brings in many dependencies and due to its performance issues. Many people know Python but probably more people know C (I mean well enough, given that everyone learns Python these days). Python is not a language which so far exists in the Emacs code base (in contrast to C and Elisp). Pymupdf is an unnecessary middleman sitting between the native mupdf library. Maybe poppler is the actual problem and replacing poppler with mupdf in the native backend would be the solution to achieve better hackability?

Maybe by implementing the features in a separate Python backend (like a prototype) it will become easier to also port the new features over to C. But then I see the problem since the underlying PDF libraries are different. So not even the argument that the Python backend takes the role of a prototype holds up. In the end what we will get are diverging servers based on different renderers with incompatible feature sets which cannot be ported back and forth.

Anyway if everyone is convinced that moving to another backend is the best way forward, so be it. I am not convinced, but I am also not in a deliberate need of additional features. I do think it is a fair question to ask if we really need these features and if most of the users would be happy with accepting a move to another server. It is not that we could not get these features with the C backend due to technical reasons or is it? Personally I would rather like a consolidation of existing features, bug fixes and so on, while keeping the existing efficient backend which incurs a smaller dependency footprint. I had hoped that such a development direction would be followed when the new maintainership took over the project. Maybe the native backend was always considered a nuisance?

Let me emphasize that my comments should not be seen as a personal attack towards @dalanicolai, I hope they are not! I try to purely question the technical considerations here. In contrast to you guys I am also putting less weight on the social factors. Are there enough contributors, who has a motivation to contribute, etc. I acknowledge that these considerations are important, but I think sticking to some existing developments is also important.

@astoff
Copy link
Contributor

astoff commented Feb 11, 2023

Daniel M, can you see a path forward where the new features Daniel N. is trying to add might be implemented in the C server?

I think the answer to this question is "sadly, no", since Poppler doesn't support Ink and FreeText annotations if I read the docs correctly. One would have to rewrite the server based on mupdf.

However, Poppler seems to support at least Line, Square and Circle which could be added to epdfinfo. Related discussion: #191.

@minad
Copy link

minad commented Feb 11, 2023

@astoff How does this make the answer "sadly, no"? See what I wrote in #61 (comment), and you also said it yourself.

Pymupdf is an unnecessary middleman sitting between the native mupdf library. Maybe poppler is the actual problem and replacing poppler with mupdf in the native backend would be the solution to achieve better hackability?

One can replace Poppler with mupdf in epdfinfo.

@astoff
Copy link
Contributor

astoff commented Feb 11, 2023

I wouldn't call this hypothetical mupdf-based backend the C server, I would call it a new C server. Anyway the point is that the new features mentioned here apparently are not possible with the current version of Poppler.

@minad
Copy link

minad commented Feb 11, 2023

@astoff

I wouldn't call this hypothetical mupdf-based backend the C server, I would call it a new C server.

I think that's hair splitting. In any case it makes a difference if you introduce a "new" or modified C server or an entirely different Python server.

Anyway the point is that the new features mentioned here apparently are not possible with the current version of Poppler.

No, that's not the point. This PR proposes a new Python server, for reasons which I don't agree with. I am with you that Poppler may be insufficient, but one could as well replace/extend the existing C server, preserving the character of pdf-tools. Maybe you already use pymupdf, so it is perfectly fine to bring in that dependency and middleman. But for some users it is undesired.

Of course there are many more Python and JavaScript developers than Elisp developers. Does this mean that we now have to split every Emacs package into an Elisp frontend and a Python/JavaScript backend?

@astoff
Copy link
Contributor

astoff commented Feb 11, 2023

By the way, I just saw this issue linked above which indicates that some of the missing Poppler features might actually already be implemented and are just missing in the glib API.

(My intention here was not to comment on the merits of a Python server, but rather on the capabilites of the two PDF libraries.)

@dalanicolai
Copy link
Contributor Author

@astoff Thanks for the link of course. However, I do see 'freetext' annotation type mentioned in those docs. Also, I think Poppler indeed does support 'Ink' annotations (if it means 'freehand' annotations). Okular also uses Poppler, and it supports both Ink and Freetext annotations (I guess both via poppler).

@astoff
Copy link
Contributor

astoff commented Feb 13, 2023

However, I do see 'freetext' annotation type mentioned in those docs.

Yes, but I don't see a method to create one (presumably you can examine such annotation if it was created in another tool). Or am I missing something?

Okular also uses Poppler, and it supports both Ink and Freetext annotations (I guess both via poppler).

As I just learned from the issue linked in my last message, the glib API is lagging behind the Qt API, which is at the same time a shame and an opportunity since the work to add it to glib Poppler should be minimal at this point.

@astoff
Copy link
Contributor

astoff commented Feb 13, 2023

By the way, from the lag I see when marking text in pdf-tools (good old C backend), I'm skeptical the Ink type annotations could work well. Unless the drawing happens only inside Emacs and then an Ink annotation is only created for the purposes of saving it permanently in the PDF.

Which brings up a question I was asking myself: is there an easy way to draw over an image / superimpose images inside Emacs?

@minad
Copy link

minad commented Feb 13, 2023

@astoff

Which brings up a question I was asking myself: is there an easy way to draw over an image / superimpose images inside Emacs?

SVG. This is what I use in osm.el.

@astoff
Copy link
Contributor

astoff commented Feb 14, 2023

Okay, the SVG approach is really cool. Maybe it could be used here? So pdf-tools would provide some kind of hook to draw SVG elements over the page image, and then third-party libraries could provide various functions on top of that. For instance, one could use a separate pymupdf script to save drawings or text as annotations.

Also, as I said before, even if Poppler supported Ink annotations, I think it would probably be too laggy to use the backend to display the drawing as it is made (judging by the lag when marking some text, which is noticeable).

@dalanicolai
Copy link
Contributor Author

@astoff If you'd like to hack on a pdf/document reader, then you might like to have a look at doc-tools. Indeed, for the reason you are mentioning here, it renders via SVG. More info you can find in the README, and I am happy to answer any question about it.

@astoff
Copy link
Contributor

astoff commented Feb 16, 2023

This look very interesting, and quite ambitious too. Looking forward to what comes out of it :-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature implementation This is a substantial code change and / or implements significant new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants