Pros and Cons for more modularisation #1534
Replies: 10 comments 5 replies
-
CON: More packages means more work for maintainers. In my experience the time cost is essentially constant per package per release, so if you split a single package into 10 you have 10x the release-management work. This is the general boilerplate overhead.
CON: It could increase the number of dependencies and complicate the process of getting the right set of optional dependencies for some desired functionality.
Can we get more modularization internally within the rdflib repository without adding the maintenance overhead? Which release cycles suffer because of the current system? Are there clear examples where some code could be used by another project? If the code is only used by rdflib, then it seems like there is only a cost.
I'll add some pros at some point.
-
PRO: If a feature depends on a third-party library, that dependency burdens anyone who only needs the core features. As it stands, I doubt we have much of that. If, e.g., RDFa parsing depends on html5lib, that is one such case, and one can conceive of future ones, like efficient stream parsers or asynchronous handling of networking. (I do think it's better for RDFLib to err on the side of pragmatism, i.e. being a bit heavier on features and dependencies is good enough. Still, third-party dependency issues have cropped up historically and will possibly continue to do so occasionally.)
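One common way to keep such a dependency from burdening core users is to import it only when the feature is actually used. The following is a minimal sketch of that pattern, not how RDFLib does it: the helper name `require_optional` and the extra name `html` in the install hint are hypothetical, chosen only for illustration.

```python
import importlib
import importlib.util


def require_optional(module_name: str, feature: str, extra: str):
    """Import an optional top-level module only when a feature needs it.

    `extra` is a hypothetical packaging extra used for the install hint.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{feature} needs the optional dependency {module_name!r}; "
            f"try: pip install 'rdflib[{extra}]'"
        )
    return importlib.import_module(module_name)


def parse_rdfa(markup: str):
    # The third-party import happens here, not when the core package is imported.
    html5lib = require_optional("html5lib", "RDFa parsing", "html")
    return html5lib.parse(markup)
```

With this shape, users who never call `parse_rdfa` never need html5lib installed, and those who do get an actionable error instead of a bare import failure.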
-
@tgbugs as an example: rdflib is used by a UI-based plugin for QGIS (a GIS desktop application): https://github.com/sparqlunicorn/sparqlunicornGoesGIS As a visual example: the Second World War bunker in the Netherlands queried via the plugin/rdflib and shown in QGIS: So speaking as a QGIS community member: it would be nice to at least make the networking requests modular (maybe using some abstraction layer like: I'm not sure if this is feasible or not...
-
@rduivenvoorde can you please try running your QGIS plugin with RDFlib
-
The dependency on
-
Hello, as the main developer of the QGIS plugin in question I might shed some light on the details of the aforementioned request.
Is this wrong or is there indeed some way I could use QgsNetworkAccessManager instead?
-
Hi Timo,
Yes it does. Both SPARQLWrapper & RDFlib, if doing web requests, use urllib without the option to delegate.
Not directly within SPARQLWrapper, but: SPARQLWrapper was written a very long time ago, before modern Python HTTP libraries like requests or httpx existed. To delegate network functions, I would suggest that you keep using RDFlib but write your SPARQL requests in a requests- or httpx-based script and take advantage of their better networking capabilities. SPARQLWrapper offers few benefits over a few lines of requests/httpx code. I haven't looked into this myself, but presumably QgsNetworkAccessManager can be used by things like requests/httpx.
-
Hi Nicolas, I think it is probably not so easy. I believe QgsNetworkAccessManager will use Qt-builtin network handling.
-
This is what I meant! Skip SPARQLWrapper altogether and write your own code on top of RDFlib. If you use requests/httpx for the network calls - and they really should be able to use any proxy settings - you won't be writing much more code than SPARQLWrapper anyway! (I do think that eventually we will need to update SPARQLWrapper to use all the modern internet connection options)
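To give a rough idea of how little code that is, here is a minimal sketch of a SPARQL SELECT client with a swappable transport. The names `sparql_select` and `urllib_transport` are hypothetical. The default transport uses only the standard library so the example is self-contained; a callable built on requests, httpx, or whatever networking stack the host application provides could be passed in its place. It assumes the endpoint accepts a form-encoded POST and returns the SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# A transport takes (url, body, headers) and returns the raw response bytes.
Transport = Callable[[str, bytes, Dict[str, str]], bytes]


def urllib_transport(url: str, body: bytes, headers: Dict[str, str]) -> bytes:
    """Default transport: a plain standard-library POST."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def sparql_select(
    endpoint: str,
    query: str,
    transport: Transport = urllib_transport,
) -> List[Dict[str, str]]:
    """Send a SELECT query and return one dict of variable -> value per row."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    }
    raw = transport(endpoint, body, headers)
    results = json.loads(raw.decode("utf-8"))
    # Keep only the lexical value of each bound term; datatype and language are dropped.
    return [
        {name: term["value"] for name, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

Because all networking goes through the `transport` argument, proxy handling and authentication stay with whichever library the caller plugs in, and the query logic can be tested with a fake transport that returns canned JSON.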
-
(Not much discussion of pros and cons in here.)
PRO: I just want to say that I'm for splitting off an rdflib-core module. Several features have been added that I think only serve certain subsets of RDFLib users (e.g., JSON-LD parsing, "hextuples" serialization, "longturtle"), but even if I use RDFLib without those features, I still have the extra bytes on my machine, the additional modules loaded, and more to consider when I'm upgrading. My preference is a minimal core sufficient for coordination and reuse, with all but a very few plugin implementations either kept in a more inclusive module (I've seen suggestions that it remain named "rdflib") or moved to separate modules for particularly niche features. I think I'm basically advocating for what @gromgull described here: #391 (comment)
As for what should be in rdflib-core, I'd say: whatever we've got now minus things that can be brought in as plugins.
Side note on the argument that further modularization would mean more overhead: I think it can be manageable. You would have as many setup.py scripts (and some supporting configs) as projects, but they would be simple and would not change very often, and the only coordination I foresee between them is if rdflib-core splits off a feature that one of the others uses. rdflib-core and the other modules can remain in the same repository while still being distributed separately.
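To illustrate the "core plus plugins" idea, here is a minimal sketch of a name-based registry that records where an implementation lives and imports it only on first use. This is not RDFLib's actual plugin code; the names `register` and `load` and the registry layout are assumptions made for the example, and standard-library classes stand in for real parser plugins.

```python
import importlib
from typing import Dict, Tuple

# Maps a format name to (module path, class name); nothing is imported yet.
_REGISTRY: Dict[str, Tuple[str, str]] = {}


def register(name: str, module_path: str, class_name: str) -> None:
    """Record where a plugin lives without importing it."""
    _REGISTRY[name] = (module_path, class_name)


def load(name: str):
    """Import the plugin's module on demand and return its class."""
    try:
        module_path, class_name = _REGISTRY[name]
    except KeyError:
        raise LookupError(f"no plugin registered for {name!r}") from None
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-ins for real plugins: a satellite package would register its own classes.
register("json", "json", "JSONDecoder")
register("csv", "csv", "DictReader")
```

A core package built this way only needs the registry itself; a satellite package contributes a `register(...)` call (or an equivalent packaging entry point), and its module is loaded only when someone asks for that format by name.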
-
Refactoring the RDFlib package into a core and a series of satellite repositories has been proposed for a long time (see Issue #391) and it was considered a possible goal of RDFlib's 6.0.0 release.
In order to proceed, we need at least some pros and cons of such a move, as well as refactoring suggestions (what stays in core, what moves out), and this Issue is to document them.
Please add comments to this starting with PRO: or CON: and your suggestions for what should be in or out of core, with reference to the current repo structure.