Pros and Cons for more modularisation #1534

nicholascar · 2020-05-01T03:44:26Z

nicholascar
May 1, 2020
Maintainer

Refactoring the RDFlib package into a core and a series of satellite repositories has been proposed for a long time (see Issue #391) and it was considered a possible goal of RDFlib's 6.0.0 release.

In order to proceed, we need at least some pros and cons of such a move, as well as refactoring suggestions (what stays in core, what moves out). This Issue is to document them.

Please add comments to this starting with PRO: or CON: and your suggestions for what should be in or out of core, with reference to the current repo structure.

tgbugs · 2020-05-01T04:41:46Z

tgbugs
May 1, 2020

CON: More packages means more work for maintainers. In my experience time cost is essentially constant per package per release, so if you split a single package into 10 you have 10x the work that you have to dedicate to managing releases. This is the general boilerplate overhead.

CON: Could increase the number of dependencies and complicate the process needed to get the right set of optional dependencies for some desired functionality.

Can we get more modularization internally within the rdflib repository without adding the maintenance overhead?

What release cycles currently suffer because of the current system?

Are there clear examples where some code could be used by another project? If the code is only used by rdflib then it seems like there is only a cost.

I'll add some pros at some point.


niklasl · 2020-05-01T09:49:48Z

niklasl
May 1, 2020
Maintainer

PRO: If a feature depends on a third-party library, that burdens anyone who only needs to use the core features. As it stands, I doubt we have much of that. If e.g. RDFa parsing depends on html5lib, that is one such case. One can also conceive of future cases, like efficient stream parsers or asynchronous handling of networking.

(I do think it's better for RDFLib to err on the side of pragmatism though, i.e. being a bit heavier on features and dependencies is good enough. Still, third-party dependency issues have cropped up historically and will possibly continue to do so occasionally.)

4 replies

aucampia Jan 17, 2022
Maintainer

One option to manage this is to build out RDFLib so that these things can be controlled with extras, e.g. pip install rdflib[html] installs everything you need to make RDFa work, and if you try to use RDFa without the dependencies you get an error telling you to install with the html extra.
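A minimal sketch of the "error informing you to install the extra" idea, for illustration only. The names `require_extra`, `MissingExtraError` and `parse_rdfa` are hypothetical and not part of RDFLib; the sketch assumes an `html` extra that pulls in html5lib.

```python
import importlib


class MissingExtraError(ImportError):
    """Raised when an optional feature is used without its extra installed."""


def require_extra(module_name: str, extra: str, package: str = "rdflib"):
    """Import module_name, or fail with a hint naming the pip extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingExtraError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install {package}[{extra}]"
        ) from exc


def parse_rdfa(markup: str):
    # Hypothetical entry point: the optional dependency is imported on first
    # use, so users of the core never need html5lib installed.
    html5lib = require_extra("html5lib", extra="html")
    return html5lib.parse(markup)
```

With this shape, importing the package never fails; only calling the optional feature without its dependency raises, and the message names the extra to install.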

nicholascar Jan 17, 2022
Maintainer Author

That’s exactly how RDFlib did do things in the past! There were several rdflib[OPTION] pip install options, but we’ve included many of them now, like JSON-LD.

I much prefer monolithic now, when modules are at least reasonably stable: so much less maintenance effort. Since JSON-LD’s inclusion in the core, all the JSON-LD functions have been tested many more times than they otherwise would have been, due to their inclusion in round tripping and so on.

It’s certainly ok to leave modules out when they are in early stage dev, but we should include them when mature, unless they greatly increase dependencies or something like that.

aucampia Jan 17, 2022
Maintainer

I also think that monolithic makes it easier to maintain and keep everything working. It does add some overhead, but a lot less than maintaining a separate repo.

If people are not okay with using extras, then one other option is maybe a monorepo from which we publish several packages, but I am not sure how well this will work, and I think it is only worth investigating if there is some concrete problem to address.

mwatts15 Jan 19, 2022

My core issue with tying diverse features into one package is that it ties my consideration of compatibility with the core to whether or not I need a feature. It increases my overhead when there are new releases that either have no changes in the core, or have only fixes in the core: I'm still obliged to read through all the release notes and run integration tests in case there is a change that affects my package. OTOH, what if I wanted to use something that could be a separately distributed pluggable feature, but it's only available if I upgrade? I may not want to upgrade since it would break my package. You may say, "that's just the cost of getting the new feature," but that would only be because the choice was made not to let that feature have its own release cycle, so it doesn't have to be that way.

rduivenvoorde · 2020-10-11T08:10:26Z

rduivenvoorde
Oct 11, 2020

@tgbugs as an example, rdflib is being used by a UI-based plugin for QGIS (a GIS desktop application): https://github.com/sparqlunicorn/sparqlunicornGoesGIS
Being tied to urllib2 is a CON in this case: QGIS is Qt based, and it is better to use Qt-based classes for network requests, because then the QGIS settings, debugging tools, etc. can be used (see: sparqlunicorn/sparqlunicornGoesGIS#18 (comment)).

As a visual example: the 2nd World War bunker in The Netherlands queried via the plugin/rdflib and shown in QGIS:

So speaking as a QGIS community member: it would be nice to at least make the networking requests modular, maybe using some abstraction layer like:
https://github.com/planetfederal/lib-qgis-commons/blob/master/qgiscommons2/network/networkaccessmanager.py

I'm not sure if this is feasible or not...


nicholascar · 2020-10-11T23:40:07Z

nicholascar
Oct 11, 2020
Maintainer Author

@rduivenvoorde can you please try running your QGIS plugin with RDFlib master, as in built from the master branch of this repo? Since we've removed support for Python 2, I don't think we are tied to urllib2 anywhere, and we've also just removed the dependency on the requests module, so it would be good to know if these changes positively affect your library.


ashleysommer · 2020-10-12T02:56:10Z

ashleysommer
Oct 12, 2020
Maintainer

The dependency on urllib2 is not directly from RDFLib (if you look at the code, you'll see no usages of urllib2, and urllib2 is not in our list of install requirements).
@nicholascar is right, it comes from the use of the requests library. We have recently removed the dependency on requests in favor of the newer built-in urllib.request Python 3 module, so if you test on master you will see that urllib2 is no longer brought in.


situx · 2020-10-29T23:08:40Z

situx
Oct 29, 2020

Hello, as the main developer of the QGIS plugin in question I might shed some light on the details of the aforementioned request.

I am using RDFLib 4.2.2 with SPARQLWrapper bundled with the QGIS plugin, as libraries not bundled with QGIS need to be added to QGIS plugins directly.
The plugin queries a SPARQL endpoint via SPARQLWrapper. As far as I can see, I am able to configure whether SPARQLWrapper uses a POST or GET request, but I am not able to delegate the execution of this request to a different implementation such as QgsNetworkAccessManager (https://qgis.org/pyqgis/3.4/core/QgsNetworkAccessManager.html).
So I assume SPARQLWrapper uses urllib to do the HTTP request to the SPARQL endpoint.

Is this wrong or is there indeed some way I could use QgsNetworkAccessManager instead?


nicholascar · 2020-10-30T10:41:47Z

nicholascar
Oct 30, 2020
Maintainer Author

Hi Timo,

I assume SPARQLWrapper uses urllib to do the HTTP request to the SPARQL endpoint

Yes it does. Both SPARQLWrapper & RDFlib, if doing web requests, use urllib without the option to delegate.

is there indeed some way I could use QgsNetworkAccessManager instead?

Not directly within SPARQLWrapper, but: SPARQLWrapper was written a very long time ago, before modern Python HTTP libraries like requests or httpx. To delegate network functions, I would suggest that you keep using RDFlib but write your SPARQL requests into a requests- or httpx-based script and take advantage of their better networking capabilities. SPARQLWrapper doesn't give you many benefits over a few lines of requests/httpx code. I haven't looked into this myself, but presumably QgsNetworkAccessManager can be used by things like requests/httpx.
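As a rough sketch of how little code a SELECT query over HTTP needs, the following uses only the standard library and takes the HTTP call as a swappable function, so that a different implementation (for example one built on Qt's networking) could be passed in instead. The names `Transport`, `urllib_transport` and `sparql_select` are made up for this illustration, and it assumes the endpoint accepts GET requests with a `query` parameter and returns SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# A transport takes (url, headers) and returns the raw response body.
Transport = Callable[[str, Dict[str, str]], bytes]


def urllib_transport(url: str, headers: Dict[str, str]) -> bytes:
    """Default transport built on the standard library."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()


def sparql_select(
    endpoint: str, query: str, transport: Transport = urllib_transport
) -> List[Dict[str, str]]:
    """Send a SELECT query via GET and return one dict per result row."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    body = transport(url, {"Accept": "application/sparql-results+json"})
    bindings = json.loads(body)["results"]["bindings"]
    # Keep only the lexical value of each bound variable.
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]
```

Any callable with the same signature can stand in for `urllib_transport`, which is the delegation point asked about above.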


situx · 2020-10-30T17:22:58Z

situx
Oct 30, 2020

Hi Nicolas,

I think it is probably not so easy. I believe QgsNetworkAccessManager uses Qt's built-in network handling.
The point is rather to avoid requests or httpx, because we would like to take advantage of the internal QGIS proxy settings, which are stored as a QSettings object.
My current workaround is to check the QGIS settings and then apply them in urllib2 for SPARQLWrapper, but it appears that this is not sufficient. I probably have to write some new code using Rdflib and QgsNetworkAccessManager, skipping SPARQLWrapper, to finally solve this issue.
This probably needs some more investigation.
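For reference, a sketch of what applying externally stored proxy settings to urllib can look like in Python 3. The function name `opener_with_proxy` is hypothetical, the host and port are assumed to have been read from the application's settings beforehand (reading QSettings is not shown), and whether SPARQLWrapper picks up an opener configured this way is not confirmed here.

```python
import urllib.request


def opener_with_proxy(host: str, port: int) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes http and https traffic through a proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (not executed here): make it the process-wide default for urlopen().
# urllib.request.install_opener(opener_with_proxy("proxy.example.org", 3128))
```

Installing the opener only affects code that goes through `urllib.request.urlopen`, so libraries with their own networking stack would still need separate configuration.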


nicholascar · 2020-10-30T22:25:29Z

nicholascar
Oct 30, 2020
Maintainer Author

I probably have to write some new code using Rdflib and QgsNetworkAccessManager, skipping SPARQLWrapper

This is what I meant! Skip SPARQLWrapper altogether and write your own code on top of RDFlib. If you use requests/httpx for the network calls - and they really should be able to use any proxy settings - you won’t be writing much more code than SPARQLWrapper anyway!

(I do think that eventually we will need to update SPARQLWrapper to use all the modern internet connection options)


mwatts15 · 2021-12-27T20:41:36Z

mwatts15
Dec 27, 2021

(Not much discussion of pros and cons in here.)

PRO: I just want to say that I'm for splitting off an rdflib-core module. Several features have been added that I think only serve certain subsets of folks who use RDFLib (e.g., JSON-LD parsing, "hextuples" serialization, "longturtle"), but if I'm using RDFLib exclusive of those features, I still have the extra bytes on my machine, the additional modules loaded, and more stuff I have to consider when I'm upgrading. My preference is a minimal core sufficient for coordination and reuse, while all but a very few plugin implementations are either kept in a more inclusive module (I've seen suggestions that it remains named "rdflib") or put in separate modules for particularly niche features. I think I'm basically advocating for what @gromgull described here: #391 (comment)

As far as what should be in rdflib-core, I'd say: whatever we've got now minus things that can be brought in as plugins.
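To make "brought in as plugins" concrete, here is a generic sketch of a lazy plugin registry: the core only stores where an implementation lives and imports it on first use. This is an illustration under assumed names (`register`, `get`, `PluginError`), not RDFLib's actual plugin module, and the stdlib class registered in the usage lines is just a stand-in for a real parser.

```python
import importlib
from typing import Dict, Tuple

# (name, kind) -> (module path, class name); nothing is imported at registration.
_REGISTRY: Dict[Tuple[str, str], Tuple[str, str]] = {}


class PluginError(LookupError):
    """Raised when no plugin is registered under the requested name and kind."""


def register(name: str, kind: str, module_path: str, class_name: str) -> None:
    """Record where a plugin lives without importing it."""
    _REGISTRY[(name, kind)] = (module_path, class_name)


def get(name: str, kind: str):
    """Import the plugin's module on demand and return the registered class."""
    try:
        module_path, class_name = _REGISTRY[(name, kind)]
    except KeyError:
        raise PluginError(f"no {kind} plugin registered as {name!r}") from None
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in registration: a separately distributed package would do this itself.
register("json", "parser", "json", "JSONDecoder")
decoder_class = get("json", "parser")
```

A design like this keeps the core free of imports for features it does not ship, at the cost of errors surfacing at lookup time instead of install time.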

Side note on the argument that further modularization would mean more overhead: I think it can be manageable. You would have as many setup.py scripts (and some supporting configs) as projects, but they would be simple and would not change very often, and the only coordination I foresee between them is if rdflib-core splits off a feature that one of the others uses. rdflib-core and the other modules can remain in the same repository while still being distributed separately.

1 reply

ghost Dec 28, 2021

(Not much discussion of pros and cons in here.)

TBF, I've only recently enabled the Discussions feature, so people haven't had much chance to pitch in yet. I hope that the explicit “Discussions” label will be a little more inviting and the discussions themselves more accessible.
