ds.summary simplification #257

tombisho · 2020-09-08T12:25:35Z

ds.summary was originally written to deal with simple R object types like dataframes and vectors. It tests the object type on the server side, and depending on the result, calls a serverside method that generates a privacy protected summary and returns it to the client.

There are 2 things I would like to raise about this approach:

As we add more exotic object types, the list of if{} then{} statements grows ever longer
If the object type has not been added to ds.summary and a corresponding server side function defined to summarise it, a summary is not available

I have been looking how the native summary() function works. It is written as a generic function, which just provides a placeholder. When new objects are defined, they can provide their own method for summarising the data. For example, an object of class blob could have a method called summary.blob(). When summary() is called with a blob object, the code in summary.blob() is executed. There is also a method summary.default() which looks at the underlying structure of an object - most objects are matrices, lists etc. with some extra attributes - and then summarises it. For example, if the blob object was just a matrix with some extra attributes, and we did not define a method for summarising it, when summary() is called, it falls back to summary.default(). This sees that the blob object is really just a matrix and summarises each column.

I wonder if a similar approach might be useful in DataSHIELD, to save having to explicitly define summary functions for every object type.

This would mean having a generic summaryDS() function which is analogous to the summary() function, but with privacy protection. Objects created with DataSHIELD on the serverside could also define their own version of summaryDS(), for example summaryDS.glm() for GLMs.

StuartWheater · 2020-09-08T12:35:21Z

What you describe is a common pattern when building extensible frameworks. Would it be acceptable to scheduled this change to v6.2?

tombisho · 2020-09-08T12:39:40Z

Certainly v6.2 would be fine - I don't think it is urgent but I feel it becomes more relevant the more that we simply 'wrap' existing package functionality with privacy protection.

It feels like there could be wider applications beyond just the summary function - perhaps with plotting, too?

StuartWheater · 2020-09-08T12:45:15Z

I was wondering if complicated function, for example 'predict', could also benefit from a similar approach.

tombisho · 2020-09-08T13:05:12Z

yes - I see what you mean, as it seems to be the same situation there

tombisho · 2020-11-18T16:14:29Z

I don't know the status of this in v6.2, but the current way in which ds.summary works is also causing some challenges in the ds-helper package. At the moment it seems to lack any parallelisation, in that a separate call is made to each server. So when you have a lot of studies, distributed globally, it takes a long time to run. A side point is when I have understood correctly that DataSHIELD calls to the servers are indeed made asynchronously (ie all at the same time) not synchronously (wait for each server to return before going to the next).

Anyway, this effect is amplified in the ds-helper getStats function which produces a tabular set of stats useful for papers. So we would like to address if possible

StuartWheater added the enhancement label Sep 8, 2020

StuartWheater added this to the v6.2 milestone Sep 8, 2020

StuartWheater modified the milestones: v6.2, v6.4 Sep 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ds.summary simplification #257

ds.summary simplification #257

tombisho commented Sep 8, 2020

StuartWheater commented Sep 8, 2020

tombisho commented Sep 8, 2020

StuartWheater commented Sep 8, 2020

tombisho commented Sep 8, 2020

tombisho commented Nov 18, 2020

ds.summary simplification #257

ds.summary simplification #257

Comments

tombisho commented Sep 8, 2020

StuartWheater commented Sep 8, 2020

tombisho commented Sep 8, 2020

StuartWheater commented Sep 8, 2020

tombisho commented Sep 8, 2020

tombisho commented Nov 18, 2020