Let's investigate the SPARC ecosystem #4

elvijs · 2024-03-18T10:35:49Z

There may be some helpful tools in the SPARC ecosystem. Let's spend 1 day taking a look and checking whether they simplify anything.

nih-sparc/sparc.client#22 (comment)

elvijs · 2024-03-18T10:36:18Z

cc @Olivier-tl

elvijs · 2024-03-18T10:41:29Z

Noting this puppy as well: https://github.com/Pennsieve/pennsieve-agent-python/tree/main

Olivier-tl · 2024-04-23T15:10:28Z

We can access the data files of any published dataset through the "Discover Service" of the Pennsieve API. The only limitation is that each requested file must be below 5 GB. No authentication is required. It takes two steps:

Discover all files and directories in a dataset using this endpoint.
Get a download link to a file using this endpoint.
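
For reference, a minimal sketch of the first step (listing every file in a published dataset) plus a filter for the per-file size limit. Everything specific here is an assumption on my side and should be checked against the Pennsieve API reference: the base URL, the `files/browse` route, the `path` query parameter, and the `files`, `type`, `path` and `size` response fields. Pagination is ignored, and the download-link step is left out because the exact endpoint is not spelled out in this thread. `walk_files` and `within_size_limit` are hypothetical helper names.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL and route are not stated in this thread; verify them
# against the Pennsieve API reference before relying on this.
DISCOVER_BASE = "https://api.pennsieve.io/discover"

# Per-file limit mentioned above (decimal gigabytes assumed).
MAX_FILE_BYTES = 5 * 1000**3


def http_get_json(url: str) -> dict:
    """Fetch a URL without any auth header and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def browse_url(dataset_id: int, version: int, path: str = "") -> str:
    """Build the (assumed) URL that lists one directory of a dataset version."""
    query = urlencode({"path": path})
    return f"{DISCOVER_BASE}/datasets/{dataset_id}/versions/{version}/files/browse?{query}"


def walk_files(
    dataset_id: int,
    version: int,
    path: str = "",
    get_json: Callable[[str], dict] = http_get_json,
) -> Iterator[dict]:
    """Recursively yield file entries, descending into directories.

    ASSUMPTION: each entry has "type" ("Directory" or "File") and "path".
    Pagination of large directories is not handled in this sketch.
    """
    listing = get_json(browse_url(dataset_id, version, path))
    for entry in listing.get("files", []):
        if entry.get("type") == "Directory":
            yield from walk_files(dataset_id, version, entry["path"], get_json)
        else:
            yield entry


def within_size_limit(entries, limit: int = MAX_FILE_BYTES) -> list:
    """Keep only the entries whose (assumed) "size" field is below the limit."""
    return [entry for entry in entries if entry.get("size", 0) < limit]
```

The `get_json` argument is injectable so the traversal can be exercised against a canned response without touching the network.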

🕵🏻‍♂️To investigate further: It looks like the Pennsieve platform can automatically process EDF files (see processor-EDF) and has an API for timeseries data. However, an API key is required to access that API.

❌ pennsieve-python does not support the "Discover Service" of the Pennsieve API.

--- Outdated comment below ---

How can we leverage the SPARC ecosystem for the REVEAL data client?

TL;DR: Data files can be requested through the Pennsieve API. Users need an API key, which they can generate with a free account. The pennsieve-python package facilitates interfacing with the Pennsieve API.

❓Open question: Although oSPARC uses the Pennsieve API through a user-generated API key to expose SPARC datasets, its user interface only allows connecting one file at a time to a service node. This is impractical, as the REVEAL dataset will have thousands of files and will be regularly updated. Can oSPARC provide the Pennsieve API key to our REVEAL App service as an environment variable?

⚠️ EDIT: As a test, I created a new Pennsieve account. Unless I am (manually) added to an organization, I don't have access to any datasets.

⚠️ Edit 2: There is a 15 GB limit for data download, even when logged in on Pennsieve.

pennsieve.io vs sparc.science

Published datasets end up on sparc.science, where they are openly available to the public. On the other hand, pennsieve.io requires an account and only contains a third of the published datasets (65/220). sparc.science offers free direct download for datasets of 5 GB or smaller; larger datasets need to be downloaded through AWS S3, and the requester pays. Having researchers who want to work with the REVEAL dataset create an AWS account and pay for data download is not ideal. Pennsieve offers presigned S3 URLs without a limit on the dataset size through its API. The only thing needed is a free Pennsieve account, from which an API key can be generated.

Pennsieve-API

Pennsieve-API reference
To download a file, you need to:

SPARC Repositories

sparc-curation

sparc-curation uses the pennsieve-python package (outdated, now pennsieve-agent-python?) to connect to the Pennsieve API.

pennsieve-python

It looks like pennsieve-python can provide a pre-signed URL for files (see here).
TimeSeriesAPI with support for annotations?! (pennsieve.api.timeseries)

elvijs · 2024-04-26T15:57:58Z

The Pennsieve API looks pretty good!

If I've understood correctly, it enables listing and downloading files from any public dataset without any auth (i.e. it will work in an arbitrary oSPARC container). This already enables a bunch of functionality for a typical app:

list all subjects (get all dataset files and do some regex magic to pull the subject IDs or ask Justin to provide a summary file)
visualise surfaces (ask Justin to add a response feature csv, we just load it and plot)
visualise raw data (via the file download API; downside: likely to be slow)
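
A rough sketch of the "regex magic" for the first item, assuming the dataset keeps each subject's files under a folder named `sub-<id>` (that naming is an assumption about how the REVEAL dataset will be laid out, not something confirmed in this thread). `subject_ids` is a hypothetical helper that takes the file paths from a dataset listing.

```python
import re

# ASSUMPTION: subject folders are named "sub-<alphanumeric id>" and appear
# as a whole path segment, e.g. "primary/sub-01/recording.edf".
SUBJECT_PATTERN = re.compile(r"(?:^|/)(sub-[A-Za-z0-9]+)(?:/|$)")


def subject_ids(paths):
    """Return the sorted, de-duplicated subject IDs found in the file paths."""
    found = set()
    for path in paths:
        match = SUBJECT_PATTERN.search(path)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

If the folder naming turns out to be different, a summary file from Justin would be the more robust route.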

Olivier-tl · 2024-04-26T15:59:57Z

Exactly!

elvijs · 2024-04-26T16:01:11Z

If we find ourselves dying to visualise timeseries efficiently, then we can also chat with Joost about indexing our timeseries into their backend and exposing API keys in the oSPARC containers for access.

elvijs added the documentation Improvements or additions to documentation label Mar 18, 2024

Olivier-tl closed this as completed Apr 26, 2024
