Let's investigate the SPARC ecosystem #4
Comments
cc @Olivier-tl
Noting this puppy as well: https://github.com/Pennsieve/pennsieve-agent-python/tree/main
We can access the data files of any published dataset through the "Discover Service" of the Pennsieve API. The only limitation is that each requested file must be below 5 GB. No authentication is required. It's simple:
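A minimal sketch of what such an unauthenticated request could look like. The base URL, the `files/browse` path, and the `path`/`limit`/`offset` query parameters are assumptions about the Discover Service and should be checked against the Pennsieve API documentation; `browse_url` and `list_files` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed base URL of the Discover Service; verify against the Pennsieve API docs.
DISCOVER_BASE = "https://api.pennsieve.io/discover"


def browse_url(dataset_id, version, path="", limit=100, offset=0):
    """Build the (assumed) URL that lists files under `path` in a published dataset version."""
    query = urlencode({"path": path, "limit": limit, "offset": offset})
    return f"{DISCOVER_BASE}/datasets/{dataset_id}/versions/{version}/files/browse?{query}"


def list_files(dataset_id, version, path=""):
    """Fetch one page of the file listing. No API key or auth header is sent."""
    with urllib.request.urlopen(browse_url(dataset_id, version, path), timeout=30) as resp:
        return json.load(resp)
```

Calling `list_files(dataset_id, version)` performs the network request; `browse_url` on its own only builds the URL, which makes it easy to inspect before hitting the service.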
🕵🏻♂️To investigate further: It looks like the Pennsieve platform can automatically process EDF files (see processor-EDF) and has an API for timeseries data. However, an API key is required to access that API.
❌ pennsieve-python does not support the "Discover Service" of the Pennsieve API.
--- Outdated comment below ---
How can we leverage the SPARC ecosystem for the REVEAL data client?
TLDR; Data files can be requested through the Pennsieve API. Users need an API key, which they can generate with a free account. The pennsieve-python package facilitates interfacing with the Pennsieve API.
❓Open question: Although osparc uses the Pennsieve API through a user-generated API key to expose SPARC datasets, its user interface only allows connecting one file at a time to a service node. This is impractical, as the REVEAL dataset will have thousands of files and will be regularly updated. Can osparc provide the Pennsieve API key to our REVEAL App service as an environment variable?
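If osparc did expose the key that way, the service side could read it roughly as follows. This is only a sketch: the variable name `PENNSIEVE_API_KEY` and the helper `get_api_key` are hypothetical, not something osparc or Pennsieve is known to define.

```python
import os

# Hypothetical variable name; osparc would have to agree on the actual one.
ENV_VAR = "PENNSIEVE_API_KEY"


def get_api_key(environ=None):
    """Return the API key from the environment, or None if it is unset or blank."""
    env = os.environ if environ is None else environ
    key = env.get(ENV_VAR, "").strip()
    return key or None
```

Returning `None` instead of raising lets the app fall back to unauthenticated access when no key is provided.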
pennsieve.io vs sparc.science
Published datasets end up on sparc.science, where they are openly available to the public. pennsieve.io, on the other hand, requires an account and only contains a third of the published datasets (65/220). sparc.science offers free direct download for datasets of 5 GB or smaller; larger datasets must be downloaded through AWS S3, and the requester pays. Asking researchers who want to work with the REVEAL dataset to create an AWS account and pay for data download is not ideal. Pennsieve offers presigned S3 URLs through its API with no limit on dataset size. The only requirement is a free Pennsieve account, from which an API key can be generated.
Pennsieve-API
SPARC Repositories
sparc-curation
sparc-curation uses the pennsieve-python package (outdated, now pennsieve-agent-python?) to connect to the Pennsieve API.
pennsieve-python
The Pennsieve API looks pretty good! If I've understood correctly, it enables listing and downloading files from any dataset that is public without any auth (aka will work in a random oSPARC container). This already enables a bunch of functionality for a typical app:
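One piece an app would likely need is a check against the 5 GB per-file limit mentioned earlier in the thread. A minimal sketch, assuming each file record is a dict with a `size` field in bytes and that the limit is decimal gigabytes; both the record shape and the helper name `split_by_size_limit` are hypothetical.

```python
# Assumption: "5 GB" means decimal gigabytes; the service may count differently.
MAX_FILE_BYTES = 5 * 1000**3


def split_by_size_limit(files, limit=MAX_FILE_BYTES):
    """Split file records into (downloadable, too_large) by their 'size' in bytes."""
    downloadable, too_large = [], []
    for record in files:
        # Files must be strictly below the limit to be requested directly.
        (downloadable if record["size"] < limit else too_large).append(record)
    return downloadable, too_large
```

Files in the second list would need another route, such as the authenticated API discussed in the outdated comment.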
Exactly!
If we find ourselves dying to visualise timeseries efficiently, then we can also chat with Joost about indexing our timeseries into their backend and exposing API keys in the oSPARC containers for access.
There may be some helpful tools in the SPARC ecosystem. Let's spend 1 day taking a look and checking whether they simplify anything.
nih-sparc/sparc.client#22 (comment)