# Serverless Data Portal prototype for CMB-S4


Proof-of-concept for a data portal using static pages.

## NERSC Globus endpoints

The data portal supports datasets stored on different Globus endpoints, as long as those endpoints support HTTPS access (so that people can download data directly). The easiest option is to host data at NERSC and activate a Guest Collection. At the moment Guest Collections are activated on the cmb and cmbs4 projects, so any data under the following paths can be shared:

```
/global/cfs/projectdirs/cmbs4/gsharing/
/global/cfs/projectdirs/cmb/gsharing/
```

The collections on Globus:

## How to prepare a release

### Put data under gsharing/datareleases

Create hard links to the gsharing/datareleases/ folder, for example:

```
cp -al /global/cfs/cdirs/cmbs4/dc/dc0 /global/cfs/cdirs/cmbs4/gsharing/datareleases/
```

Note that it is the owner of the files who needs to create the hard links.

Alternatively, move the data under gsharing and then put a symlink at the original location pointing to the new location of the folder. The advantage of symlinks is that if we add a new file to a folder, that file is immediately available through Globus (it still needs to be explicitly linked from the portal to appear in the HTML interface). With hard links, instead, we need to create a new hard link for that file in the gsharing folder.
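As a rough sketch of the move-plus-symlink approach (the dc0 paths below simply reuse the example above and would need to be adjusted for a real release):

```python
import os
import shutil

# Illustrative paths only, taken from the dc0 example above.
src = "/global/cfs/cdirs/cmbs4/dc/dc0"
dst = "/global/cfs/cdirs/cmbs4/gsharing/datareleases/dc0"

shutil.move(src, dst)  # move the release under gsharing
os.symlink(dst, src)   # leave a symlink at the original location
```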

### Create manifests

A manifest.json file needs to be created inside each dataset (i.e. each folder that contains data files, rather than only other folders). The file contains the hash, size, and URL of each data file.

The URL for NERSC should be:

```
https://g-9fdb0b.6b7bd8.0ec8.data.globus.org/datareleases/xxx
```

where xxx is the name of the release.

makemanifest.py is the Python script that, executed at the root of a data release, traverses the hierarchy of folders and creates a manifest.json file in each folder that directly contains data. This script is the only step in the process that needs to run where the data files are available with write access, so the easiest way at the moment is to log in at NERSC with the cmbs4 Collaboration account and execute the script there.
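For orientation, here is a minimal sketch of what such a script could look like. This is not the actual makemanifest.py; the manifest field names, the use of SHA-256, and the single URL-prefix argument are assumptions for illustration.

```python
import hashlib
import json
import os
import sys

# e.g. https://g-9fdb0b.6b7bd8.0ec8.data.globus.org/datareleases/dc0
URL_PREFIX = sys.argv[1]


def sha256sum(path):
    """Return the SHA-256 hex digest of a file (hash algorithm assumed)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


# Walk the release from its root and write a manifest.json in every folder
# that directly contains data files.
for folder, subfolders, files in os.walk("."):
    datafiles = [f for f in files if f != "manifest.json"]
    if not datafiles:
        continue
    entries = []
    for name in sorted(datafiles):
        path = os.path.join(folder, name)
        entries.append(
            {
                "filename": name,
                "hash": sha256sum(path),
                "size": os.path.getsize(path),
                "url": f"{URL_PREFIX}/{os.path.relpath(path, '.')}",
            }
        )
    with open(os.path.join(folder, "manifest.json"), "w") as f:
        json.dump(entries, f, indent=2)
```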

### Create file metadata JSON files

The next step can be executed from any machine; no data access is necessary. We run a bash script that uses the globus-cli to gather information about all the files to be registered with the data portal, most notably their size. As output we get one JSON file for each data file. At the moment we have a simple bash script which builds all the filenames with nested loops and then calls globus ls. Unfortunately the script needs to be customized for each data release; see for example get-dc0-file-lists.sh.
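The sketch below shows the general idea in Python rather than bash; it is not the actual get-dc0-file-lists.sh. The collection UUID, folder list, JSON output option, and the structure of the listing returned by globus ls are all assumptions.

```python
import json
import subprocess

COLLECTION = "ENDPOINT_UUID"              # placeholder Globus collection UUID
FOLDERS = ["dc0/folder1", "dc0/folder2"]  # hypothetical dataset folders

for folder in FOLDERS:
    # Ask globus-cli for a listing of the folder. JSON output via
    # "--format json" and the "DATA" list of entries are assumed here.
    result = subprocess.run(
        ["globus", "ls", "--format", "json", f"{COLLECTION}:{folder}"],
        check=True,
        capture_output=True,
        text=True,
    )
    listing = json.loads(result.stdout)
    # Save one small JSON file per data file, keeping the name and size.
    for entry in listing.get("DATA", []):
        if entry.get("type") != "file":
            continue
        with open(f"{entry['name']}.json", "w") as f:
            json.dump({"name": entry["name"], "size": entry["size"]}, f, indent=2)
```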

### Generate the pages

The last step is a Python script that loads the information about the files from the JSON files and writes the markdown files, one page for each dataset. This script is highly customized for each data release; see for example builddc0.py. We also need to create a homepage for the release, mostly with documentation about it; see for example dc0.md. The script also generates the sidebar, which we need to paste into _data/sidebars/home_sidebar.yml, and the dataset table, which we need to paste at the bottom of the homepage for the release.
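A minimal sketch of this kind of page generator is shown below. It is not the actual builddc0.py; the metadata layout (one JSON file per dataset), the page naming scheme, and the sidebar/table formats are assumptions.

```python
import glob
import json
import os

RELEASE = "dc0"  # hypothetical release name
os.makedirs("pages", exist_ok=True)

sidebar_entries = []
table_rows = ["| Dataset | Files | Size (GB) |", "|---|---|---|"]

for metadata_file in sorted(glob.glob("metadata/*.json")):
    dataset = os.path.splitext(os.path.basename(metadata_file))[0]
    with open(metadata_file) as f:
        files = json.load(f)  # assumed: list of {"name", "size", "url"} entries

    # Write one markdown page per dataset with a download table.
    total_gb = sum(e["size"] for e in files) / 1e9
    page = os.path.join("pages", f"{RELEASE}_{dataset}.md")
    with open(page, "w") as f:
        f.write(f"# {dataset}\n\n")
        f.write("| File | Size (bytes) | Link |\n|---|---|---|\n")
        for e in files:
            f.write(f"| {e['name']} | {e['size']} | [download]({e['url']}) |\n")

    sidebar_entries.append(f"  - title: {dataset}\n    url: /{RELEASE}_{dataset}.html")
    table_rows.append(f"| {dataset} | {len(files)} | {total_gb:.1f} |")

# Paste the sidebar snippet into _data/sidebars/home_sidebar.yml and the
# dataset table at the bottom of the release homepage.
print("\n".join(sidebar_entries))
print("\n".join(table_rows))
```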