List datasets + files from core + staging sources #700

Sets out the main design ideas behind how the server collects resources and makes them available via a new API. See commentary added in code for more details.

API design is WIP - see commentary added in this commit.

There is subtlety in how we want prefix-based filtering to work, as the _name_ of a set of datasets is not always the same as the filename, as we have the ability to group multiple filenames (keys) together. E.g. given the following collection of datasets:

  - name: X_Y
  - versions: [
      - name: X_Y (current version)
      - name: X_Y (old version)
      - name: X_Y_2022-01-01 (current version, datestamped filename)
    ]

Then prefix-based filtering on "X/Y/2022-01-01" could either return nothing as it doesn't match the name assigned to the collection, or return the dataset collection restricted to 1/3 of the versions. Here I implement the former.
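The two behaviours described above can be contrasted with a minimal sketch. This is not the code from this commit: the function names, the `{name, versions}` object shape, and writing names with `/` separators (to match the prefix string) are all assumptions made for illustration.

```javascript
// Hypothetical data shape: one collection groups several versions (filenames).
// Names are written with "/" separators here purely for illustration.
const collections = [
  {
    name: "X/Y",
    versions: [
      { name: "X/Y", note: "current version" },
      { name: "X/Y", note: "old version" },
      { name: "X/Y/2022-01-01", note: "current version, datestamped filename" },
    ],
  },
];

// Behaviour described as implemented: match the prefix against the
// collection name only. A collection is returned whole or not at all.
function filterCollectionsByPrefix(allCollections, prefix) {
  return allCollections.filter((c) => c.name.startsWith(prefix));
}

// The alternative that was not chosen: match the prefix against each
// version's own name and return the collection restricted to those versions.
function filterVersionsByPrefix(allCollections, prefix) {
  return allCollections
    .map((c) => ({
      ...c,
      versions: c.versions.filter((v) => v.name.startsWith(prefix)),
    }))
    .filter((c) => c.versions.length > 0);
}
```

With the example data, `filterCollectionsByPrefix(collections, "X/Y/2022-01-01")` returns an empty array, whereas `filterVersionsByPrefix` would return the collection with only the datestamped version.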

Code is taken from my existing prototype of the cards UI at https://github.com/nextstrain/workflow-asset-summary/tree/6730c2544d31b63bbe274c96b31e524176e03ca2 with widespread modifications to adapt it to this repo's design including:

  - typescript -> javascript (a big downgrade)
  - CSS changes
  - API response schema is slightly different
  - disables the "drawer" drop-down for cards as that functionality is not possible using the approach from the prototype (due to d3)
  - removal of version "rugplot", which relied on d3 controlling the DOM
  - rewriting the filtering algorithm
  - the card hierarchy is created client-side rather than server-side

The spinner design is copied from multiple other Nextstrain projects. (I was surprised this wasn't already part of nextstrain.org)

Reflects the new cards UI as the main interface to the datasets. The previous datasets+narratives table is restricted to narratives-only, as these are not yet (or may never be) part of the new cards UI. The colourful tiles are restricted to 6 pathogens, which on most screen resolutions render as 3 columns x 2 rows. The removed pathogens are not being regularly updated, and the ultimate aim is for datasets to be highlighted through the new UI.

The new URL ("/pathogens/inputs") is not advertised elsewhere as this page will probably see modifications in the short term. We need to decide what kind of files we want listed and how, if at all, we should link these to their associated dataset.

The previous dev-only approach is preserved under a `LOCAL_INVENTORY` env flag.

The staging page was previously broken due to an underlying API failure (?) so here we essentially replace the entire page. The spark-line is slightly misleading for sources where we don't store past versions of datasets (in this case, the underlying S3 bucket isn't versioned). Nonetheless, some datasets are grouped together by our name munging rules. I didn't implement a files (inputs) page for staging as we essentially have no non-dataset files in the bucket!

This uses a simple interval-based approach, which should be enough for the core & staging sources. Ideally we would re-fetch the inventory shortly after its creation. The time of day at which a new inventory appears seems random, so this would require using events or polling the inventory directory more frequently and only updating when a new manifest appears. The directory name of inventory manifests seems to be consistently YYYY-MM-DD + "T01-00Z" (e.g. 2023-07-12T01-00Z/), so perhaps it's enough to make HEAD requests for the upcoming day's data?
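As a rough sketch of the ideas above (not the code in this commit), the manifest directory name can be derived from a date, and a refresh can be scheduled on a fixed interval. The function names and the `fetchInventory` callback are hypothetical, and "upcoming day" is assumed to mean the next UTC calendar day.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Build the manifest directory name for the UTC day of `date`,
// following the observed pattern YYYY-MM-DD + "T01-00Z".
function manifestDirectoryFor(date) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  return `${day}T01-00Z/`;
}

// Directory name expected for the day after `now` (assumption: UTC days).
function upcomingManifestDirectory(now = new Date()) {
  return manifestDirectoryFor(new Date(now.getTime() + ONE_DAY_MS));
}

// Simple interval-based refresh. `fetchInventory` is a hypothetical async
// function supplied by the caller; a failed refresh is logged so that the
// previously fetched inventory can stay in use.
function startPeriodicRefresh(fetchInventory, intervalMs = ONE_DAY_MS) {
  return setInterval(async () => {
    try {
      await fetchInventory();
    } catch (err) {
      console.error("Inventory refresh failed:", err);
    }
  }, intervalMs);
}
```

A poller built on this could issue a HEAD request for a manifest under `upcomingManifestDirectory()` and only trigger a re-fetch once it exists; the exact manifest key inside that directory is not specified here.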

It's been a long time since I've been burned by MacOS' filesystem being case-insensitive, but it happens!

… with changes made by James on the AWS Console. I copied the policy¹ as JSON from AWS Console and pasted directly into the file. After reordering to match the existing file contents, which are sorted to be more readable rather than alphabetical, I confirmed that this updated version of the file results in no changes with `terraform plan`.

¹ arn:aws:iam::827581582529:policy/NextstrainDotOrgServerInstance-testing

Co-authored-by: James Hadfield <[email protected]>


Commits on Jul 6, 2023

Commits on Jul 10, 2023

Commits on Jul 13, 2023

Commits on Oct 4, 2023