List datasets + files from core + staging sources #700

Closed
wants to merge 13 commits into from

Commits on Jul 6, 2023

  1. a3d863f

Commits on Jul 10, 2023

  1. API to list all available resources

    Sets out the main design ideas behind the collection of
    resources by the server and making these available via a new API.
    See commentary added in code for more details.
    jameshadfield committed Jul 10, 2023
    94c100e
  2. Implement filtering for getAvailable API

    API design is WIP - see commentary added in this commit
    
    There is subtlety in how we want prefix-based filtering to work, as the
    _name_ of a set of datasets is not always the same as the filename,
    as we have the ability to group multiple filenames (keys) together.
    E.g. given the following collection of datasets:
      - name: X_Y
      - versions: [
        - name: X_Y (current version)
        - name: X_Y (old version)
        - name: X_Y_2022-01-01 (current version, datestamped filename)
      ]
    Then prefix-based filtering on "X/Y/2022-01-01" could either
    return nothing as it doesn't match the name assigned to the collection,
    or return the dataset collection restricted to the one matching version (of three).
    Here I implement the former.
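
A minimal sketch of the "former" behaviour described above, not the code from this commit. The data shape, the `filterByPrefix` and `toParts` names, and the choice to treat "/" and "_" as equivalent separators are all assumptions made for illustration.

```javascript
// Hypothetical data shape mirroring the example in the commit message:
// one named collection groups several filenames (keys) as versions.
const collections = [
  {
    name: "X_Y",
    versions: [
      { key: "X_Y", current: true },
      { key: "X_Y", current: false },
      { key: "X_Y_2022-01-01", current: true },
    ],
  },
  { name: "X_Z", versions: [{ key: "X_Z", current: true }] },
];

// Assumption: "/" (URL style) and "_" (filename style) separate the same parts.
const toParts = (s) => s.split(/[\/_]/).filter(Boolean);

// Keep a collection only if every part of the prefix matches the leading
// parts of the collection *name*. Version keys are never consulted, so a
// prefix that is longer than the name matches nothing.
function filterByPrefix(all, prefix) {
  const wanted = toParts(prefix);
  return all.filter((collection) => {
    const nameParts = toParts(collection.name);
    return (
      wanted.length <= nameParts.length &&
      wanted.every((part, i) => part === nameParts[i])
    );
  });
}
```

Under these assumptions, filtering on "X/Y" keeps the whole X_Y collection with all three versions, while filtering on "X/Y/2022-01-01" returns nothing, because the datestamp is part of a version's filename and not of the collection name.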
    jameshadfield committed Jul 10, 2023
    07c0796
  3. d48d0e6
  4. Add (prototype) cards UI for resource collection

    Code is taken from my existing prototype of the cards UI at
    https://github.com/nextstrain/workflow-asset-summary/tree/6730c2544d31b63bbe274c96b31e524176e03ca2
    with widespread modifications to adapt it to this repo's
    design including:
     - typescript -> javascript (a big downgrade)
     - CSS changes
     - API response schema is slightly different
     - disables the "drawer" drop-down for cards as that functionality is not
       possible using the approach from the prototype (due to d3)
     - removal of version "rugplot", which relied on d3 controlling the DOM
     - rewriting the filtering algorithm
     - the card hierarchy is created client-side rather than server side
    jameshadfield committed Jul 10, 2023
    c547a0c
  5. Add loading spinner

    The spinner design is copied from multiple other Nextstrain projects.
    (I was surprised this wasn't already part of nextstrain.org)
    jameshadfield committed Jul 10, 2023
    02fe52f

Commits on Jul 13, 2023

  1. Update /pathogens page

    Reflects the new cards UI as the main interface to the datasets. The
    previous datasets+narratives table is restricted to narratives-only,
    as these are not yet (or may never be) part of the new cards UI.
    
    The colourful tiles are restricted to 6 pathogens, which on most screen
    resolutions render as 3 columns x 2 rows. The removed pathogens are
    not being regularly updated, and the ultimate aim is for datasets to be
    highlighted through the new UI.
    jameshadfield committed Jul 13, 2023
    eed3a2d
  2. Add remote-inputs page listing core files

    The new URL ("/pathogens/inputs") is not advertised elsewhere as this
    page will probably see modifications in the short term. We need to
    decide what kind of files we want listed and how, if at all, we should
    link these to their associated dataset.
    jameshadfield committed Jul 13, 2023
    f13a7cd
  3. Fetch latest inventory from S3

    The previous dev-only approach is preserved under a `LOCAL_INVENTORY`
    env flag.
    jameshadfield committed Jul 13, 2023
    99c2c83
  4. Add staging source inventory

    The staging page was previously broken due to an underlying API
    failure (?) so here we essentially replace the entire page.
    
    The spark-line is slightly misleading for sources where we don't store
    past versions of datasets (in this case, the underlying S3 bucket isn't
    versioned). Nonetheless, some datasets are grouped together by our
    name munging rules.
    
    I didn't implement a files (inputs) page for staging as we essentially
    have no non-dataset files in the bucket!
    jameshadfield committed Jul 13, 2023
    8cb84fe
  5. Recollect resources every ~24 hours

    This uses a simple interval-based approach, which should be enough for
    the core & staging sources.
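
A minimal sketch of an interval-based approach of this kind, not the code from this commit. The `startRecollection` name, the `collect` callback, and the error handling are assumptions made for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Re-run `collect` (a hypothetical function that rebuilds the resource
// listing) roughly every 24 hours. A failed run is logged and skipped so
// that previously collected data keeps being served.
function startRecollection(collect, intervalMs = DAY_MS) {
  const run = async () => {
    try {
      await collect();
    } catch (err) {
      console.error("Resource collection failed; keeping previous data", err);
    }
  };
  const timer = setInterval(run, intervalMs);
  // In Node.js, unref() stops this timer from keeping the process alive.
  if (typeof timer.unref === "function") timer.unref();
  return timer;
}
```

The returned timer can be passed to `clearInterval` to stop recollection, for example during server shutdown.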
    
    Ideally we would re-fetch the inventory shortly after its creation.
    The time of day at which a new inventory appears seems random, so this
    would require using events or polling the inventory directory more
    frequently and only updating when a new manifest appears.
    The directory name of inventory manifests seems to be consistently
    YYYY-MM-DD + "T01-00Z", so perhaps it's enough to make HEAD requests
    for the upcoming day's data?
    
    2023-07-12T01-00Z/
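
A minimal sketch of how the directory name for a given day could be computed, assuming the YYYY-MM-DD + "T01-00Z" pattern noted above holds and that the date is taken in UTC. The function names are hypothetical, and the HEAD request itself is left out because the bucket and path layout are not given here.

```javascript
// Directory name for the inventory manifest of the UTC day containing `date`.
// Assumption: the suffix is always "T01-00Z", as observed so far.
function inventoryDirName(date) {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return `${day}T01-00Z`;
}

// Directory name to probe for the upcoming day's inventory.
function nextInventoryDirName(now) {
  const tomorrow = new Date(now.getTime() + 24 * 60 * 60 * 1000);
  return inventoryDirName(tomorrow);
}
```

A poller could call `nextInventoryDirName(new Date())`, issue a HEAD request for that directory's manifest, and re-collect only once the request succeeds.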
    jameshadfield committed Jul 13, 2023
    8940561
  6. [fixup] case sensitive filename

    It's been a long time since I've been burned by macOS's filesystem
    being case-insensitive, but it happens!
    jameshadfield committed Jul 13, 2023
    cac3780

Commits on Oct 4, 2023

  1. Update dev server IAM policy…

    … with changes made by James on the AWS Console.
    
    I copied the policy¹ as JSON from AWS Console and pasted directly into
    the file. After reordering to match the existing file contents, which
    are sorted to be more readable rather than alphabetical, I confirmed
    that this updated version of the file results in no changes with
    `terraform plan`.
    
    ¹ arn:aws:iam::827581582529:policy/NextstrainDotOrgServerInstance-testing
    
    Co-authored-by: James Hadfield <[email protected]>
    victorlin and jameshadfield committed Oct 4, 2023
    a00b8e1