NRES & bionetwork backfill #1301

Open
arschat opened this issue Sep 20, 2024 · 0 comments
Labels
HCA metadata backfill operations This issue is an operational task Submissions Submission WS tasks

Comments

Collaborator

arschat commented Sep 20, 2024

There are two bulk metadata updates at the project level that we'd like to do.

Reasoning

  1. NRES addition in all open access datasets
    After the introduction of managed access datasets in the portal, we would like to add the data_use_restriction field to the metadata of all open access projects, i.e. all portal projects that were not covered by the previous bulk update in Data Portal tracker - Data Repository tasks #1270. This would require bumping the project schema version to 19.0.0 and adding the field "data_use_restriction": "NRES" to the project metadata.
  2. Bionetwork backfilling
    Dave asked us to add the bionetwork information in the schema, since the portal started showing the biological network on the front page by default. There are a couple of open questions here.
    a. What is the source of truth for the list of bionetworks? Is it the tracker?
    b. What is the source of truth for the list of atlas names? In the tracker, some atlas names are initials (e.g. MSK 1.0 or ORCF 1.0). Do we want to add these names?
    c. Projects in the portal with no bionetwork: would we like to show None instead of unspecified?
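
As a rough illustration of the NRES change in item 1, here is a minimal sketch of patching one project document in memory. It assumes the schema version is the x.y.z path segment of the document's describedBy URL; the function name add_nres and the example URL are hypothetical, and no call to ingest is made here.

```python
import copy
import re

TARGET_VERSION = "19.0.0"  # project schema version named in this issue
VERSION_RE = re.compile(r"/(\d+)\.(\d+)\.(\d+)/")


def add_nres(project_content: dict, target_version: str = TARGET_VERSION) -> dict:
    """Return a patched copy of a project document (hypothetical helper)."""
    updated = copy.deepcopy(project_content)

    # Keep any existing restriction (e.g. a managed access project).
    updated.setdefault("data_use_restriction", "NRES")

    # Assumption: the version sits in the describedBy URL as /x.y.z/.
    described_by = updated.get("describedBy", "")
    match = VERSION_RE.search(described_by)
    target = tuple(int(part) for part in target_version.split("."))
    # Only bump older documents; never downgrade a newer one.
    if match and tuple(int(g) for g in match.groups()) < target:
        updated["describedBy"] = (
            described_by[: match.start()]
            + f"/{target_version}/"
            + described_by[match.end():]
        )
    return updated


if __name__ == "__main__":
    # Placeholder URL, not a real schema location.
    doc = {"describedBy": "https://schema.example.org/type/project/17.0.0/project"}
    print(add_nres(doc))
```

Returning a copy rather than editing in place makes it easy to diff the before and after documents when reviewing the bulk update.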

Plan

Since both fields live at the project level, we would like to apply the updates and then use @idazucchi's script, which exports only project metadata (we don't have to update the state to graph valid, just return it to exported). The steps would be:

  1. Select projects (uuids) that need update for NRES
  2. Select projects (uuids) that need bionetwork update & appropriate bionetwork(s)
  3. Select projects (uuids) that need atlas name & version update & appropriate atlas name(s) & version(s)
  4. Write a script that updates this information via API calls to ingest
  5. Export project metadata via Ida's script
  6. Send the bulk import form to Travis

Steps 1, 2 and 3 can be done via the Task tracker spreadsheet.
Step 4: the script from the previous bulk update in #1270 is almost ready (see the comments there for the script); a few modifications might be needed.
Step 5: if we provide the uuids to the script, it runs quickly.
Step 6: we can also extract the project title in order to populate the import form easily.
Estimated time needed: ~2 days
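
Steps 1 to 3 (and the title extraction for step 6) could be sketched roughly as below, assuming the Task tracker spreadsheet is exported as CSV. Every column name (project_uuid, project_title, access_type, data_use_restriction, bionetworks, atlas_name, atlas_version) and the function plan_updates are hypothetical; the real tracker headers would need to be substituted.

```python
import csv
import io


def _cell(row: dict, key: str) -> str:
    """Read a cell as a stripped string; missing cells become ''."""
    return (row.get(key) or "").strip()


def plan_updates(tracker_csv: str) -> dict:
    """Map project uuid -> title and the changes that project needs."""
    plan = {}
    for row in csv.DictReader(io.StringIO(tracker_csv)):
        uuid = _cell(row, "project_uuid")
        if not uuid:
            continue  # rows without a uuid cannot be updated

        changes = {}
        # Step 1: open access projects that still lack a restriction.
        if _cell(row, "access_type").lower() == "open" and not _cell(
            row, "data_use_restriction"
        ):
            changes["data_use_restriction"] = "NRES"

        # Step 2: bionetworks, assumed to be ';'-separated in one cell.
        networks = [n.strip() for n in _cell(row, "bionetworks").split(";") if n.strip()]
        if networks:
            changes["bionetworks"] = networks

        # Step 3: atlas name and version, only when a name is given.
        if _cell(row, "atlas_name"):
            changes["atlas"] = {
                "name": _cell(row, "atlas_name"),
                "version": _cell(row, "atlas_version"),
            }

        if changes:
            # The title is kept so the import form can be filled in later.
            plan[uuid] = {"title": _cell(row, "project_title"), "changes": changes}
    return plan
```

The keys of the returned dictionary are the uuids to pass to the update and export scripts; projects with nothing to change are left out, so a stale tracker row simply produces no update.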

Risks

  1. Information in the tracker is not up to date
    • we will update the project, or re-run this script for bulk updates in a later release
  2. An old project gets an error in import validation
    • drop the project from the current release and investigate how we can re-export it to avoid errors
    • ask the import team to re-populate the staging area with the reverse-import script and try again
@arschat arschat added metadata backfill operations This issue is an operational task Submissions Submission WS tasks labels Sep 20, 2024
@idazucchi idazucchi added the HCA label Sep 20, 2024