Sync released data models to BQ tables #420

Open
adamjtaylor opened this issue Jun 7, 2024 · 4 comments

Comments

@adamjtaylor
Contributor

As a data manager for HTAN, I would like our internal BigQuery `htan-dcc:metadata` tables to include:

  • data-model_main: an accurate reflection of the main branch of the data model
  • data-model_vYY.MM.minor: a table for every version of the data model
  • data-model_latest: a table that reflects the latest released version of the data model

This lets us use the data model in queries against our submitted manifests or other information held in BigQuery.

We can extend the bq-schema workflow as follows.

Add a trigger so the workflow also runs when a release is created:

on:
  push:
    branches: main
    paths: 'HTAN.model.csv'
  release:
    types: [created]
  workflow_dispatch: 

Add a job to create a versioned table if the event name is `release`:

  add-versioned-table:
    name: Add versioned schema to BQ
    runs-on: ubuntu-latest
    needs: add-to-bq
    if: github.event_name == 'release'
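
The job above has no steps yet. One possible step body is sketched here, under two loud assumptions that are not stated in this issue: that the earlier `add-to-bq` job has already refreshed a `data_model_main` table, and that the version suffix passed in is already valid inside a BigQuery table ID. The helper name `versioned_copy_cmd` is hypothetical. The command is printed rather than executed so the sketch can be run without credentials.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the command that snapshots the main table
# into a versioned table.
# Assumption: add-to-bq has already refreshed data_model_main.
# Assumption: the suffix is already safe to use in a BigQuery table ID.
versioned_copy_cmd() {
  local suffix="$1"
  # -n (--no_clobber) refuses to overwrite an existing destination table,
  # so an already released version is not silently replaced.
  printf 'bq cp -n htan-dcc:metadata.data_model_main htan-dcc:metadata.data_model_%s\n' "$suffix"
}

# Print the command instead of running it.
versioned_copy_cmd "v24_6_1"
```

Using `-n` here is a design choice, not a requirement: it treats released tables as immutable, which may or may not be what we want if a release is ever re-cut.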

Then duplicate the versioned table as latest:

      - name: Duplicate versioned table as latest
        shell: bash
        run: |
          VERSION=${{ github.event.release.tag_name }}
          bq cp htan-dcc:metadata.data_model_${VERSION} htan-dcc:metadata.data_model_latest
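
Two details of the copy step above may need care. Release tags of the form vYY.MM.minor contain dots, and to my understanding dots are not accepted inside a BigQuery table name, so the tag probably has to be normalised first. Also, `data_model_latest` will already exist from the second release onward, so the copy likely needs `-f` (`--force`) to overwrite it without prompting. A minimal sketch follows; the helper names `table_suffix` and `latest_copy_cmd` are hypothetical, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: replace every character other than letters, digits
# and underscores with an underscore, e.g. "v24.6.1" becomes "v24_6_1".
# This is stricter than BigQuery requires, which keeps the rule simple.
table_suffix() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_'
}

# Hypothetical helper: build the command that copies the versioned table
# over data_model_latest. -f (--force) overwrites the existing destination.
latest_copy_cmd() {
  local suffix
  suffix="$(table_suffix "$1")"
  printf 'bq cp -f htan-dcc:metadata.data_model_%s htan-dcc:metadata.data_model_latest\n' "$suffix"
}

# Print the command for an example tag instead of running it.
latest_copy_cmd "v24.6.1"
```

In the workflow, the argument would come from `${{ github.event.release.tag_name }}`, and the versioned table would need to be created with the same normalised suffix.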
@aclayton555
Contributor

Please add a "critical" label if this is expected within phase 1.0, or a "renewal" label if it can wait.

@aclayton555
Contributor

Need to discuss with ISB during data flow discussions. @aclayton555 to tag this in the flow diagram. Need to understand how users are engaging with BQ.

aclayton555 self-assigned this Aug 13, 2024
@aclayton555
Contributor

Currently, there is a workflow to sync the staging version of the data model with the BQ tables. It is used to help populate attribute descriptions. It is still in use, but @PozhidayevaDarya has not run it for a little while. TBD on updating it to include the attributes listed here and so that it no longer points to staging.

@aclayton555
Contributor

24-8 Close-out: take this into consideration in the data model design doc. Need to understand what is needed here and the needed architecture.

2 participants