Scripts used for Kern AI CI/CD efforts.
- GitHub: Admin Repositories Settings
- ACR: Delete Docker Images
- ACR: Docker Push
- ACR: Docker Push Release
- ACR: Docker Push Test
- Azure: Function App Deployment
- GitHub: Delete Branch
- GitHub: Release
- GitHub: Validate Release
- K8: Apply
- K8: Cluster Deploy
- K8: Destroy
- K8: Edit
- K8: Execution Environments
- K8: Reload Secrets
- K8: Release
- K8: Restart
- K8: Test
- Parent Images: Build
- Parent Images: Matrix
- Parent Images: Submodule Merge
- Parent Images: Parent Image Merge
- Parent Images: Release
- OpenTofu: Release
- OpenTofu: Generate Docs
- OpenTofu: Plan/Apply
Workflow file: admin_update_repo_settings.yml
Triggers:
- workflow_call
Description:
- updates IaC repository General Settings and Rulesets
Jobs:
-
GitHub: Update General Repository Settings
Update General Repository Settings
-
GitHub: Update tf-module Rulesets
Update tf-module Rulesets
-
GitHub: Update tf-iac Rulesets
Update tf-iac Rulesets
Workflow file: az_acr_delete.yml
Triggers:
- workflow_call
Description:
- deletes Container Images specified by the workflow input
Jobs:
-
Docker: Delete Test Tags
Configure branch name
Delete Container Image
-
Docker: Delete Branch Tags
Configure branch name
Delete Branch Container Image
Workflow file: az_acr_push.yml
Triggers:
- workflow_dispatch
- push
Description:
- before pushing the Docker image, the branch name is resolved by replacing "/" with "-", and the image is built with the resolved branch name
- builds and deploys Docker images in multiple steps
Jobs:
- Docker: Build & Push
Configure branch name
Build & Push <application-repo>:${{ matrix.platform }}-<feature-hotfix>
Build & Push <application-repo>:${{ matrix.platform }}-gpu
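The branch-name resolution described for this workflow can be sketched as follows. This is a minimal illustration in Python, not the workflow's actual step: the function names and the sample branch name are hypothetical, and the "amd64" platform value is borrowed from the tags listed for the release workflow.

```python
def resolve_branch_name(ref_name: str) -> str:
    """Replace every "/" with "-" so the branch name can be used in a Docker tag."""
    return ref_name.replace("/", "-")


def image_tag(platform: str, ref_name: str) -> str:
    """Build a tag of the form <platform>-<resolved-branch> (hypothetical helper)."""
    return f"{platform}-{resolve_branch_name(ref_name)}"


if __name__ == "__main__":
    # Hypothetical branch name, for illustration only.
    print(image_tag("amd64", "feature/login-form"))
```

The replacement is needed because "/" is not a valid character in a Docker image tag, whereas "-" is.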
Workflow file: az_acr_release.yml
Triggers:
- workflow_dispatch
- pull_request_closed
- release
Description:
- builds and deploys Docker images in multiple steps
Jobs:
- Docker: Build & Push
Build & Push <application-repo>:amd64
Build & Push <application-repo>:arm64
Build & Push <application-repo>:latest
Workflow file: az_acr_test.yml
Triggers:
- pull_request_opened_synchronized
Outputs:
- GH_REF_NAME
Description:
- before pushing the Docker image, the branch name is resolved by replacing "/" with "-", and the image is built with the resolved branch name
- builds and deploys the test Docker Image used by the K8: Test workflow
Jobs:
- Docker: Build & Push (Test)
Configure branch name
Build & Push <application-repo>:test-<feature-hotfix>
Workflow file: az_fnapp_deploy.yml
Triggers:
- workflow_dispatch
- push
Description:
- builds and deploys the Azure Function App
- currently used to deploy the self-hosted GitHub Actions Runner Monitor
Jobs:
- Azure: Build & Deploy Function App
Resolve Project Dependencies Using Pip
Run Azure Functions Action
Workflow file: gh_delete_branch.yml
Triggers:
- pull_request_closed
Description:
- calls ACR: Delete Docker Image job, targeting the tag :test-<feature/hotfix>
- deletes the feature/hotfix branch Container Images (:<platform>-<feature/hotfix>)
- deletes the feature/hotfix branch
Troubleshooting:
- this job will fail when the feature/hotfix branch is deleted manually
Jobs:
-
ACR: Delete Test Image
Configure branch name
Delete Container Image
-
ACR: Delete Branch Images
Configure branch name
Delete Branch Container Image
-
GitHub: Delete Branch
Delete Branch
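As a rough sketch of which image tags this workflow targets for a given branch, the Python below derives the test tag and the per-platform branch tags. The function name is hypothetical, the platform list is an assumption (the real values come from the build workflow's matrix), and the "/" to "-" replacement is assumed to mirror the branch-name resolution used when the images were pushed.

```python
def tags_to_delete(branch: str, platforms: list[str]) -> list[str]:
    """Return the image tags associated with a feature/hotfix branch.

    Assumes tags were created as test-<branch> and <platform>-<branch>,
    with "/" in the branch name replaced by "-".
    """
    resolved = branch.replace("/", "-")
    test_tag = f"test-{resolved}"
    branch_tags = [f"{platform}-{resolved}" for platform in platforms]
    return [test_tag] + branch_tags


if __name__ == "__main__":
    # Hypothetical branch and platform list, for illustration only.
    for tag in tags_to_delete("hotfix/fix-login", ["amd64", "arm64"]):
        print(tag)
```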
Workflow file: gh_release.yml
Triggers:
- release
Inputs:
- deployment_status
Description:
- publishes a release on GitHub with the tag generated by the pre-release that triggered this workflow
- deletes a pre-release on GitHub with the tag generated by the pre-release that triggered this workflow
- runs in case of a release deployment failure
Troubleshooting:
- after fixing the error that caused the release deployment failure, recreate the pre-release to trigger the release deployment again
Jobs:
-
GitHub: Publish Release
Publish Release
-
GitHub: Delete Prerelease
Delete Prerelease
Workflow file: gh_validate_release.yml
Triggers:
- release
Description:
- validates the release tag generated by the pre-release that triggered this workflow, using a RegEx check for semantic versioning
Troubleshooting:
- inspect the pre-release tag name and ensure it follows the RegEx check for semantic versioning
Jobs:
- GitHub: Validate Release
Validate Release Tag
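The exact RegEx used by this workflow is not reproduced here. As a minimal sketch of what such a check might look like, the Python below accepts tags of the form vMAJOR.MINOR.PATCH without leading zeros. Both the pattern and the function name are assumptions for illustration; the "v" prefix is inferred from the vX.X.X tags mentioned for the Parent Images release.

```python
import re

# Assumed pattern: "v" followed by three dot-separated non-negative integers
# without leading zeros. The workflow's real pattern may differ, for example
# by allowing pre-release or build metadata suffixes.
SEMVER_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_valid_release_tag(tag: str) -> bool:
    """Return True when the whole tag matches the assumed semantic version pattern."""
    return SEMVER_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["v1.2.3", "1.2.3", "v1.2", "v01.2.3"]:
        print(candidate, is_valid_release_tag(candidate))
```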
Workflow file: k8s_apply.yml
Triggers:
- pull_request_closed (dev)
- workflow_dispatch
Description:
- generates a Kubernetes kustomization diff and applies it to the cluster
- differs from the k8s-deploy job in that it applies the entire namespace, as opposed to application-specific configurations
Jobs:
- K8: Apply Cluster Resources
Generate Kustomization
Apply Kustomization
Assert Deploy Success
Revert on failure
Workflow file: k8s_deploy.yml
Triggers:
- workflow_call
Inputs:
- environment
Outputs:
- deployment_status
Description:
- deploys the application to the Kubernetes cluster
- differs from the k8s-apply job in that it applies application-specific configurations, as opposed to the entire namespace
- uses Canary Deployment strategy
Jobs:
- K8: Deploy
Generate Kustomization
Generate Deployment
Assert Deployment Success
Promote Deployment
Reject Deployment
Workflow file: k8s_destroy.yml
Triggers:
- workflow_dispatch
Description:
- deletes all deployment and service Kubernetes resources in the namespace configured by GitHub Actions Environment Variables
Jobs:
- K8: Destroy Cluster Namespace
Destroy Cluster Namespace
Workflow file: k8s_edit.yml
Triggers:
- pull_request_closed
Description:
- updates the Kubernetes deployment image tags to the latest release
- creates a new branch automated-release-dev and a corresponding Pull Request in the k8-cluster-cognition repository
- when a Pull Request already exists, deployment image tag updates are accumulated on the existing Pull Request
Jobs:
- K8: Edit Cluster Deployment
Perform Edit/Git Operations
Workflow file: k8s_exec_env_pull.yml
Triggers:
- workflow_dispatch
Description:
- pulls execution environment images inside the Kubernetes cluster
Jobs:
- K8: Docker Pulls
Execute docker pull
Workflow file: k8s_reload_secrets.yml
Triggers:
- workflow_dispatch
Inputs:
- deployment_name
Description:
- recreates a secret in the Kubernetes cluster with the latest value from Azure Key Vault, specified by the workflow input (deployment name)
- restarts a deployment in the Kubernetes cluster, specified by the workflow input (deployment name)
Jobs:
- K8: Reload Secrets
Run Secret Reload
Workflow file: k8s_release.yml
Triggers:
- pull_request_closed
- release
Description:
- calls GitHub: Validate Release job
- calls ACR: Docker Push Release job
- calls K8: Edit job
- calls GitHub: Release job
- forwards deployment status to the GitHub: Release job
- calls GitHub: Delete Branch job
Jobs:
-
call-gh-validate-release
-
call-az-acr-release
-
call-k8-edit
-
call-gh-release
-
GitHub: Delete Branch
Delete Branch
Workflow file: k8s_restart.yml
Triggers:
- workflow_dispatch
Inputs:
- deployment_name
Description:
- restarts a deployment in the Kubernetes cluster, specified by the workflow input
Jobs:
- K8: Restart Cluster Deployment
Restart Cluster Deployment
Workflow file: k8s_test.yml
Triggers:
- pull_request_opened_synchronized
Inputs:
- test_cmd
Description:
- calls ACR: Docker Push Test job
- runs alembic upgrade on the application that triggered this workflow
- if an application that depends on refinery-gateway database changes (e.g. refinery-tokenizer) triggers this workflow, the alembic upgrade is run on the refinery-gateway database if the same test Docker Image tag exists
- uses the test Docker Image generated by the ACR: Docker Push Test job to run tests in the Kubernetes cluster
- uses the revision number generated in the first step to downgrade the database
Troubleshooting:
- in case of a failed test, inspect the logs of this job to identify the issue and resolve it by updating the application code
- in case this workflow corrupted app.dev.kern.ai, manually run K8: Apply in k8-cluster-cognition to apply the latest container images available on dev
- in case of a workflow failure (TBD), ignore the failure and proceed with Pull Request merge
Jobs:
-
call-az-acr-push-test
-
K8: Test Cluster Deployment
Test Cluster Deployment
Workflow file: pi_build.yml
Triggers:
- pull_request_opened_synchronized
Description:
- builds & pushes refinery-parent-images:<branch>-<type> to registry.dev.kern.ai
Jobs:
-
Configure Head Branch Name
Configure branch name
-
pi-matrix
-
Parent Images: Docker Build
Set up Python
Install Dependencies
Compile Requirements
Build & Push refinery-parent-images:${{ needs.configure-branch-name.outputs.gh_head_ref }}-${{ matrix.parent_image_type }}
Build & Push refinery-parent-images:${{ needs.configure-branch-name.outputs.gh_head_ref }}-${{ matrix.parent_image_type }}-arm64
Workflow file: pi_matrix.yml
Triggers:
- workflow_call
Inputs:
- repository
- checkout_ref
- parent_image_type
Outputs:
- parent_image_type
- include
Description:
- creates a Matrix Strategy input for GitHub Action with the following structure:
  {
    "parent_image_type": [ "mini", "next" ],
    "include": [
      { "parent_image_type": "mini", "app": "refinery-authorizer" },
      { "parent_image_type": "mini", "app": "refinery-gateway-proxy" },
      { "parent_image_type": "next", "app": "admin-dashboard" },
      { "parent_image_type": "next", "app": "refinery-ui" },
      { "parent_image_type": "next", "app": "cognition-ui" }
    ]
  }
Jobs:
- Parent Images: Generate Matrix
Generate Matrix
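To illustrate how a matrix with the structure described for this workflow could be assembled, here is a minimal Python sketch. The function name and the input mapping are hypothetical; how the workflow actually discovers which applications belong to which parent image type is not covered here.

```python
import json


def build_matrix(apps_by_type: dict[str, list[str]]) -> dict:
    """Build a matrix with a parent_image_type axis and one include entry per app."""
    return {
        "parent_image_type": list(apps_by_type),
        "include": [
            {"parent_image_type": image_type, "app": app}
            for image_type, apps in apps_by_type.items()
            for app in apps
        ],
    }


if __name__ == "__main__":
    # Sample mapping taken from the structure shown in the description.
    matrix = build_matrix(
        {
            "mini": ["refinery-authorizer", "refinery-gateway-proxy"],
            "next": ["admin-dashboard", "refinery-ui", "cognition-ui"],
        }
    )
    print(json.dumps(matrix, indent=2))
```

Each include entry carries a parent_image_type value that matches the axis of the same name. Under GitHub Actions matrix semantics, an include entry whose values would overwrite an original matrix value is added as a separate combination, so this produces one job per application.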
Workflow file: pi_merge_submodule.yml
Triggers:
- pull_request_closed (dev)
Description:
- updates Parent Image repositories' submodule reference
Jobs:
-
Configure Head Branch Name
Configure branch name
-
pi-matrix
-
Parent Images: Submodule
Set up Python
Install Dependencies
Perform Edit/Git Operations
-
GitHub: Delete Branch
Delete Branch
Workflow file: pi_merge_parent_image.yml
Triggers:
- pull_request_closed (dev)
Description:
- builds & pushes refinery-parent-images:dev-<type> to registry.dev.kern.ai
- updates Application repositories' -requirements.in and requirements.txt
Troubleshooting:
- package version resolution failure (ResolutionImpossible) (example)
- resolved by updating the package version in the Application repository's -requirements.in file
- worked around by manually performing the requirements compilation
Jobs:
-
Configure Head Branch Name
Configure branch name
-
pi-matrix
-
Parent Images: Docker Build
Set up Python
Install Dependencies
Compile Requirements
Build & Push refinery-parent-images:${{ github.event.pull_request.base.ref }}-${{ env.PARENT_IMAGE_TYPE }}
Build & Push refinery-parent-images:${{ github.event.pull_request.base.ref }}-${{ env.PARENT_IMAGE_TYPE }}-arm64
Build & Push refinery-parent-images:sha-${{ env.PARENT_IMAGE_TYPE }}
Build & Push refinery-parent-images:sha-${{ env.PARENT_IMAGE_TYPE }}-arm64
-
Parent Images: App
Set up Python
Install Dependencies
Clone ${{ matrix.app }}
Compile Requirements (Python)
Compile Requirements (Next)
Perform Edit/Git Operations (Python)
Perform Edit/Git Operations (Next)
-
GitHub: Delete Branch
Delete Branch
-
GitHub: Delete Branch
Delete Branch
Workflow file: pi_release.yml
Triggers:
- prerelease
Description:
- builds & pushes refinery-parent-images:vX.X.X-<type> to Docker Hub
- updates Application repositories' Dockerfiles to use the new parent image (updates Application repositories' open PRs)
Jobs:
-
pi-matrix
-
Parent Images: Dockerfile
Perform Edit/Git Operations
Workflow file: release_please.yml
Triggers:
- workflow_call
Description:
- generates a release Pull Request with CHANGELOG updates for the calling repository
- requires Conventional Commits
Jobs:
- tf-module-release
googleapis/release-please-action@v4
Workflow file: tf_docs.yml
Triggers:
- push
Description:
- generates documentation for the OpenTofu module
Jobs:
- tf-module-docs
actions/checkout@v4
Render OpenTofu docs and push changes back to PR
Workflow file: tf_plan_apply.yml
Triggers:
- workflow_dispatch
- push
Outputs:
- tf_plan_exit_code
- tf_destroy
Description:
- executes tofu plan on the repository that triggered this workflow
- creates a destruction plan when the calling repository's GitHub Actions Environment Variable TF_DESTROY is set to -destroy
- executes tofu apply on the repository that triggered this workflow, assuming that the tofu plan job has succeeded
Troubleshooting:
- inspect the logs of the tofu plan job to identify the issue and resolve it by updating Infrastructure as Code (IaC) files
Jobs:
-
OpenTofu Plan
OpenTofu Plan
-
OpenTofu Apply
OpenTofu Apply
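The relationship between TF_DESTROY, the plan exit code, and the apply step can be sketched as below. This is an illustration under assumptions, not the workflow's implementation: it assumes the plan is run with the -detailed-exitcode flag (exit code 0 means no changes, 1 means error, 2 means changes are present), and the function names and the tfplan file name are hypothetical.

```python
def plan_command(tf_destroy: str = "") -> list[str]:
    """Build the tofu plan command line, adding -destroy when TF_DESTROY requests it."""
    cmd = ["tofu", "plan", "-detailed-exitcode", "-out=tfplan"]
    if tf_destroy == "-destroy":
        cmd.append("-destroy")
    return cmd


def should_apply(tf_plan_exit_code: int) -> bool:
    """Decide whether an apply is worthwhile, given a -detailed-exitcode result.

    0: plan succeeded with no changes, so there is nothing to apply
    2: plan succeeded with changes, so apply
    anything else: the plan failed
    """
    if tf_plan_exit_code == 0:
        return False
    if tf_plan_exit_code == 2:
        return True
    raise RuntimeError(f"tofu plan failed with exit code {tf_plan_exit_code}")


if __name__ == "__main__":
    print(" ".join(plan_command("-destroy")))
    print(should_apply(2))
```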