Create self-hosted runner for integration(-ish) CI tests #75

Closed
wants to merge 31 commits into from
31 commits
2630e05
change recipe-test to test the runner
safoinme Aug 18, 2023
fc4b378
change recipe-test to test the runner
safoinme Aug 18, 2023
5a576e7
add clone repo step
safoinme Aug 18, 2023
60e48df
remove cloning step
safoinme Aug 18, 2023
1ea2aa8
initial code for creating a self-hosted runner on an azure vm to test…
safoinme Aug 18, 2023
2b4b2bc
add destroy
safoinme Aug 18, 2023
71919d5
fix repo url
safoinme Aug 18, 2023
329bb10
add terraform backend to store the state
safoinme Aug 18, 2023
28f6d64
change image
safoinme Aug 19, 2023
4c9efb8
return k3d-test to default runner
safoinme Aug 23, 2023
ad7b484
Merge branch 'develop' into feature/create-self-hosted-runner
safoinme Aug 23, 2023
862935d
Merge branch 'develop' into feature/create-self-hosted-runner
safoinme Aug 23, 2023
35114c2
Merge branch 'develop' into feature/create-self-hosted-runner
strickvl Aug 24, 2023
f3377e6
Update infrastructure/terraform.tf
safoinme Aug 24, 2023
6abcbb1
apply suggested reviews
safoinme Aug 24, 2023
2a6e8d0
Merge branch 'develop' into feature/create-self-hosted-runner
strickvl Aug 28, 2023
b0d112b
Merge branch 'develop' into feature/create-self-hosted-runner
strickvl Aug 30, 2023
923150d
Merge branch 'develop' into feature/create-self-hosted-runner
strickvl Aug 30, 2023
7cdb910
Apply suggestions from code review
safoinme Sep 4, 2023
9bbb119
Merge branch 'develop' into feature/create-self-hosted-runner
safoinme Sep 4, 2023
9ad516b
Merge branch 'develop' into feature/create-self-hosted-runner
safoinme Sep 24, 2023
3184c72
try new workflow to run on self-hosted runner
safoinme Sep 24, 2023
56cfc4d
Merge branch 'feature/create-self-hosted-runner' of github.com:zenml-…
safoinme Sep 24, 2023
e77ac3a
format
safoinme Sep 24, 2023
fc46012
fix destory yml
safoinme Sep 24, 2023
f88cf99
fix deploy yml
safoinme Sep 24, 2023
7d30049
add tags to resource groups
safoinme Sep 24, 2023
83734ba
update blob write and check
safoinme Sep 24, 2023
0e8d71d
update blob write and check
safoinme Sep 24, 2023
8c512b8
Merge branch 'develop' into feature/create-self-hosted-runner
safoinme Oct 5, 2023
7ca13a3
Merge branch 'develop' into feature/create-self-hosted-runner
strickvl Oct 24, 2023
15 changes: 14 additions & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -197,9 +197,16 @@ jobs:
             (cd $dir && terraform plan -input=false)
           done

+  provion-self-hosted-runner:
+    name: provision-self-hosted-runner
+    uses: ./.github/workflows/deploy-self-hosted-runner.yml
+    secrets: inherit
+
   k3d_test:
     name: k3d_test
-    runs-on: ubuntu-latest
+    runs-on: self-hosted
Contributor

Cool. I'm curious though, how would GitHub know where/how you have self-hosted it?

Contributor Author

  • The Azure storage acts as a locking mechanism. It covers the case where several runs share the VM at the same time and one finishes before the others. Whenever a new run starts, a file named after its run ID is created in the storage. Before destroying the VM, a run first deletes the file with its own run ID and then checks whether any other files are left. If none are left, the destroy is allowed; if even one remains, another run is still in progress, so only the last run to finish is allowed to destroy the VM.
  • GitHub knows about the runner because the VM is configured and connected to the GitHub Actions server, which uses a heartbeat check to confirm that the self-hosted runner is still running and then launches the runs on that connected runner. `self-hosted` is the default label given to every self-hosted runner, so if there are multiple runners the job runs on a free one; otherwise we can give each runner a specific label.
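
For illustration, the locking scheme described above can be sketched in Python. This is a toy model under stated assumptions, not the PR's implementation: an in-memory set stands in for the Azure storage container, and `RunnerLock`, `acquire`, and `release` are hypothetical names. Only the `github-run-` prefix is taken from the workflows.

```python
"""Toy model of the blob-based lock: one marker per active run, destroy only when none remain."""

LOCK_PREFIX = "github-run-"  # mirrors the blob name prefix used in the workflows


class RunnerLock:
    """Hypothetical stand-in for the storage container that holds the run markers."""

    def __init__(self) -> None:
        self._blobs: set[str] = set()  # assumption: in-memory set instead of Azure blobs

    def acquire(self, run_id: str) -> None:
        """Record that a run has started using the shared VM."""
        self._blobs.add(f"{LOCK_PREFIX}{run_id}")

    def release(self, run_id: str) -> bool:
        """Remove this run's marker and report whether the VM may be destroyed."""
        self._blobs.discard(f"{LOCK_PREFIX}{run_id}")
        # The VM may go away only if no other run still holds a marker.
        return not any(name.startswith(LOCK_PREFIX) for name in self._blobs)


lock = RunnerLock()
lock.acquire("101")
lock.acquire("102")
first_may_destroy = lock.release("101")  # run 102 is still active
last_may_destroy = lock.release("102")   # no markers left
print(first_may_destroy, last_may_destroy)  # prints: False True
```

A real storage backend performs the delete and the list as separate requests, so a run that starts between the two is a case this sketch does not model.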

+    needs:
+      - provion-self-hosted-runner
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -221,3 +228,9 @@ jobs:
             (cd $dir && terraform validate)
             (cd $dir && terraform plan -input=false)
           done
+
+  destroy-self-hosted-runner:
+    name: destroy-self-hosted-runner
+    needs: k3d_test
+    if: always()
+    uses: ./.github/workflows/destroy-self-hosted-runner.yml
+    secrets: inherit
64 changes: 64 additions & 0 deletions .github/workflows/deploy-self-hosted-runner.yml
@@ -0,0 +1,64 @@
name: Deploy the test runner VM to Azure

on:
  workflow_call:
  workflow_dispatch:
  push:
    branches:
      - main
      - develop
    paths-ignore: ["**.md"]

jobs:
  deploy_test_vm:
    name: Deploy VM to Azure
    runs-on: ubuntu-latest
    env:
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
    permissions:
      contents: "read"
      id-token: "write"

    defaults:
      run:
        working-directory: ./infrastructure

    steps:
      - name: Checkout the Code
        uses: actions/checkout@v3

      - name: Install Azure CLI
        run: |
          curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

      - name: Login to Azure
        run: |
          az login --service-principal --username "$ARM_CLIENT_ID" --password "$ARM_CLIENT_SECRET" --tenant "$ARM_TENANT_ID"

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform fmt
        id: fmt
        run: terraform fmt -check
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - run: terraform apply -auto-approve
        env:
          TF_VAR_github_runner_token: ${{ secrets.runner_token }}

      - name: Create blob
        run: |
          echo "Creating blob..."
          az storage blob upload --account-name zenmlstorageaccount --container-name github-runner-tf --name github-run-${{ github.run_id }} --type block --data "${{ github.run_id }}"
58 changes: 58 additions & 0 deletions .github/workflows/destroy-self-hosted-runner.yml
@@ -0,0 +1,58 @@
name: Destroy the test runner VM on Azure

on:
  workflow_call:
  workflow_dispatch:

jobs:
  destroy_test_vm:
    name: Destroy VM on Azure
    runs-on: ubuntu-latest
    env:
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
    permissions:
      contents: "read"
      id-token: "write"

    defaults:
      run:
        working-directory: ./infrastructure

    steps:
      - name: Checkout the Code
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform fmt
        id: fmt
        run: terraform fmt -check
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

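      # The az storage commands below need an authenticated session.
      - name: Login to Azure
        run: |
          az login --service-principal --username "$ARM_CLIENT_ID" --password "$ARM_CLIENT_SECRET" --tenant "$ARM_TENANT_ID"

      # The blob name must match the one written by the deploy workflow (github-run-<run_id>).
      - name: Delete blob
        run: |
          az storage blob delete --account-name zenmlstorageaccount --container-name github-runner-tf --name github-run-${{ github.run_id }}

      # Store a count rather than the names: multi-line values would corrupt GITHUB_ENV.
      - name: Check if any blobs left
        id: check_blobs
        run: |
          blob_count=$(az storage blob list --account-name zenmlstorageaccount --container-name github-runner-tf --query "length([?starts_with(name, 'github-run')])" --output tsv)
          echo "BLOB_COUNT=$blob_count" >> "$GITHUB_ENV"

      - name: Destroy VM
        if: env.BLOB_COUNT == '0'
        run: terraform destroy -auto-approve -refresh=false
        env:
          TF_VAR_github_runner_token: ${{ secrets.runner_token }}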
104 changes: 104 additions & 0 deletions infrastructure/deploy.tf
@@ -0,0 +1,104 @@
resource "azurerm_resource_group" "example" {
  name     = "zenml-github-test"
  location = "West Europe"

  tags = {
    z-env         = "dev"
    z-owner       = "safoine-ext"
    z-project     = "testing"
    z-team        = "oss"
    z-description = "resources for integration testing"
  }
}

resource "azurerm_virtual_network" "example" {
  name                = "mlstack-test-network"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "example" {
  name                 = "mlstack-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_network_interface" "example" {
  name                = "mlstack-nic"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  ip_configuration {
    name                          = "mlstack-ip"
    subnet_id                     = azurerm_subnet.example.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.example.id
  }
}

resource "azurerm_public_ip" "example" {
  name                = "mlstack-pip"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  allocation_method   = "Dynamic"
}

resource "azurerm_network_security_group" "example" {
  name                = "mlstack-nsg"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_network_security_rule" "example" {
  name                        = "SSH"
  priority                    = 1001
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.example.name
  network_security_group_name = azurerm_network_security_group.example.name
}

resource "azurerm_network_interface_security_group_association" "example" {
  network_interface_id      = azurerm_network_interface.example.id
  network_security_group_id = azurerm_network_security_group.example.id
}

data "azurerm_ssh_public_key" "example" {
  name                = "mlstack-test-vm"
  resource_group_name = "zenml-developers"
}

data "azurerm_image" "example" {
  name                = "mlstack-github-runner-machine-image-20230819162059"
Contributor

Wouldn't the image change over time, or can this stay hardcoded?

Contributor Author

I think this can become a variable that, for now, has this image as its default value, so it can certainly be made configurable.
For the initial versions, however, there will still be a manual step, since GitHub didn't provide an API for self-hosted runners. The runner agent inside the VM has to be configured with a one-time token, so configuring the agent remains a manual step, and we build the VM image used here on top of that configured machine.

  resource_group_name = "zenml-developers"
}

resource "azurerm_linux_virtual_machine" "example" {
  name                = "mlstack-test-machine"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  size                = "Standard_D8s_v3"
  admin_username      = "mlstackuser"
  network_interface_ids = [
    azurerm_network_interface.example.id,
  ]

  admin_ssh_key {
    username   = "mlstackuser"
    public_key = data.azurerm_ssh_public_key.example.public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "StandardSSD_LRS"
  }

  source_image_id = data.azurerm_image.example.id
}
22 changes: 22 additions & 0 deletions infrastructure/terraform.tf
@@ -0,0 +1,22 @@
# defining the providers for the recipe module
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">=3.16.0"
    }
  }

  required_version = ">= 0.14.8"

  backend "azurerm" {
    resource_group_name  = "zenml-developers"
    storage_account_name = "zenmlstorageaccount"
    container_name       = "github-runner-tf"
    key                  = "terraform.tfstate"
  }
}

provider "azurerm" {
  features {}
}
safoinme marked this conversation as resolved.
4 changes: 4 additions & 0 deletions infrastructure/variables.tf
@@ -0,0 +1,4 @@
variable "github_runner_token" {
  description = "Token for the GitHub self-hosted runner"
  type        = string
}