How to deploy and restart runners

LaiRuiqi edited this page Feb 26, 2024 · 10 revisions

Deploy Runners

Runner physical node configuration: four nodes with 4C-8G and 100 GB of storage. Suggested system image: ubuntu-20.04-2nic

How to deploy the four nodes:

  1. Build the runner deployer
cd vHive/scripts/github_runner
go build .
  2. Modify conf.json

Edit conf.json so that it follows the format below:

{
  "ghOrg": "<GitHub account>",
  "ghPat": "<GitHub PAT>",
  "hostUsername": "<username>",
  "runners": {
    "<hostname-1>": {
      "type": "cri",
      "sandbox": "firecracker"
    },
    "<hostname-2>": {
      "type": "cri",
      "sandbox": "gvisor"
    },
    "<hostname-3>": {
      "type": "integ",
      "num": 2,
      "restart": false
    },
    "<hostname-4>": {
      "type": "profile"
    }
  }
}
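
Because a stray comma or missing brace makes conf.json unparseable, it can help to check the syntax before deploying. The following is a minimal sketch, assuming python3 is installed on the machine you deploy from; check_conf is a hypothetical helper name and is not part of the vHive scripts. It only checks that the file is well-formed JSON, not that the fields are what the deployer expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a config file is syntactically valid JSON.
# Assumes python3 is available; its json.tool module rejects trailing commas.
check_conf() {
  local conf_file="${1:-conf.json}"
  if python3 -m json.tool "$conf_file" >/dev/null 2>&1; then
    echo "valid JSON: ${conf_file}"
  else
    echo "invalid JSON: ${conf_file}" >&2
    return 1
  fi
}

# Usage (from vHive/scripts/github_runner):
#   check_conf conf.json && ./deploy_runners --loglvl=debug
```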

Note that in conf.json, ghOrg is vhive-serverless and ghPat is your own account's Personal Access Token; your account must have the correct permissions for the vhive-serverless org.

<username> and <hostname-1/2/3/4> are the SSH username and the hostnames of the runner nodes. If you use SCSE cloud nodes as runners, <hostname-1/2/3/4> should be their IP addresses.
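
Since the deployment works over SSH, it may be worth confirming that every node accepts a login for <username> before running the deployer. This is a rough sketch only; check_hosts is a hypothetical helper, and the 5-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a non-interactive SSH login on each runner node.
# Arguments: the SSH username, followed by one or more hostnames or IPs.
check_hosts() {
  local user="$1"
  shift
  local failed=0 host
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true; then
      echo "ok: ${host}"
    else
      echo "unreachable: ${host}"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_hosts <username> <hostname-1> <hostname-2> <hostname-3> <hostname-4>
```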

After modifying this, deploy the runners remotely by running:

./deploy_runners --loglvl=debug

If it fails with an error like “dial unix: missing address”, start an SSH agent and add your key:

eval `ssh-agent`
ssh-add ~/.ssh/<private_key>

Here <private_key> should be a key that has SSH access to all four runners; typically it is id_rsa.
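
The error above most likely means that no agent socket is exported in the current shell (SSH_AUTH_SOCK is empty), though that is an inference from the error text rather than something this page states. Under that assumption, the two commands can be wrapped so that a new agent is started only when one is missing. agent_missing and load_key are hypothetical names.

```shell
#!/usr/bin/env bash
# Returns 0 when no SSH agent socket is exported in this shell.
agent_missing() {
  [ -z "${SSH_AUTH_SOCK:-}" ]
}

# Start an agent if needed, then add the given private key to it.
# The default key path is an assumption; pass your own as the first argument.
load_key() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  if agent_missing; then
    eval "$(ssh-agent -s)" >/dev/null
  fi
  ssh-add "$key"
}

# Usage:
#   load_key ~/.ssh/<private_key>
```

Run load_key in the same shell session that later runs ./deploy_runners, since the agent variables are only exported to that session.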

It is normal for this script not to succeed on the first pass; simply re-run the deployment script after a while.
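
The re-run can also be automated with a small retry loop. This is a sketch under assumptions: retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary values, not recommendations from the vHive maintainers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, sleeping between failures.
# Arguments: max attempts, delay in seconds, then the command and its arguments.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed" >&2
    # Only wait if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: up to 5 attempts, 60 seconds apart.
#   retry 5 60 ./deploy_runners --loglvl=debug
```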

Restart Runners

On SCSE cloud, rebuild the three nodes and redeploy them.

When to Restart Runners

For the firecracker and gvisor CRI tests, restart the runners when a test is stuck at the message "helloworld is waiting for a Revision to be ready".

This basically implies that the firecracker and gvisor CRI runners need to be restarted (you can also restart only one runner in that case). However, if the firecracker or gvisor CRI test passed the "Setup vHive CRI test environment" step and failed in the "Run vHive CRI tests" step, this is typically just a sporadic failure and can be resolved by re-running the tests; clicking the re-run button on the GitHub web page is enough.