Running NuttX CI with Self-Hosted Runners #1

Draft: wants to merge 23 commits into base: upstream-runner

@lupyuen lupyuen commented Aug 28, 2024

Running NuttX CI with Self-Hosted Runners

Read the article: https://github.com/lupyuen/lupyuen.github.io/blob/master/src/ci.md

Let's test NuttX CI with Self-Hosted Runners on macOS Arm64 and Ubuntu x64:

  • I have a super powerful Mac Mini Arm64 that's underutilised
  • And an old MacBook Pro on Ubuntu x64 (24.04 LTS)
  • Plenty of bandwidth at home: Fibre To The Home with Downlink 650 Mbps, Uplink 560 Mbps
  • Follow these instructions to install Self-Hosted Runners for macOS Arm64 and Linux x64
  • Start a few instances of each runner. Each instance needs its own actions-runner folder. TODO: How to handle /github?
  • See below for the fixes for macOS Arm64 and Linux x64
  • Security Concerns: How to be sure that Self-Hosted Runners will run only approved scripts and commands?
    (Right now I have disabled external users from triggering GitHub Actions on my repo)
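As a rough sketch of the "one folder per instance" idea above: the helpers below name numbered runner folders (actions-runner2, actions-runner3, matching the paths that show up in the logs further down) and print the config.sh command for each one. The function names, the RUNNER_URL / RUNNER_TOKEN placeholders and the runner names are assumptions for illustration, not part of this PR. Only flags listed in the runner's own --help output are used.

```shell
# Hypothetical helper: folder for runner instance N under BASE.
# Instance 1 keeps the plain name, later instances get a numeric suffix.
runner_dir() {
  local base="$1" index="$2"
  if [ "$index" -eq 1 ]; then
    echo "$base/actions-runner"
  else
    echo "$base/actions-runner$index"
  fi
}

# Print (do not run) the unattended config command for one instance.
# $RUNNER_URL and $RUNNER_TOKEN are placeholders to be set by the operator.
runner_config_cmd() {
  local dir="$1" name="$2"
  echo "cd $dir && ./config.sh --unattended --url \$RUNNER_URL --token \$RUNNER_TOKEN --name $name"
}

# Usage sketch, starting from a pristine (unconfigured) extracted runner folder:
# for i in 2 3; do cp -R "$(runner_dir "$HOME" 1)" "$(runner_dir "$HOME" "$i")"; done
# runner_config_cmd "$(runner_dir "$HOME" 2)" "macmini-2"
```

Printing the command instead of running it keeps the registration token out of the script and lets us review each instance before it registers.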

We modified the GitHub Workflow Files to use Self-Hosted Runners:

Why are we doing this?

  • In case we need to reduce GitHub Hosting Costs. Or if we need to run the NuttX CI privately.
  • It's a great way to understand the Internals of NuttX CI!
  • Why is NuttX CI so heavy? That's because for every PR, it compiles every single NuttX Build Config: Arm, RISC-V, Simulator. (Hosting charges won't be cheap)

TODO: We might need a quicker way to "fail fast" and prevent other CI Jobs from running? Which will reduce the number of Runners?

TODO: What if we could start the CI Jobs that are impacted by the Modified Code in the PR before all the others? So if I modify something for Ox64 BL808 SBC, it should start the CI Job for ox64:nsh first. If it fails, then don't bother with the rest of the Arm / RISC-V / Simulator jobs.

TODO: Suppose we need to throttle our GitHub Runners from 36 Runners down to 25 Runners (and cut costs). What would be the impact on NuttX CI Duration? Are there any tools for modeling the queueing duration?
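A back-of-envelope model for the throttling question, as a minimal sketch: if every job takes the same time and a runner picks up the next job as soon as it is free, the jobs run in "waves" and the total duration is the number of waves times the job duration. The job count (36, one per runner) and the 33-minute job duration (the figure quoted later in this PR for a GitHub-hosted build) are assumptions. Real job durations vary, so this is only a starting point and not a queueing simulation.

```shell
# Number of waves needed to run JOBS jobs on RUNNERS runners (ceiling division).
ci_waves() {
  local jobs="$1" runners="$2"
  echo $(( (jobs + runners - 1) / runners ))
}

# Rough total duration in minutes, assuming every job takes MINUTES_PER_JOB.
ci_duration() {
  local jobs="$1" runners="$2" minutes_per_job="$3"
  echo $(( $(ci_waves "$jobs" "$runners") * minutes_per_job ))
}

ci_duration 36 36 33   # prints 33: everything fits in one wave
ci_duration 36 25 33   # prints 66: 11 jobs spill into a second wave
```

Under these assumptions, dropping from 36 to 25 runners doubles the duration even though only 11 jobs are left over, which is why the job count per wave matters more than the raw runner count.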

CI Build for NuttX

Our Self-Hosted Runners: Do they work for NuttX CI Builds?

Here's the result: https://github.com/lupyuen3/runner-nuttx/actions

  • Fetch Source works OK on macOS Arm64

  • Most of the Linux Builds won't work on macOS Arm64 because they need Docker on Linux x64

  • Podman Docker on Linux x64 fails with this error. Might be a problem with Podman.

    Writing manifest to image destination
    Error: statfs /var/run/docker.sock: permission denied
    
  • Retested with Docker Engine, which fails with this error:

    permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.47/images/create?fromImage=ghcr.io%2Fapache%2Fnuttx%2Fapache-nuttx-ci-linux&tag=latest": dial unix /var/run/docker.sock: connect: permission denied
    

    We apply this Docker Fix.

    And it works yay! (2 hours on a 10-year-old MacBook Pro with Core i7)

  • Docker Website will throttle our downloading of Docker Images. If it gets too slow, cancel the GitHub Workflow and restart. Throttling will magically disappear.

  • Build macOS (macos / sim-01 / sim-02) on macOS Arm64: setup-python hangs because it prompts for a password, so we comment out setup-python.

    Run actions/setup-python@v5
    Installed versions
    Version 3.8 was not found in the local cache
    Version 3.8 is available for downloading
    Download from "https://github.com/actions/python-versions/releases/download/3.8.10-8879978422/python-3.8.10-darwin-arm64.tar.gz"
    Extract downloaded archive
    /usr/bin/tar xz -C /Users/luppy/actions-runner2/_work/_temp/2e138b05-b7c9-4759-956a-7283af148721 -f /Users/luppy/actions-runner2/_work/_temp/792ffa3a-a28f-4443-91c8-0d81f55e422f
    Execute installation script
    Check if Python hostedtoolcache folder exist...
    Install Python binaries from prebuilt package
    

    Then it fails while downloading the toolchain:

    + wget --quiet https://developer.arm.com/-/media/Files/downloads/gnu/13.2.rel1/binrel/arm-gnu-toolchain-13.2.rel1-darwin-x86_64-arm-none-eabi.tar.xz
    + xz -d arm-gnu-toolchain-13.2.rel1-darwin-x86_64-arm-none-eabi.tar.xz
    xz: arm-gnu-toolchain-13.2.rel1-darwin-x86_64-arm-none-eabi.tar.xz: Unexpected end of input
    

    Retry and it fails at objcopy sigh:

    + rm -f /Users/luppy/actions-runner3/_work/runner-nuttx/runner-nuttx/sources/tools/bintools/bin/objcopy
    + ln -s /usr/local/opt/binutils/bin/objcopy /Users/luppy/actions-runner3/_work/runner-nuttx/runner-nuttx/sources/tools/bintools/bin/objcopy
    + command objcopy --version
    + objcopy --version
    /Users/luppy/actions-runner3/_work/runner-nuttx/runner-nuttx/sources/nuttx/tools/ci/platforms/darwin.sh: line 93: objcopy: command not found
    

    TODO: Do we change the toolchain from x64 to Arm64?
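One way to approach the toolchain TODO, as a sketch only: pick the archive name from the output of `uname -m`. The x86_64 name is the one in the log above. The arm64 name is an assumption that follows the same naming pattern and should be checked against Arm's download page before use. The objcopy failure may have a related cause: the symlink points at /usr/local/opt/binutils, which sits under Homebrew's default prefix on Intel Macs, whereas Homebrew on Apple Silicon defaults to /opt/homebrew. The function names are hypothetical.

```shell
# Map the output of `uname -m` to a toolchain archive name.
toolchain_archive() {
  local machine="$1" host
  case "$machine" in
    arm64)  host="darwin-arm64" ;;    # Apple Silicon (assumed archive name)
    x86_64) host="darwin-x86_64" ;;   # Intel Mac, as seen in the log above
    *) echo "unsupported machine: $machine" >&2; return 1 ;;
  esac
  echo "arm-gnu-toolchain-13.2.rel1-$host-arm-none-eabi.tar.xz"
}

# Default Homebrew prefix for the same `uname -m` values.
homebrew_prefix() {
  case "$1" in
    arm64)  echo "/opt/homebrew" ;;
    x86_64) echo "/usr/local" ;;
    *) echo "unsupported machine: $1" >&2; return 1 ;;
  esac
}

# Usage sketch:
# archive="$(toolchain_archive "$(uname -m)")"
# objcopy="$(homebrew_prefix "$(uname -m)")/opt/binutils/bin/objcopy"
```

Since the first failure was a truncated download, running `xz -t` on the archive before extracting it would make that case fail early with a clearer message.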

Can we guesstimate the time to run a CI Build?

Just browse the GitHub Actions Log for the CI Build. See the Line Numbers? Every NuttX CI Build will have roughly 1,000 lines of log (by sheer coincidence). So the current line number tells us roughly how far along a running build is, and we can use it to guess the CI Build Duration.
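The guesstimate can be written down as simple proportional arithmetic. This is a sketch that assumes log lines appear at a roughly steady rate, which real builds will only approximate. The function name is hypothetical, and the 1,000-line total is the rough figure from the paragraph above.

```shell
# Estimate the remaining minutes of a CI Build from its log progress.
# Arguments: lines logged so far, minutes elapsed, expected total lines (default 1000).
estimate_remaining_minutes() {
  local lines_so_far="$1" elapsed_minutes="$2" total_lines="${3:-1000}"
  if [ "$lines_so_far" -le 0 ]; then
    echo "need at least one log line" >&2
    return 1
  fi
  # Scale the elapsed time up to the expected total, then subtract what has passed.
  local estimated_total=$(( elapsed_minutes * total_lines / lines_so_far ))
  local remaining=$(( estimated_total - elapsed_minutes ))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo "$remaining"
}

estimate_remaining_minutes 250 30   # prints 90
```

Here 250 lines after 30 minutes suggests a 120-minute build, so about 90 minutes remain.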

Documentation Build for NuttX

Does it work for Documentation Build?

  • Documentation on macOS Arm64: Hangs at setup-python because it prompts for a password:

    Run actions/setup-python@v5
    Installed versions
    Version 3.8 was not found in the local cache
    Version 3.8 is available for downloading
    Download from "https://github.com/actions/python-versions/releases/download/3.8.10-8879978422/python-3.8.10-darwin-arm64.tar.gz"
    Extract downloaded archive
    /usr/bin/tar xz -C /Users/luppy/actions-runner2/_work/_temp/2e138b05-b7c9-4759-956a-7283af148721 -f /Users/luppy/actions-runner2/_work/_temp/792ffa3a-a28f-4443-91c8-0d81f55e422f
    Execute installation script
    Check if Python hostedtoolcache folder exist...
    Install Python binaries from prebuilt package
    

    And it won't work on macOS because it needs apt: workflows/doc.yml

        - name: Install LaTeX packages
          run: |
            sudo apt-get update -y
            sudo apt-get install -y \
              texlive-latex-recommended texlive-fonts-recommended \
              texlive-latex-base texlive-latex-extra latexmk texlive-luatex \
              fonts-freefont-otf xindy

  • Documentation on Linux Arm64: Fails at setup-python

    Run actions/setup-python@v5
    Installed versions
    Version 3.8 was not found in the local cache
    Error: The version '3.8' with architecture 'arm64' was not found for Debian 12.
    The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json
    

    So we comment out setup-python. Then it fails with pip3 not found:

    pip3: command not found
    

    TODO: Switch to pipenv

  • Documentation on Linux x64: Fails with rmdir error

    Copying '/home/luppy/.gitconfig' to '/home/luppy/actions-runner/_work/_temp/8c370e2f-3f8f-4e01-b8f2-1ccb301640a1/.gitconfig'
    Temporarily overriding HOME='/home/luppy/actions-runner/_work/_temp/8c370e2f-3f8f-4e01-b8f2-1ccb301640a1' before making global git config changes
    Adding repository directory to the temporary git global config as a safe directory
    /usr/bin/git config --global --add safe.directory /home/luppy/actions-runner/_work/runner-nuttx/runner-nuttx
    Deleting the contents of '/home/luppy/actions-runner/_work/runner-nuttx/runner-nuttx'
    Error: File was unable to be removed Error: EACCES: permission denied, rmdir '/home/luppy/actions-runner/_work/runner-nuttx/runner-nuttx/buildartifacts/at32f437-mini'
    

    TODO: Check the rmdir directory
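A possible first step for the rmdir TODO, sketched under an assumption: if an earlier job ran inside Docker as root, the leftover build artifacts may be owned by root, which would explain why the runner (running as a normal user) cannot delete them. Listing everything in the work folder that is not owned by the current user would confirm or rule that out. The function name is hypothetical.

```shell
# List entries under DIR that are not owned by the current user.
find_foreign_files() {
  local dir="$1"
  find "$dir" ! -user "$(id -un)" -print
}

# Usage sketch:
# find_foreign_files "$HOME/actions-runner/_work/runner-nuttx" | head
```

If this prints root-owned paths, that would be consistent with the `sudo rm -rf` clean-up used in the Ubuntu fixes further down.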

UTM Emulator for macOS Arm64

So NuttX CI works better with a huge x64 Ubuntu PC. Can we make macOS on Arm64 more useful?

  • Now testing UTM Emulator for macOS Arm64, to emulate Ubuntu x64 (because my MacBook Pro x64 is running too hot and slow).

    Here's our Emulated Ubuntu x64 24.04.1 LTS with 4GB RAM: Build for arm-01, Build for arm-02, Build for arm-03, Build for arm-04

    Does it work? Yes! How many hours? 4 hours! (Instead of 33 mins when hosted at GitHub)

    TODO: Do we run multiple Virtual Machines in macOS UTM?

  • Alternatively: Running a Self-Hosted Runner inside a Docker Container (Rancher Desktop) on macOS Arm64

    But Then: It becomes a Linux Arm64 Runner, not a Linux x64 Runner. Which won't work with our current NuttX CI Docker Image, which is x64 only.

    Unless: We create a Linux Arm64 Docker Image for NuttX CI? Like for Compiling RISC-V Platforms?
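To make the architecture mismatch concrete, here is a small sketch that maps the machine type reported inside a VM or container to the architecture label a runner would advertise by default ('self-hosted,Linux,X64' on x64, as shown in the runner help further down). The function name is hypothetical, and the ARM64 label is my understanding of GitHub's default labels rather than something shown in this PR.

```shell
# Map a `uname -m` value to the default runner architecture label.
runner_arch_label() {
  case "$1" in
    x86_64|amd64)  echo "X64" ;;
    aarch64|arm64) echo "ARM64" ;;
    *)             echo "unknown" ;;
  esac
}

# Inside UTM emulating x64 this reports X64.
# Inside a Linux container on Apple Silicon it reports ARM64.
runner_arch_label "$(uname -m)"
```

A workflow that targets the x64-only NuttX CI Docker Image would need a runner whose label comes out as X64.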

Fixes for Ubuntu x64

## TODO: Install Docker Engine: https://docs.docker.com/engine/install/ubuntu/
## TODO: Apply this fix: https://stackoverflow.com/questions/48957195/how-to-fix-docker-got-permission-denied-issue
## Note: podman won't work

## NuttX CI needs to save files in `/github`, so we create it
## TODO: How to give each runner its own `/github` folder? Do we mount in Docker?
mkdir -p "$HOME/github/home"
mkdir -p "$HOME/github/workspace"
sudo ln -s "$HOME/github" /github
ls -l /github/home

## TODO: Clean up after every job, then restart the runner
sudo rm -rf "$HOME/actions-runner/_work/runner-nuttx"
cd "$HOME/actions-runner"
./run.sh

## TODO: In case of timeout after 6 hours:
## Restart the Ubuntu Machine, because the tasks are still running in background!
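For the Docker permission fix referenced in the TODO above, the commonly documented approach is to add the runner's user to the docker group. The sketch below only checks group membership and prints the suggested command; it does not change anything. The function names are hypothetical.

```shell
# Return success if WANTED appears in a space-separated group list.
in_group_list() {
  local group_list="$1" wanted="$2" g
  for g in $group_list; do
    if [ "$g" = "$wanted" ]; then return 0; fi
  done
  return 1
}

# Print a hint depending on whether the current user is in the docker group.
docker_group_hint() {
  if in_group_list "$(id -nG)" docker; then
    echo "already in docker group"
  else
    echo "run: sudo usermod -aG docker \$USER  (then log out and back in)"
  fi
}

# Usage sketch:
# docker_group_hint
```

The new group membership only applies to new login sessions, so the runner has to be restarted from a fresh shell after the change.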

Fixes for macOS Arm64

sudo mkdir /Users/runner
sudo chown $USER /Users/runner
sudo chgrp staff /Users/runner
ls -ld /Users/runner

## Maybe need pip?
brew install python

Ubuntu x64 Runner In Action

macOS Arm64 with UTM Emulation:

On a powerful Mac Mini (M2 Pro, 32 GB RAM): We can emulate an Intel i7 PC with 32 CPUs and 4 GB RAM (we don't need much RAM)

Screenshot 2024-08-29 at 10 08 07 PM

Screenshot 2024-08-29 at 10 08 25 PM

Ubuntu Disk Space in UTM VM needs to be big enough for NuttX Docker Image:

user@ubuntu-emu-arm64:~$ neofetch
            .-/+oossssoo+/-.               user@ubuntu-emu-arm64
        `:+ssssssssssssssssss+:`           ---------------------
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 24.04.1 LTS x86_64
    .ossssssssssssssssssdMMMNysssso.       Host: KVM/QEMU (Standard PC (Q35 + ICH9, 2009) pc-q35-7.2)
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.8.0-41-generic
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 1 min
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 1546 (dpkg), 10 (snap)
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.2.21
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Resolution: 1280x800
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Terminal: /dev/pts/1
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   CPU: Intel i7 9xx (Nehalem i7, IBRS update) (16) @ 1.000GHz
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   GPU: 00:02.0 Red Hat, Inc. Virtio 1.0 GPU
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Memory: 1153MiB / 3907MiB
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/
  +sssssssssdmydMMMMMMMMddddyssssssss+
   /ssssssssssshdmNNNNmyNMMMMhssssss/
    .ossssssssssssssssssdMMMNysssso.
      -+sssssssssssssssssyyyssss+-
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.

user@ubuntu-emu-arm64:~$ df -H
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           410M  1.7M  409M   1% /run
/dev/sda2        67G   31G   33G  49% /
tmpfs           2.1G     0  2.1G   0% /dev/shm
tmpfs           5.3M  8.2k  5.3M   1% /run/lock
efivarfs        263k   57k  201k  23% /sys/firmware/efi/efivars
/dev/sda1       1.2G  6.5M  1.2G   1% /boot/efi
tmpfs           410M  115k  410M   1% /run/user/1000
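A quick pre-flight check along these lines could run before starting the runner. This is a sketch: the 20 GB threshold is a hypothetical figure, not something measured in this PR, and the function names are made up for illustration.

```shell
# Available space in whole GiB on the filesystem holding DIR.
avail_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed if the filesystem holding DIR has at least MIN_GB available.
has_free_gb() {
  [ "$(avail_gb "$1")" -ge "$2" ]
}

# Usage sketch (20 is a hypothetical threshold):
# has_free_gb / 20 || echo "low disk space: clean _work or grow the UTM disk"
```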

During Download Source Artifact: GitHub seems to be throttling the download (total 700 MB over 25 mins)
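To put that download in perspective, here is the average throughput worked out as arithmetic. The function name is hypothetical and the inputs are the figures quoted above.

```shell
# Average throughput in megabits per second for MEGABYTES moved in MINUTES.
avg_mbps() {
  awk -v mb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", mb * 8 / (min * 60) }'
}

avg_mbps 700 25   # prints 3.7
```

Roughly 3.7 Mbps against a 650 Mbps downlink suggests the bottleneck is on the serving side and not the home connection.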

Screenshot 2024-08-29 at 2 09 11 PM

During Run Builds: CPU hits 100%

Screenshot 2024-08-29 at 4 56 06 PM

Note: Don't leave System Monitor running, it consumes quite a bit of CPU!

Why emulate 32 CPUs? That's because we want to max out the macOS Arm64 CPU Utilisation. Here's our chance to watch Mac Mini run smokin' hot!

Screenshot 2024-08-30 at 4 14 05 PM

Screenshot 2024-08-30 at 4 14 20 PM

Screenshot 2024-08-29 at 10 43 39 PM

Here's how it runs:

user@ubuntu-emu-arm64:~$ cd actions-runner/
user@ubuntu-emu-arm64:~/actions-runner$ sudo rm -rf _work/runner-nuttx
[sudo] password for user: 
user@ubuntu-emu-arm64:~/actions-runner$ df -H
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           410M  1.7M  408M   1% /run
/dev/sda2        67G   28G   35G  45% /
tmpfs           2.1G     0  2.1G   0% /dev/shm
tmpfs           5.3M  8.2k  5.3M   1% /run/lock
efivarfs        263k  130k  128k  51% /sys/firmware/efi/efivars
/dev/sda1       1.2G  6.5M  1.2G   1% /boot/efi
tmpfs           410M  119k  410M   1% /run/user/1000
user@ubuntu-emu-arm64:~/actions-runner$ ./run.sh 

√ Connected to GitHub
Current runner version: '2.319.1'
2024-08-30 02:33:17Z: Listening for Jobs
2024-08-30 02:33:23Z: Running job: Linux (arm-04)
2024-08-30 06:47:38Z: Job Linux (arm-04) completed with result: Succeeded
2024-08-30 06:47:43Z: Running job: Linux (arm-01)
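The job duration can be read off the two timestamps for arm-04 in the log above. A minimal sketch of that subtraction, assuming both times fall on the same day (function names are hypothetical):

```shell
# Convert HH:MM:SS to seconds since midnight.
hms_to_seconds() {
  local h m s
  IFS=: read -r h m s <<EOF
$1
EOF
  # Strip one leading zero so values like "08" are not parsed as octal.
  h="${h#0}"; m="${m#0}"; s="${s#0}"
  echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Print the time between START and END (both HH:MM:SS, same day).
job_duration() {
  local secs=$(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
  printf '%dh %dm %ds\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

job_duration 02:33:23 06:47:38   # prints 4h 14m 15s
```

That matches the "4 hours" quoted earlier for the emulated Ubuntu x64 build.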

Runner Options:

$ ./run.sh --help
Commands:
 ./config.sh         Configures the runner
 ./config.sh remove  Unconfigures the runner
 ./run.sh            Runs the runner interactively. Does not require any options.

Options:
 --help     Prints the help for each command
 --version  Prints the runner version
 --commit   Prints the runner commit
 --check    Check the runner's network connectivity with GitHub server

Config Options:
 --unattended           Disable interactive prompts for missing arguments. Defaults will be used for missing options
 --url string           Repository to add the runner to. Required if unattended
 --token string         Registration token. Required if unattended
 --name string          Name of the runner to configure (default ubuntu-emu-arm64)
 --runnergroup string   Name of the runner group to add this runner to (defaults to the default runner group)
 --labels string        Custom labels that will be added to the runner. This option is mandatory if --no-default-labels is used.
 --no-default-labels    Disables adding the default labels: 'self-hosted,Linux,X64'
 --local                Removes the runner config files from your local machine. Used as an option to the remove command
 --work string          Relative runner work directory (default _work)
 --replace              Replace any existing runner with the same name (default false)
 --pat                  GitHub personal access token with repo scope. Used for checking network connectivity when executing `./run.sh --check`
 --disableupdate        Disable self-hosted runner automatic update to the latest released version`
 --ephemeral            Configure the runner to only take one job and then let the service un-configure the runner after the job finishes (default false)

Examples:
 Check GitHub server network connectivity:
  ./run.sh --check --url <url> --pat <pat>
 Configure a runner non-interactively:
  ./config.sh --unattended --url <url> --token <token>
 Configure a runner non-interactively, replacing any existing runner with the same name:
  ./config.sh --unattended --url <url> --token <token> --replace [--name <name>]
 Configure a runner non-interactively with three extra labels:
  ./config.sh --unattended --url <url> --token <token> --labels L1,L2,L3
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...

MacBook Pro x64 with Ubuntu:

Ubuntu CPU on MacBook Pro hits 100% when running the Linux Build for NuttX CI: Build for arm-02

linux-build

@lupyuen lupyuen marked this pull request as draft August 28, 2024 01:18
@lupyuen lupyuen closed this Aug 28, 2024
@lupyuen lupyuen reopened this Aug 28, 2024
@lupyuen lupyuen changed the title runner updates Running NuttX CI with Self-Hosted Runners Aug 28, 2024