RFC 18: Release Management

1. Background

Releasing Keep and tBTC involves executing a mix of automated and manual tasks in a specific order. Release tasks span multiple GitHub repositories and software tools. A full release requires approximately 8 hours of work for an individual already familiar with the tasks, their ordering, and the nuances of executing a full release.

The number of moving pieces, and the amount of state at any given point in the release process, is non-trivial. The process is susceptible to human error, with a high probability of having to “re-run” certain parts of the release before you can move on to the next step.

While this RFC aims to outline steps for minimizing the above (less than ideal) conditions, it’s important to note that they will still exist to some degree regardless of what we do. This is a big, complicated ship; moving it requires coordination across teams, repos, modules, and the process itself. Releasing is also beholden to external systems we do not control.

How we got to this point is likely a point of debate; the defining factor, however, is that we did not account for the complexity inherited by the release process as we began integrating the various parts of the Keep and tBTC systems. As these systems came together, we forged ahead, bolting ever more complicated scripts and processes onto existing build pipelines to handle the orchestration between now-interdependent modules. To complicate this further, we scripted some parts of the orchestration related to module interdependence and not others, resulting in the need for manual intervention at various steps of a release.

It’s arguable that our situation was a double-edged sword. At the outset, systems were rapidly evolving independently of each other, and the build and release process was structured to accommodate that environment. As integrations started playing out, it would have been challenging to re-engineer the release process at the same time. Now that the system is largely complete and module interdependence is better understood, it’s a good time to take a look at the release process and see what we can improve.

1.1. Current Functionality

1.1.1. Code Repositories

Repositories sourced during a release of the system:

  1. keep-common

  2. keep-core

  3. keep-ecdsa

  4. tbtc

  5. tbtc.js

  6. tbtc-dapp

  7. sortition-pools

  8. summa-tx/relays (fork)

1.1.2. Tooling

We use a myriad of software tools to facilitate releases. Configurations for these tools are spread out over the GitHub repositories documented above.

Table 1. Tools

GitHub

GitHub releases/tags to capture release commits.

GitHub Actions

Run tasks related to packaging NPM artifacts and publishing them.

CircleCI

Run tasks related to migrating contracts against an Ethereum network, packaging NPM artifacts, building application client Docker images, and publishing them.

Truffle

Compile contracts and execute migrations against an Ethereum network.

NPM

Package contract artifacts after compilation.

Docker Hub

Docker images produced as part of a CircleCI build are published here for public distribution.

NPM Registry

NPM packages produced as part of a CircleCI build or GitHub Actions workflow execution are published here for public distribution.

Terminal

Access to GCP/Kube resources.

gcloud CLI

Deploy GCP resources directly, e.g. the faucet.

kubectl CLI

Deploy services living in a Kube cluster.

Viscosity

VPN access for deploying services/applications in the Kube cluster.

1.1.3. Versioning

We use semantic versioning, with the additional distinction of Development (x.x.x-pre) → Testnet (x.x.x-rc) → Mainnet (x.x.x).

Not all modules that should be explicitly versioned are. See the table below for the modules involved in a full release and their versioning status:

Table 2. Versions

Repository        Module                        Versioned   Location
keep-common       go library                    yes         repository tag
keep-core         contracts                     yes         package.json
keep-core         go client                     yes         repository tag
keep-core         token-dashboard (js client)   yes         package.json
keep-ecdsa        contracts                     yes         package.json
keep-ecdsa        go client                     yes         repository tag
sortition-pools   contracts                     yes         package.json
tbtc              contracts                     yes         package.json
tbtc.js           library                       yes         package.json
tbtc-dapp         js client                     yes         package.json

1.1.4. General Release Flow

Here we’ll step through a sample flow for a release. Please note that the top path of the flowchart is exercised for each repo being released; e.g., keep-core and keep-ecdsa would each go through this top portion of the flowchart independently.

Another non-obvious factor to note here is that when a repo houses both contract and Go client code (e.g. keep-core), the current release process does not allow releasing just the client when no contract changes have been made. This results in a release flow that requires both contract and client builds/deploys regardless of what changed.

[Figure: flow chart of the current release process]

1.1.5. Interdependencies

One complicating factor for releasing is that certain modules are dependent on each other, and dependent modules span code repositories. Upstream context for module changes isn’t sent downstream during release; these changes are accounted for manually, by a person making sure package.json files are versioned properly, or by scripts that assume changes were made and fetch the latest contract artifacts from a Google storage bucket. This is a particularly hairy failure point, as mistakes can go unaccounted for until deploy time, potentially resulting in having to redo large parts of the release. Handling these interdependencies programmatically, and with specific context of the upstream state so as to minimize downstream work, is critical to a more robust, less error-prone, and shorter release process.

To illustrate: tbtc contract initialization depends on having the keep-ecdsa BondedECDSAKeepFactory contract address for the environment being released. The BondedECDSAKeepFactory address is produced outside of the release context that tbtc has, and must therefore be passed explicitly to the tbtc release context before tbtc contract initialization can happen. The BondedECDSAKeepFactory address fetch currently happens via a script executed during the tbtc CircleCI migration job. The script fetches the relevant keep-ecdsa contract artifacts from a Google storage bucket that is assumed to be up to date by an earlier process, run during the keep-ecdsa release, which publishes contract artifacts to this bucket. The fetch script has no way of knowing whether the contract artifact is the correct one for the release being run.

2. Proposal

The next iteration of our release process should incorporate learnings from our current process and stay flexible for the future. To achieve this we’re going to have to touch all aspects of the current process, including versioning, tooling, and the release flow itself.

Any part of the system should be releasable at any time, with modules that have upstream dependencies being aware of upstream module release and having their own release initiated in response, at the appropriate time.

2.1. Goal

For a properly modular release process, each module in the system should be able to build and publish its own artifacts where relevant, and each module should be able to trigger any downstream builds as necessary, irrespective of whether they share the same repository or not. To do this, contract releases, client releases, and dApp releases must be decoupled. Each one should be runnable independently of the others against any environment configured for deployment. Additionally, modules themselves must be independently versionable. dApps must not be versioned in lockstep with clients, contracts must not be versioned in lockstep with dApps, etc. Each of these should be able to live as independently evolving modules, from a version perspective.

To properly manage contract deployment across various Ethereum testnets and mainnet, and to reflect the fact that deploying contracts to mainnet is not necessarily a declaration of a final version, artifact versions must be able to differentiate between pre-releases (whether RC or otherwise) on mainnet vs testnets. Versions that are declared final may be deployed to a testnet, and versions that are declared pre-release may be deployed to mainnet.

2.2. Implementation

The implementation presented here has two primary parts:

  • Build tagging/publishing.

  • Inter-module dependency management.

2.2.1. Build tagging/publishing

Builds

The proposal is to move to GitHub Actions for all builds, and set up one build per module. All GitHub Actions builds would be set up to trigger on three events:

  • push to master, to run merge builds

  • pull_request, to run PR builds

  • workflow_dispatch with some additional parameters to publish artifacts, described more below

Each GitHub Actions module build should have migrate/publish steps (in Actions parlance, jobs) that run conditionally; these jobs should run only as a result of the workflow_dispatch event, not for pushes or pull requests.
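As a sketch, a module build workflow along these lines might look like the following. The workflow, job, and step names here are illustrative, and the npm/Truffle commands stand in for whatever the module’s real build entails; none of this is prescribed by the RFC:

    name: solidity

    on:
      push:
        branches: [master]
      pull_request:
      workflow_dispatch:
        inputs:
          environment:
            description: "Environment to publish to, e.g. ropsten"
            required: true
          upstream_builds:
            description: "JSON array of upstream build information"
            required: false

    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - run: npm ci && npm run build

      # Migrate/publish runs only for explicit publish dispatches,
      # never for pushes or pull requests.
      migrate-and-publish:
        needs: build
        if: github.event_name == 'workflow_dispatch'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - run: npm ci
          - run: npx truffle migrate --network ${{ github.event.inputs.environment }}
          - run: npm publish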

Versioning

All module builds with versions that are not derived from the tag (e.g. npm, but not necessarily Go) will receive a new version for each build, even if the build goes unpublished. Versions will follow a consistent (and semver-compatible) format across all artifact types that support it:

    <base-version>-<environment>.<build-number>+<branch>.<commit>

The components are defined as follows:

base-version

A manually-incremented major.minor.patch semver version that is committed to the repository. Bumping this version requires a human committing an incremented version.

environment

The environment we are building for; for builds that will not publish, the environment should match the development environment used for builds that publish to the development testnet.

build-number

Auto-incremented for each published build. This means that builds that don’t publish can produce several builds under a single build number until a build is published.

branch

Unless a build is detached from any branch, this is the branch for the build. For example, for pushes to master, this will be master. For PR builds, this will be the PR branch.

commit

The commit hash for the current build.

Versions that are deployed to mainnet are special; these carry only the <base-version> with an optional -rc.<build-number> for a mainnet deployment considered a release candidate. Mainnet versions are also currently deployed manually, so publishing is not expected to occur in the automated pipeline for these versions.
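As a rough illustration, a build step could assemble this version string from its components. In this sketch, BASE_VERSION, ENVIRONMENT, and BUILD_NUMBER are hypothetical values assumed to be provided by earlier steps or workflow inputs:

    - name: Compute build version
      run: |
        # BASE_VERSION, ENVIRONMENT, and BUILD_NUMBER are assumed to come from
        # the repository, the workflow_dispatch input, and a counter, respectively.
        BRANCH="${GITHUB_REF#refs/heads/}"
        COMMIT="$(git rev-parse --short HEAD)"
        echo "VERSION=${BASE_VERSION}-${ENVIRONMENT}.${BUILD_NUMBER}+${BRANCH}.${COMMIT}" >> "$GITHUB_ENV"

This would yield versions like 1.1.2-ropsten.45+master.f7a9c3d.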

Tagging

Tagging for modules is based on their path in the repository and the version of the module. These versions do not need to be correlated between modules. As an example, consider the repository at github.com/keep-network/keep-core. It has three modules: the Go client, the Solidity contracts, and the Keep token dashboard. A 1.0.0 version for each of these modules would be tagged as v1.0.0 (Go client), solidity/v1.0.0 (Solidity contracts), and token-dashboard/v1.0.0 (token dashboard).

In addition to being clear about which module is being tagged, this also happens to follow go mod’s module versioning strategy, so it will be forward-compatible with managing multiple Go modules in one repository, should that become necessary.

Publishing (below) should result in the creation of a tag on the repository if the generated artifacts are expected to be consumed downstream or by third parties. A tag doesn’t necessarily need to be created for internal modules such as Kubernetes InitContainers.
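For example, publishing the Solidity module could create and push the corresponding tag like so (a sketch; VERSION is assumed to have been computed earlier in the workflow):

    - name: Tag module release
      run: |
        # The tag is the module's path in the repository plus its version.
        git tag "solidity/v${VERSION}"
        git push origin "solidity/v${VERSION}"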

Publishing

As discussed in the Builds section, publishing will be implemented as an optional last step/job in the regular build process. Publishing should include any artifacts that may arise out of the build; in particular, here are the major expected artifacts:

Go builds

The Go client executable and a docker image with same.

Solidity builds

The npm package with JSON artifacts, including deployed artifact information.

JS libraries

The npm package.

dApp builds

The static site publish; this may not necessarily result in an npm package or other directly downloadable artifact beyond a reachable URL.

InitContainer builds

A docker image.

For contract builds, publishing should involve deploying as well. Whether an unpublished build includes deployment or not is a module-specific decision.

workflow_dispatch event

Publishing is triggered by the workflow_dispatch event. These events can be dispatched manually, or they can be dispatched via API calls. For the purposes of this RFC, the expectation is that the first event in a chain will be dispatched manually, and downstream builds will be fired via API through intermediary Actions (see Inter-module dependency management below). For a module build, the workflow_dispatch event should expect two parameters:

environment

The environment to run the build for. This corresponds to either a public testnet name (e.g. ropsten) or an internal environment name. mainnet is currently not a valid build identifier, as mainnet builds are currently run manually.

upstream_builds

A JSON array of upstream build information, in order from the original triggering event to the build that triggered this workflow_dispatch. The format of each object is described in the next section.

Additionally, workflow_dispatch events are triggered on a particular ref, which is considered part of the input to the publish as well.
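For instance, the first dispatch in a chain could be fired manually from a terminal, e.g. with the GitHub CLI (a sketch; the repository and workflow file names are illustrative):

    gh workflow run solidity.yaml \
      --repo keep-network/keep-ecdsa \
      --ref master \
      -f environment=ropsten \
      -f upstream_builds='[]'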

After a publish is successfully completed, downstream builds should be dispatched per the following section.

2.2.2. Inter-module dependency management

Inter-module dependency management is handled by a repository that tracks dependencies and handles inter-module and inter-repository coordination. This repository could be an evolution of the keep-network/local-setup repository, which currently manages interdependencies for local setup purposes; it could be one or more Actions on another existing repository such as keep-network/keep-core; or it could be a separate repository altogether.

The repository will have a single entry point for inter-module builds, a GitHub Action triggered by a workflow_dispatch event. This event will expect two parameters:

environment

The environment to run the build for. Same as for a module build, this corresponds to either a public testnet name (e.g. ropsten) or an internal environment name. mainnet is currently not a valid build identifier, as mainnet builds are currently run manually.

upstream_builds

A JSON array of upstream build information, in order from the original triggering event to the build that triggered this workflow_dispatch.

Entries in upstream_builds will have these properties:

url

A URL that points to the GitHub Action run in-browser.

ref

The ref used for this build.

module

The name of the module that was built, including the repository (e.g. github.com/keep-network/keep-core/solidity-v1).

version

The module version used for this build.
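For illustration, an upstream_builds array with a single entry might look like this (all values are hypothetical):

    [
      {
        "url": "https://github.com/keep-network/keep-core/actions/runs/123456789",
        "ref": "master",
        "module": "github.com/keep-network/keep-core/solidity-v1",
        "version": "1.1.2-ropsten.45+master.f7a9c3d"
      }
    ]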

Downstream builds will be triggered by invoking a workflow_dispatch event on their containing repository. The name provided for the event will be the name of the build’s module followed by .yaml. Note that this imposes a restriction on module build Action names: the GitHub Action associated with a module build should match the name of the module, which should in turn match the name of the directory the module is in. In cases where a module is nested beyond one level in the repository, the file should be the full path, with / replaced by -.

When invoking workflow_dispatch on a module, the event passed to that module build will have these properties:

environment

The environment to run the build for. Same as for a module build, this corresponds to either a public testnet name (e.g. ropsten) or an internal environment name. mainnet is currently not a valid build identifier, as mainnet builds are currently run manually. This is the same as the parameter passed to module builds.

upstream-ref

The ref used to trigger the upstream build. Note that this can and often will differ from the Action’s own ref, which will generally be master for the dependency management repository (since typical builds will use the master branch’s dependency management configuration and Action).

upstream_builds

A JSON array of upstream build information, in order from the original triggering event to the build that triggered this workflow_dispatch. This is the same as the parameter passed to module builds.
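Put together, and keeping in mind that workflow_dispatch inputs must be strings (so the upstream_builds array is JSON-encoded into a string), a dispatch payload sent to a module build might look like the following sketch; all values are illustrative:

    {
      "ref": "master",
      "inputs": {
        "environment": "ropsten",
        "upstream-ref": "master",
        "upstream_builds": "[{\"module\": \"github.com/keep-network/keep-core/solidity-v1\", \"version\": \"1.1.2-ropsten.45+master.f7a9c3d\", \"ref\": \"master\", \"url\": \"...\"}]"
      }
    }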

The upstream_builds array’s latest entry will be used by the dependency management Action to determine where in the dependency graph the upstream build is, and trigger the appropriate downstream module builds.

This dependency management repository should track the dependencies between builds in a way that makes them easy to resolve at runtime. A proposal would look something like this:

{
    "github.com/keep-network/keep-core/solidity-v1": [
        "github.com/keep-network/keep-core",
        "github.com/keep-network/keep-core/token-dashboard"
    ],
    "github.com/keep-network/keep-core": [
        "github.com/keep-network/keep-ecdsa",
        "github.com/keep-network/keep-ecdsa/solidity"
    ],
    "github.com/keep-network/keep-ecdsa/solidity": [
        "github.com/keep-network/tbtc/solidity"
    ],
    "github.com/keep-network/tbtc/solidity": [
        "github.com/keep-network/tbtc/relay-maintainer-initcontainer",
        "github.com/keep-network/tbtc.js"
    ],
    "github.com/keep-network/tbtc.js": [
        "github.com/keep-network/tbtc-dapp",
        "github.com/keep-network/tbtc.js/liquidation-maintainer"
    ]
}

Here, we use the same module reference structure that we use for versioning (the path to the module) and define downstream build dependencies, all as JSON. When the workflow_dispatch event is received in the dependency management repository, the last entry in upstream_builds is taken and its module property is matched against this dependency definition to determine the appropriate builds to trigger, which is done by calling the GitHub API to fire workflow_dispatch events on those repositories in turn. Note that these builds can fan out, and this RFC does not define a "fan-in" way to trigger a downstream build only when multiple upstreams have completed.
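To make the resolution step concrete, the coordinating Action could look something like the following sketch. The dependencies.json file name, the DISPATCH_TOKEN secret, and the fallback to the repository name for repo-root modules are all assumptions for illustration, not part of this proposal:

    - name: Trigger downstream module builds
      uses: actions/github-script@v4
      with:
        github-token: ${{ secrets.DISPATCH_TOKEN }}
        script: |
          const fs = require("fs");
          const dependencies = JSON.parse(fs.readFileSync("dependencies.json", "utf8"));
          const upstreamBuilds = JSON.parse(context.payload.inputs.upstream_builds);
          const latest = upstreamBuilds[upstreamBuilds.length - 1];
          for (const downstream of dependencies[latest.module] || []) {
            // e.g. "github.com/keep-network/keep-core/token-dashboard" ->
            // owner keep-network, repo keep-core, workflow token-dashboard.yaml;
            // repo-root modules fall back to the repository name (an assumption).
            const [, owner, repo, ...modulePath] = downstream.split("/");
            await github.actions.createWorkflowDispatch({
              owner,
              repo,
              workflow_id: `${modulePath.join("-") || repo}.yaml`,
              ref: "master",
              inputs: {
                environment: context.payload.inputs.environment,
                "upstream-ref": latest.ref,
                upstream_builds: JSON.stringify(upstreamBuilds),
              },
            });
          }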

2.3. Limitations

While the above proposal covers both the build/tag/publish process and defines a way to manage inter-module builds centrally, across repositories and versions, there are a few limitations to the detailed approach and a few things that are explicitly left out:

  • Automated mainnet releases. Mainnet releases and upgrades currently require manual coordination with both internal and external entities and are still a topic of exploration for the team, so they are left for future work.

  • Downstream builds are blocked from starting by upstream deploys. This could be a target of future work.

  • Tracing of failures is not always straightforward: because the coordinating repository relies on dispatching builds on repos and having them call back to the coordinating repository, tracing a failure may require looking at several repositories to see where a build originated. The upstream_builds argument should help with this, but errors can still happen in unexpected places and require tracing across repositories.

  • The dependency management repository can trigger fanned-out builds. These builds will not track all upstream_builds entries, which could result in a partial downstream view of the overall build graph. Additionally, there is no way to specify a "fan-in" where a downstream build requires multiple upstream builds to complete.

  • Contract artifacts will still be bundled with contract code dependencies. This means that a new deployment requires new artifacts and therefore a new npm package, and that one deployment cannot be pointed to multiple environments.

  • Publishing builds cannot be triggered manually without navigating the GitHub Actions UI.

    • Possible fix: Heimdall can be updated to support chat- and GitHub-comment-based invocations of builds.

3. Future Work

  • When running an Action module build from a workflow_dispatch event, looking up the prior published artifact and checking whether there have been changes between its commit id and the current commit could be used to skip the build altogether and go straight to triggering downstream dependencies.

  • In general, breaking the jobs down in the module builds such that rebuilds can be partial would allow avoiding repetition of certain slower processes in cases where they need not be repeated for a rebuild. It’s possible the module delimiting will be enough to handle this.