Releasing Keep and tBTC involves executing a mix of automated and manual tasks in a specific order. Release tasks span multiple GitHub repositories and software tools. A full release requires approximately 8 hours of work for an individual already familiar with the tasks, their ordering, and the nuances of executing a full release.
The number of moving pieces, and the amount of state at any given point in the release process, is non-trivial and susceptible to human error, with a high probability of having to "re-run" certain parts of the release before moving on to the next step.
While this RFC aims to outline potential steps for minimizing the above (less than ideal) conditions, it’s important to note that they will still exist to some degree regardless of what we do. This is a big, complicated ship; moving it requires coordination across teams, repos, modules, and the process itself. Releasing is also beholden to external systems we do not control.
How we got to this point is likely a point of debate; however, the defining factor is that we did not account for the complexity inherited into the release process as we began integrating various parts of the Keep and tBTC systems. As these systems came together, we forged ahead, bolting more complicated scripts and processes onto existing build pipelines to handle the orchestration between newly interdependent modules. To complicate this further, we scripted some parts of the orchestration related to module interdependence and not others, resulting in the need for manual intervention at various steps of a release.
Arguably, our situation was a double-edged sword. At the outset, systems were rapidly evolving independently of each other, and the build and release process was structured to accommodate that environment. As integrations started playing out, it would have been challenging to re-engineer the release process at the same time. Now that the system is largely complete and module interdependence is better understood, it’s a good time to look at the release process and see what we can improve.
Repositories sourced during a release of the system:

- keep-common
- keep-core
- keep-ecdsa
- tbtc
- tbtc.js
- tbtc-dapp
- sortition-pools
- summa-tx/relays (fork)
We use a myriad of software tools to facilitate releases. Configurations for these tools are spread out over the GitHub repositories documented above.
| Tool | Function |
|---|---|
| GitHub | GitHub releases/tags to capture release commits. |
| GitHub Actions | Run tasks related to packaging NPM artifacts and publishing them. |
| CircleCI | Run tasks related to migrating contracts against an Ethereum network, packaging NPM artifacts, building application client Docker images, and publishing them. |
| Truffle | Compile contracts and execute migrations against an Ethereum network. |
| NPM | Package contract artifacts after compilation. |
| Docker Hub | Docker images produced as part of a CircleCI build are published here for public distribution. |
| NPM Registry | NPM packages produced as part of a CircleCI build or GitHub Actions workflow execution are published here for public distribution. |
| Terminal | Access to GCP/Kube resources. |
| gcloud CLI | Deploy GCP resources directly, e.g. the faucet. |
| kubectl CLI | Deploy services living in a Kube cluster. |
| Viscosity | VPN access for deploying services/applications in the Kube cluster. |
We use semantic versioning with the additional distinction of Development (`x.x.x-pre`) → Testnet (`x.x.x-rc`) → Mainnet (`x.x.x`).
Not all modules that should be explicitly versioned are. See the table below for the modules involved in a full release and their versioning status:
| Repository | Module | Versioned | Location |
|---|---|---|---|
| keep-common | go library | yes | repository tag |
| keep-core | contracts | yes | package.json |
| keep-core | go client | yes | repository tag |
| keep-core | token-dashboard (js client) | yes | package.json |
| keep-ecdsa | contracts | yes | package.json |
| keep-ecdsa | go client | yes | repository tag |
| sortition-pools | contracts | yes | package.json |
| tbtc | contracts | yes | package.json |
| tbtc.js | library | yes | package.json |
| tbtc-dapp | js client | yes | package.json |
Here we’ll step through a sample flow for a release. Please note the top path of the flowchart is exercised for each repo being released, e.g. keep-core and keep-ecdsa would go through this top portion of the flowchart, independently.
Another non-obvious factor to note here is that when a repo houses both contract and go client code (e.g. keep-core), the current release process does not allow you to release just the client, should no contract changes be made. This results in a release flow that requires both contract and client build/deploys regardless of what changed.
One complicating factor for releasing is that certain modules depend on each other, and dependent modules span code repositories. Upstream context for module changes isn’t sent downstream during release; these changes are accounted for manually by a person making sure `package.json` files are versioned properly, or by scripts that assume changes were made and fetch the latest contract artifacts from a Google storage bucket. This is a particularly hairy failure point, as mistakes can go unaccounted for until deploy time, potentially resulting in having to redo large parts of the release. Handling these interdependencies programmatically, with specific context of the upstream state so as to minimize downstream work, is critical to a more robust, less error-prone, and shorter release process.
To illustrate: tbtc contract initialization depends on having the keep-ecdsa `BondedECDSAKeepFactory` contract address for the environment being released. The `BondedECDSAKeepFactory` address is produced outside of the release context that tbtc has, and therefore must be passed explicitly to the tbtc release context before tbtc contract initialization can happen. The `BondedECDSAKeepFactory` address fetch currently happens via a script executed during the tbtc CircleCI migration job. The script fetches relevant keep-ecdsa contract artifacts from a Google storage bucket that is assumed to be up to date by an earlier process, run during the keep-ecdsa release, that publishes contract artifacts to this storage bucket. The fetch script has no way of knowing whether the contract artifact is the correct one for the release being run.
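For illustration only, the fetch behaves roughly like the following CircleCI-style step; the bucket name and artifact path here are placeholders, not the actual configuration:

```yaml
# Illustrative sketch only; bucket and paths are placeholders.
- run:
    name: Fetch keep-ecdsa contract artifacts
    command: |
      # Copies whatever artifact currently sits in the bucket; nothing ties
      # the fetched BondedECDSAKeepFactory address to the keep-ecdsa version
      # actually being released.
      gsutil cp gs://contract-artifacts/keep-ecdsa/BondedECDSAKeepFactory.json \
        ./external-contracts/
```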
The next iteration of our release process should incorporate learnings from our current process and stay flexible for the future. To achieve this we’re going to have to touch all aspects of the current process, including: versioning, tooling, and the release flow itself.
Any part of the system should be releasable at any time, with modules that have upstream dependencies being aware of upstream module releases and having their own releases initiated in response, at the appropriate time.
For a properly modular release process, each module in the system should be able to build and publish its own artifacts where relevant, and each module should be able to trigger any downstream builds as necessary, irrespective of whether they share the same repository or not. To do this, contract releases, client releases, and dApp releases must be decoupled. Each one should be runnable independently of the others against any environment configured for deployment. Additionally, modules themselves must be independently versionable. dApps must not be versioned in lockstep with clients, contracts must not be versioned in lockstep with dApps, etc. Each of these should be able to live as independently evolving modules, from a version perspective.
To properly manage contract deployment across various Ethereum testnets and mainnet, and to reflect the fact that deploying contracts to mainnet is not necessarily a declaration of a final version, artifact versions must be able to differentiate between pre-releases (whether RC or otherwise) on mainnet vs testnets. Versions that are declared final may be deployed to a testnet, and versions that are declared pre-release may be deployed to mainnet.
The implementation presented here has two primary parts:
- Build tagging/publishing.
- Inter-module dependency management.
The proposal is to move to GitHub Actions for all builds, and set up one build per module. All GitHub Actions builds would be set up to trigger on three events:
- `push` to `master`, to run merge builds
- `pull_request`, to run PR builds
- `workflow_dispatch`, with some additional parameters to publish artifacts, described more below

Each GitHub Actions module build should have `migrate`/`publish` steps (in Actions parlance, `job`s) that run conditionally; these jobs should run only as a result of the `workflow_dispatch` event, not for pushes or pull requests.
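To make this concrete, here is a minimal sketch of what a per-module workflow could look like, assuming an npm-based module; the job names, steps, and scripts are illustrative rather than prescribed:

```yaml
# Minimal sketch of a per-module workflow; job contents are illustrative.
name: contracts

on:
  push:
    branches: [master]  # merge builds
  pull_request:         # PR builds
  workflow_dispatch:    # publish builds; parameters are described below
    inputs:
      environment:
        required: true
      upstream_builds:
        required: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: npm ci && npm run build

  # Migrate/publish jobs run only for workflow_dispatch events,
  # never for push or pull_request builds.
  migrate-and-publish:
    needs: build
    if: github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: echo "migrate + publish against ${{ github.event.inputs.environment }}"
```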
All module builds with versions that are not derived from the tag (e.g. npm, but not necessarily Go) will receive a new version for each build, even if the build goes unpublished. Versions will follow a consistent (and semver-compatible) format across all artifact types that support it:
`<base-version>-<environment>.<build-number>+<branch>.<commit>`
The components are defined as follows:
- `base-version`: A manually-incremented `major.minor.patch` semver version that is committed to the repository. Bumping this version requires a human committing an incremented version.
- `environment`: The environment we are building for; for builds that will not publish, the environment should match the development environment used for builds that publish to the development testnet.
- `build-number`: Auto-incremented for each published build. This means builds that don’t publish can produce several builds under a single build number, until a build is published.
- `branch`: Unless a build is unattached to a branch, this should be the branch for the build. For example, for pushes to `master`, this will be `master`. For PR builds, this will be the PR branch.
- `commit`: The commit hash for the current build.
Versions that are deployed to mainnet are special; these carry only the `<base-version>`, with an optional `-rc.<build-number>` for a mainnet deployment considered a release candidate. Mainnet versions are also currently deployed manually, so publishing is not expected to occur in the automated pipeline for these versions.
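To sketch how a build might assemble this version string in practice (the source of the auto-incremented `BUILD_NUMBER` and the use of `package.json` for the base version are assumptions, not settled decisions):

```yaml
# Sketch: assembles <base-version>-<environment>.<build-number>+<branch>.<commit>.
# BUILD_NUMBER is assumed to come from whatever mechanism auto-increments
# published builds; the base version is assumed to live in package.json.
- name: Compute build version
  run: |
    BASE_VERSION="$(jq -r .version package.json)"        # e.g. 1.4.0
    BRANCH="${GITHUB_REF##refs/heads/}"                  # e.g. master
    VERSION="${BASE_VERSION}-${{ github.event.inputs.environment }}.${BUILD_NUMBER}+${BRANCH}.${GITHUB_SHA:0:8}"
    # e.g. 1.4.0-ropsten.17+master.abcdef12
    echo "version=${VERSION}" >> "$GITHUB_ENV"
```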
Tagging for modules is based on their path in the repository and the version of the module. These versions do not need to be correlated between modules. As an example, consider the repository at `github.com/keep-network/keep-core`. It has three modules: the Go client, the Solidity contracts, and the Keep token dashboard. A 1.0.0 version for each of these modules would be tagged as `v1.0.0` (Go client), `solidity/v1.0.0` (Solidity contracts), and `token-dashboard/v1.0.0` (token dashboard).
In addition to being clear about which module is being tagged, this also happens to follow `go mod`’s module versioning strategy, so it will be forward-compatible with managing multiple Go modules in one repository, should that become necessary.
Publishing (below) should result in the creation of a tag on the repository if the generated artifacts are expected to be consumed downstream or by third parties. A tag doesn’t necessarily need to be created for internal modules such as Kubernetes `InitContainer`s.
As discussed in the Builds section, publishing will be implemented as an optional last step/job in the regular build process. Publishing should include any artifacts that may arise out of the build; in particular, here are the major expected artifacts:
- Go builds: The Go client executable and a Docker image containing it.
- Solidity builds: The npm package with JSON artifacts, including deployed artifact information.
- JS libraries: The npm package.
- dApp builds: The static site publish; this may not necessarily result in an npm package or other directly downloadable artifact beyond a reachable URL.
- InitContainer builds: A Docker image.
For contract builds, publishing should involve deploying as well. Whether an unpublished build includes deployment or not is a module-specific decision.
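For a Solidity module, the optional publish job could look roughly like the following; the Truffle network naming, the npm dist-tag choice, and the `VERSION` variable (from the versioning sketch above) are all assumptions:

```yaml
# Sketch of a contract module's publish job; assumes VERSION was computed
# earlier and that Truffle networks are named after environments.
publish:
  needs: build
  if: github.event_name == 'workflow_dispatch'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - name: Deploy (migrate) contracts
      run: npx truffle migrate --network "${{ github.event.inputs.environment }}"
    - name: Publish npm package including deployed artifacts
      run: |
        npm version "${VERSION}" --no-git-tag-version
        npm publish --tag "${{ github.event.inputs.environment }}"
```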
Publishing is triggered by the `workflow_dispatch` event. These events can be dispatched manually, or they can be dispatched via API calls. For the purposes of this RFC, the expectation is that the first event in a chain will be dispatched manually, and downstream builds will be fired via API through intermediary Actions (see Inter-module dependency management below). For a module build, the `workflow_dispatch` event should expect two parameters:
- `environment`: The environment to run the build for. This corresponds to either a public testnet name (e.g. `ropsten`) or an internal environment name. `mainnet` is currently not a valid build identifier, as mainnet builds are currently run manually.
- `upstream_builds`: A JSON array of upstream build information, in order from the original triggering event to the build that triggered this `workflow_dispatch`. The format of each object is described in the next section.
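In a module workflow, declaring these parameters might look like the following (descriptions are illustrative):

```yaml
# Sketch of the workflow_dispatch parameter declaration for a module build.
on:
  workflow_dispatch:
    inputs:
      environment:
        description: "Public testnet (e.g. ropsten) or internal environment name"
        required: true
      upstream_builds:
        description: "JSON array of upstream build info, oldest entry first"
        required: false
```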
Additionally, `workflow_dispatch` events are triggered on a particular `ref`, which is considered part of the input to the publish as well.
After a publish is successfully completed, downstream builds should be dispatched per the following section.
Inter-module dependency management is handled by a repository that tracks dependencies and handles inter-module and inter-repository coordination. This repository could be an evolution of the `keep-network/local-setup` repository, which currently manages interdependencies for local setup purposes; it could be one or more Actions on another existing repository such as `keep-network/keep-core`; or it could be a separate repository altogether.
The repository will have a single entry point for inter-module builds: a GitHub Action triggered by a `workflow_dispatch` event. This event will expect two parameters:
- `environment`: The environment to run the build for. As for a module build, this corresponds to either a public testnet name (e.g. `ropsten`) or an internal environment name. `mainnet` is currently not a valid build identifier, as mainnet builds are currently run manually.
- `upstream_builds`: A JSON array of upstream build information, in order from the original triggering event to the build that triggered this `workflow_dispatch`.
Entries in `upstream_builds` will have these properties:

- `url`: A URL that points to the GitHub Action run in-browser.
- `ref`: The ref used for this build.
- `module`: The name of the module that was built, including the repository (e.g. `github.com/keep-network/keep-core/solidity-v1`).
- `version`: The module version used for this build.
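As a purely hypothetical example (all values here are invented for illustration), an `upstream_builds` payload might look like:

```json
[
  {
    "url": "https://github.com/keep-network/keep-core/actions/runs/123456789",
    "ref": "master",
    "module": "github.com/keep-network/keep-core/solidity-v1",
    "version": "1.4.0-ropsten.17+master.abcdef12"
  },
  {
    "url": "https://github.com/keep-network/keep-ecdsa/actions/runs/987654321",
    "ref": "master",
    "module": "github.com/keep-network/keep-ecdsa/solidity",
    "version": "1.2.1-ropsten.9+master.01234567"
  }
]
```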
Downstream builds will be triggered by invoking a `workflow_dispatch` event on their containing repository. The name provided for the event will be the name of the build’s module followed by `.yaml`. Note that this imposes a restriction on module build Action names: the GitHub Action associated with a module build should match the name of the module, which should in turn match the name of the directory the module is in. In cases where a module is nested beyond one level in the repository, the file name should be the full path, with `/` replaced by `-`.
When invoking `workflow_dispatch` on a module, the event passed to that module build will have these properties:
- `environment`: The environment to run the build for. As for a module build, this corresponds to either a public testnet name (e.g. `ropsten`) or an internal environment name. `mainnet` is currently not a valid build identifier, as mainnet builds are currently run manually. This is the same as the parameter passed to module builds.
- `upstream-ref`: The ref used to trigger the upstream build. Note that this can and often will differ from the Action’s own `ref`, which will generally be `master` for the dependency management repository (since typical builds will use the `master` branch’s dependency management configuration and Action).
- `upstream_builds`: A JSON array of upstream build information, in order from the original triggering event to the build that triggered this `workflow_dispatch`. This is the same as the parameter passed to module builds.
The latest entry in the `upstream_builds` array will be used by the dependency management Action to determine where in the dependency graph the upstream build is, and to trigger the appropriate downstream module builds.
This dependency management repository should track the dependencies between builds in a way that makes them easy to resolve at runtime. A proposal would look something like this:
```json
{
  "github.com/keep-network/keep-core/solidity-v1": [
    "github.com/keep-network/keep-core",
    "github.com/keep-network/keep-core/token-dashboard"
  ],
  "github.com/keep-network/keep-core": [
    "github.com/keep-network/keep-ecdsa",
    "github.com/keep-network/keep-ecdsa/solidity"
  ],
  "github.com/keep-network/keep-ecdsa/solidity": [
    "github.com/keep-network/tbtc/solidity"
  ],
  "github.com/keep-network/tbtc/solidity": [
    "github.com/keep-network/tbtc/relay-maintainer-initcontainer",
    "github.com/keep-network/tbtc.js"
  ],
  "github.com/keep-network/tbtc.js": [
    "github.com/keep-network/tbtc-dapp",
    "github.com/keep-network/tbtc.js/liquidation-maintainer"
  ]
}
```
Here, we use the same module reference structure that we use for versioning (the path to the module) and define downstream build dependencies, all as JSON. When the `workflow_dispatch` event is received in the dependency management repository, it checks the last entry in `upstream_builds` (see below) and checks its `module` property against this dependency definition to trigger the appropriate builds, done by calling the GitHub API to trigger `workflow_dispatch` events on those repositories in turn. Note that these builds can fan out, and this RFC does not define a "fan-in" way to trigger a downstream build only when multiple upstreams have completed.
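A rough sketch of that resolution-and-dispatch step follows; the `dependencies.json` file name, the `DISPATCH_TOKEN` secret, and the handling of repo-root modules are all assumptions:

```yaml
# Sketch of downstream dispatch in the dependency management repository.
# dependencies.json holds the mapping shown above; DISPATCH_TOKEN is an
# assumed secret with permission to dispatch workflows on target repos.
- name: Dispatch downstream builds
  env:
    GITHUB_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
    ENVIRONMENT: ${{ github.event.inputs.environment }}
    UPSTREAM_BUILDS: ${{ github.event.inputs.upstream_builds }}
  run: |
    # The latest upstream entry tells us which module just published.
    last_module="$(jq -r '.[-1].module' <<<"$UPSTREAM_BUILDS")"
    # Look up its downstreams and dispatch each one's module workflow.
    jq -r --arg m "$last_module" '.[$m][]?' dependencies.json |
    while read -r downstream; do
      repo="$(cut -d/ -f2-3 <<<"$downstream")"                # e.g. keep-network/tbtc
      module="$(cut -d/ -f4- <<<"$downstream" | tr '/' '-')"  # e.g. solidity
      # Repo-root modules (no module path) would need a conventional
      # workflow name; that detail is elided here.
      jq -n --arg env "$ENVIRONMENT" --arg builds "$UPSTREAM_BUILDS" \
        '{ref: "master", inputs: {environment: $env, upstream_builds: $builds}}' |
      curl -s -X POST -d @- \
        -H "Authorization: token ${GITHUB_TOKEN}" \
        -H "Accept: application/vnd.github.v3+json" \
        "https://api.github.com/repos/${repo}/actions/workflows/${module}.yaml/dispatches"
    done
```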
While the above proposal covers both the build/tag/publish process and defines a way to manage inter-module builds centrally, across repositories and versions, there are a few limitations to the detailed approach and a few things that are explicitly left out:
- Automated mainnet releases. Mainnet releases and upgrades currently require manual coordination with both internal and external entities, and are still a topic of exploration for the team, so they are left for future work.
- Downstream builds are blocked from starting by upstream deploys. This could be a target of future work.
- Tracing of failures is not always straightforward: because the coordinating repository relies on dispatching builds on other repos and having them call back to the coordinating repository, tracing a failure may require looking at several repositories to see where a build originated. The `upstream_builds` argument should help with this, but errors can still happen in unexpected places and require tracing across repositories.
- The dependency management repository can trigger fanned-out builds. These builds will not track all `upstream_builds` entries, and could result in a partial downstream view of the overall build graph. Additionally, there is no way to specify a "fan-in" where a downstream build requires multiple upstream builds to complete.
- Contract artifacts will still be bundled with contract code dependencies. This means that a new deployment requires new artifacts, and therefore a new npm package, and that one deployment cannot be pointed at multiple environments.
- Publishing builds cannot be triggered manually without navigating the GitHub Actions UI.
  - Possible fix: Heimdall can be updated to support chat- and GitHub-comment-based invocations of builds.
- When running an Action module build from a `workflow_dispatch` event, looking up the prior published artifact and checking whether there have been changes between its commit id and the current commit could be used to skip the build altogether and go straight to triggering downstream dependencies.
- In general, breaking down the jobs in the module builds such that rebuilds can be partial would allow avoiding repetition of certain slower processes in cases where they need not be repeated for a rebuild. It’s possible that the module delimiting will be enough to handle this.