JupyterLab Development Cycle RFC #54

Draft
wants to merge 15 commits into
base: main
from
173 changes: 173 additions & 0 deletions rfc/development-cycle.md
@@ -0,0 +1,173 @@

# Jupyter Development Cycle

The goal of this document is to outline a development cycle for JupyterLab that makes releases very easy to do. As [documented in this issue](https://github.com/jupyterlab/jupyterlab/issues/8195), releasing is currently quite a laborious and specialized process. If we can make releases easier to manage, and shift as much of the hard work as possible onto the authors of PRs instead of the release managers, we can release more easily and free up core developer time for other matters.

This is meant to support our existing release cycles, not change them: it only adds some standardization and tooling to help us keep to them.

## Assumptions

We start with a list of assumptions/guarantees to constrain the possible space of solutions.

Consider these as additions to the [SemVer 2.0.0](https://semver.org/) spec; they deal more with how the code evolves across multiple versions. They assume we already follow SemVer.


**alpha, beta, rc**: Each release goes through an optional number of alpha, beta, and RC releases before the final. For some final release `{x}.{y}.{z}`, this can be modeled with the following state machine:

1. Start by releasing one of these:
   1. `{x}.{y}.{z}a0`
   2. `{x}.{y}.{z}b0`
   3. `{x}.{y}.{z}rc0`
   4. `{x}.{y}.{z}`
2. Then the next release can be either an increment of the last number, or the initial release (from the list above) of any stage that follows your current one. For example, after `{x}.{y}.{z}b10` you could release `{x}.{y}.{z}b11`, `{x}.{y}.{z}rc0`, or `{x}.{y}.{z}` (final).
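The state machine above can be checked mechanically. The following is a minimal sketch in Python, assuming PEP 440-style version strings exactly like the ones above (`1.2.0b10`, `1.2.0rc0`, `1.2.0`); the helper names `parse_release`, `is_valid_start`, and `is_valid_next` are hypothetical and not part of any existing JupyterLab tooling.

```python
import re
from typing import Optional, Tuple

# Stages in the order they may occur; "final" has no pre-release number.
STAGES = ("a", "b", "rc", "final")
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")


def parse_release(version: str) -> Tuple[Tuple[int, int, int], str, Optional[int]]:
    """Split "1.2.0b10" into ((1, 2, 0), "b", 10); a final gives ("final", None)."""
    m = _RELEASE.match(version)
    if m is None:
        raise ValueError(f"unsupported version: {version!r}")
    base = (int(m[1]), int(m[2]), int(m[3]))
    if m[4] is None:
        return base, "final", None
    return base, m[4], int(m[5])


def is_valid_start(version: str) -> bool:
    """The first release of {x}.{y}.{z} must be a0, b0, rc0, or the final."""
    _, stage, number = parse_release(version)
    return stage == "final" or number == 0


def is_valid_next(prev: str, nxt: str) -> bool:
    """Can `nxt` directly follow `prev` on the way to the same final release?"""
    prev_base, prev_stage, prev_num = parse_release(prev)
    next_base, next_stage, next_num = parse_release(nxt)
    if prev_base != next_base or prev_stage == "final":
        return False  # a different {x}.{y}.{z}, or this one is already final
    i, j = STAGES.index(prev_stage), STAGES.index(next_stage)
    if i == j:
        return next_num == prev_num + 1  # e.g. b10 -> b11
    # A later stage starts again at 0; going straight to the final is also allowed.
    return j > i and (next_stage == "final" or next_num == 0)
```

For example, `is_valid_next("1.2.0b10", "1.2.0rc0")` is `True`, while `is_valid_next("1.2.0b10", "1.2.0a0")` is `False` because alpha comes before beta.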

**Previous Ancestor**: The previous increment is a git ancestor of the next increment.

* Patch: `{x}.{y}.{z}` is an ancestor of `{x}.{y}.{z+1}`.
* Minor: `{x}.{y}.0` is an ancestor of `{x}.{y+1}.0`.
* Major: `{x}.0.0` is an ancestor of `{x+1}.0.0`.
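Because each increment must descend from the previous one, a release bot could verify this with `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second and 1 when it is not. This sketch assumes release tags are named `v{version}` (as in `v1.1.0`); `previous_increment` and `is_ancestor` are hypothetical helpers.

```python
import subprocess
from typing import Optional


def previous_increment(version: str) -> Optional[str]:
    """The final release that must be a git ancestor of the final `version`."""
    x, y, z = (int(part) for part in version.split("."))
    if z > 0:
        return f"{x}.{y}.{z - 1}"  # patch: {x}.{y}.{z-1} -> {x}.{y}.{z}
    if y > 0:
        return f"{x}.{y - 1}.0"  # minor: {x}.{y-1}.0 -> {x}.{y}.0
    if x > 0:
        return f"{x - 1}.0.0"  # major: {x-1}.0.0 -> {x}.0.0
    return None  # 0.0.0 has no predecessor


def is_ancestor(older_ref: str, newer_ref: str, repo: str = ".") -> bool:
    """Wrap `git merge-base --is-ancestor`: exit 0 = ancestor, 1 = not."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", older_ref, newer_ref], cwd=repo
    )
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git failed comparing {older_ref} and {newer_ref}")
    return result.returncode == 0
```

For example, `is_ancestor("v" + previous_increment("2.1.0"), "v2.1.0")` checks that `v2.0.0` is an ancestor of `v2.1.0`.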

**Single Active**: No commits will be queued up for a release until the previous increment has been fully released (this also follows from the assumption above).

* Patch: No commits towards `{x}.{y}.{z}` till `{x}.{y}.{z-1}` final is released.
* Minor: No commits towards `{x}.{y}.0` till `{x}.{y-1}.0` final is released.
* Major: No commits towards `{x}.0.0` till `{x-1}.0.0` final is released.

The last two assumptions mean you have to completely finish a release on one branch before you start on the next one. For example, it would be illegal to release in this order: `1.2.0a0`, `1.3.0a0`, `1.2.0`, then `1.3.0`. However, you could release `1.2.0a0`, `1.1.1a0`, `1.2.0`, then `1.1.1`, because these are on different branches, `1.x.0` and `1.1.x`.
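**Single Active** can be checked against a chronological list of releases. A minimal sketch, assuming releases are listed in the order they were published and that each increment (patch releases of `{x}.{y}`, minor releases of `{x}`, and major releases) runs on its own branch; `respects_single_active` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)((?:a|b|rc)\d+)?$")


def _increment_key(x: int, y: int, z: int) -> Tuple[int, ...]:
    """Releases that share a key are made one after another on the same branch."""
    if z > 0:
        return (x, y)  # patch releases of {x}.{y}
    if y > 0:
        return (x,)  # minor releases of {x}
    return ()  # major releases


def respects_single_active(releases: List[str]) -> bool:
    """True if no branch starts a new version before the previous final is out."""
    in_progress: Dict[Tuple[int, ...], str] = {}  # branch key -> version in progress
    for release in releases:
        m = _RELEASE.match(release)
        if m is None:
            raise ValueError(f"unsupported version: {release!r}")
        x, y, z = int(m[1]), int(m[2]), int(m[3])
        key, base = _increment_key(x, y, z), f"{x}.{y}.{z}"
        current = in_progress.get(key)
        if current is not None and current != base:
            return False  # `base` was started before `current` had its final
        if m[4] is None:
            in_progress.pop(key, None)  # a final release closes the cycle
        else:
            in_progress[key] = base
    return True
```

Both orderings from the paragraph above behave as described: the first returns `False`, the second `True`.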

**JavaScript Versions in Sync**: We are OK with always keeping the JS version bumps in sync, meaning that if we do a major release of the Python package, we also do a major release of all JS packages.
Member


My major concern with this point is that it will lead to a lot of dead/outdated extensions. I've made a script to profile the state of all extensions with the keyword "jupyterlab-extension" on npm, and got these results:

Processed 589 extensions:

- Up to date (193)
- Outdated (379):
  - Support ends at v2.x: 3
  - Support ends at v1.x: 173
  - Support ends at v0.x: 203
- Deprecated (4)
- Unclassified (13)

(Here "Support ends at v2.x" means the extension pins a dependency to e.g. "~2.0", so it doesn't support the latest release, but does support something in 2.x.)

I'll do a separate PR to share the script, which should ideally be run regularly so we can track the development over time, especially following new releases.

Contributor


Nice work @vidartf! Those kinds of metrics are important for administrators (and users) looking to decide when to upgrade.

Member

@vidartf vidartf May 13, 2020


@blink1073 Thanks! Any suggestions on where I should put the code?

Member Author


My major concern with this point is that it will lead to a lot of dead/outdated extensions.

I share this concern. Just want to be clear that this PR tries to turn our existing way of doing versioning into a process. It changes a few branch names, and would let us release more often, but it shouldn't substantially change the story for extension authors.

I chose to do this because I thought it would be easier to agree on than a larger discussion about how to change our release process in general, although I think that would also be useful!

Talking with @kgryte about this, we did bat around some of the ideas from the Node release processes.

Member Author


Also yeah, nice analysis! This is really helpful to have...

Member


It just feels like this would formalize something that, in my impression, is not something "we are OK" with.

Suggested change
**JavaScript versions In Sync** We are OK always keeping the JS version bumps in sync. Meaning that if we do a major release of the Python package we also do a major release of all JS versions.

Member Author


Yeah, I agree! I would love to not include this part as well. I remember arguing strongly against it in previous meetings, because it forces extension authors to upgrade.

However, it makes it untenable, theoretically, if we want to keep doing backport versioning. Or if we have a practical hack, like bumping new minor version +10 on the next major version, it makes things definitely more confusing.


Instead, we could actually version each package independently, which I do think would be ideal for downstream extension authors.

From the PR standpoint, we could conceptually have some way of doing this, with three labels per package, like `notebook-extension-minor` or something.

However, when I started thinking through how we would maintain backports for this kind of thing, it started to get a bit... wild. I haven't totally thought this through, but I kept arriving at having a branch for basically every combination of packages and their next releases. So maybe like:

notebook-extension-2.1.x-and-notebook-2.1.x-and...

IDK I haven't totally figured it out, as I said, but could explore it more to see if anything reasonable could come out of it, though I am doubtful.

Member Author


Couple of good notes from our meeting here as well:

We would hope to see JupyterLab become more stable over time, so locking down a process, like the one documented here, that forces extension authors to release a new version every time we have a major release would be pretty harmful.

So we should figure out a way to not do major version bumps when we don't have to. I don't know how this will work at the moment.

Member Author


One way to actually solve this would be to move out of a monorepo, so that each package could have its own repo and be versioned independently. However, this is a non-starter.


## Strategy

Based on these assumptions we can create a consistent branching strategy.

### Branches

Each branch should be aligned to different releases:

1. Patch: `{x}.{y}.x` branches.
2. Minor: `{x}.x.0` branches. After final release `{x}.{y}.0` create new patch `{x}.{y}.x` branch.
3. Major: `master` branch. After final release `{x}.0.0` create new minor `{x}.x.0` branch and new patch `{x}.0.x` branch.

Each increment has a corresponding "latest" branch for the greatest working version. For example with two branches `2.x.0`, `1.x.0` the "latest" minor branch is `2.x.0`. The latest major branch is always `master`.

*Note: We could instead have separate branches for each version, but this isn't necessary if we are going by the **Single Active** assumption above.*
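The mapping from a release to its branch, and the choice of the "latest" minor branch, could look like the sketch below. It assumes exactly the branch naming above; `branch_for` and `latest_minor_branch` are hypothetical helpers.

```python
import re
from typing import Iterable, Optional


def branch_for(version: str) -> str:
    """Name of the branch that releases of `version` (incl. pre-releases) come from."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unsupported version: {version!r}")
    x, y, z = (int(g) for g in m.groups())
    if z > 0:
        return f"{x}.{y}.x"  # patch releases
    if y > 0:
        return f"{x}.x.0"  # minor releases
    return "master"  # major releases


def latest_minor_branch(branches: Iterable[str]) -> Optional[str]:
    """The `{x}.x.0` branch with the greatest x, if there is one."""
    minors = [b for b in branches if re.fullmatch(r"\d+\.x\.0", b)]
    return max(minors, key=lambda b: int(b.split(".")[0]), default=None)
```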


### PR SemVer Tags

Each PR should be tagged with one of three labels, `semver:patch`, `semver:minor`, or `semver:major`, before it is merged. We could apply these manually, or possibly get a first guess [using type analysis](https://api-extractor.com/) or from commit message keywords. We should have a bot that blocks merging if none of these has been added.

If you have a change that is *only* a backport and should not appear in `master`, target the corresponding version branch as the base of the PR.

### PR Backports
Once a PR is merged into master, backport PRs should be opened against all other open branches of the corresponding increment.
This will likely result in some cases where backports are proposed that are not appropriate, in which case they can be closed.

1. Patch: Backport to all `{x}.{y}.x` branches and all `{x}.x.0` branches.
2. Minor: Backport to all `{x}.x.0` branches.
3. Major: No backports.
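The backport rules translate directly into a filter over the open branch names. A sketch, assuming the bot receives the merged PR's SemVer label and the list of open branches (`backport_targets` is a hypothetical name):

```python
import re
from typing import Iterable, List

PATCH_BRANCH = re.compile(r"^\d+\.\d+\.x$")  # {x}.{y}.x
MINOR_BRANCH = re.compile(r"^\d+\.x\.0$")  # {x}.x.0


def backport_targets(label: str, open_branches: Iterable[str]) -> List[str]:
    """Branches that should get a backport PR after a merge into master."""
    branches = list(open_branches)
    if label == "semver:patch":
        return [b for b in branches if PATCH_BRANCH.match(b) or MINOR_BRANCH.match(b)]
    if label == "semver:minor":
        return [b for b in branches if MINOR_BRANCH.match(b)]
    if label == "semver:major":
        return []  # major changes are never backported
    raise ValueError(f"unexpected SemVer label: {label!r}")
```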


### Milestones

Each branch therefore always has an "active" milestone associated with it, which corresponds to the next final release
on that branch. If we don't wish to make any more releases from a branch, we should delete it.

Each PR's milestone should reflect the active milestone of the branch it targets.
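The active milestone of a branch can be derived from the final releases already made. A minimal sketch, assuming milestones are named after the version they release and that pre-releases do not close a milestone; `active_milestone` is a hypothetical helper.

```python
import re
from typing import Iterable

_FINAL = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")  # finals only; pre-releases are ignored


def active_milestone(branch: str, released: Iterable[str]) -> str:
    """Next final release on `branch`, given all versions released so far."""
    finals = [tuple(int(g) for g in m.groups()) for m in map(_FINAL.match, released) if m]
    if branch == "master":  # next major
        x = max((f[0] for f in finals if f[1] == f[2] == 0), default=-1)
        return f"{x + 1}.0.0"
    parts = branch.split(".")
    if parts[1] == "x":  # minor branch {x}.x.0
        x = int(parts[0])
        y = max((f[1] for f in finals if f[0] == x and f[2] == 0), default=-1)
        return f"{x}.{y + 1}.0"
    x, y = int(parts[0]), int(parts[1])  # patch branch {x}.{y}.x
    z = max((f[2] for f in finals if f[:2] == (x, y)), default=-1)
    return f"{x}.{y}.{z + 1}"
```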

### Changelog

Each PR should add or edit a file in a directory full of files corresponding to unreleased changes, [like matplotlib does](https://matplotlib.org/devel/contributing.html#contributing-pull-requests). During a final release, these entries are removed from that directory and compiled into a changelog file for that release. All of these files are included in the master changelog.

If there are no new changelog entries, we should not be able to make a release.

The changelog entries should be kept in chronological order, instead of SemVer ordering.
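Compiling the unreleased entries into one release section could look like the sketch below. It rests on several assumptions: entries live in a directory such as `docs/source/changes/new`, filenames start with an ISO date so that sorting by name is chronological, an optional subdirectory names the sub-header, and only markdown (`.md`) entries are handled. `compile_changelog` is hypothetical; it refuses to run when there are no entries, matching the rule that we should not release without them.

```python
from pathlib import Path
from typing import Dict, List


def compile_changelog(new_dir: Path, version: str, delete: bool = True) -> str:
    """Merge the unreleased entry files under `new_dir` into one markdown section."""
    # Sorting by filename is chronological because names start with an ISO date.
    files = sorted((p for p in new_dir.rglob("*.md") if p.is_file()), key=lambda p: p.name)
    if not files:
        raise ValueError("no new changelog entries; refusing to release")

    # "" collects top-level entries; subdirectories become sub-headers.
    groups: Dict[str, List[str]] = {}
    for path in files:
        header = "" if path.parent == new_dir else path.parent.relative_to(new_dir).as_posix()
        groups.setdefault(header, []).append(path.read_text().strip())

    lines = [f"## {version}", ""]
    for header, texts in sorted(groups.items()):  # "" sorts first
        if header:
            lines += [f"### {header}", ""]
        lines += texts + [""]

    if delete:  # entries are removed once they are compiled into a release
        for path in files:
            path.unlink()
    return "\n".join(lines)
```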

## Example

I went back and looked at all our tags since `v1.1.0` to see what a branch diagram of them would look like (drawn with [`gitgraph.js`](https://github.com/nicoespeon/gitgraph.js/)) if we had been using the technique proposed above. I don't include any commits besides releases and merges. Also, I mark branches that are still open by giving each of them a final commit at the end. So this assumes that the only version branches open in the repo are `master`, `2.x.0`, `2.1.x`, and `1.2.x`. ([src](https://codepen.io/saulshanabrook/pen/xxwryBa?editors=1010)).

![](https://gist.githubusercontent.com/saulshanabrook/6d92df6e1872a5560674e097efd4abf3/raw/1e4f533196d24e9db9955dc8c0ba487e0633edc7/codepen----gitgraph-js-playground%2520(2).svg)

## Division of labor

There are many moving parts to the proposal, so I thought it would be helpful to lay out the responsibilities of the different parties, including the bots.

The "Bots" section is also a roadmap of what would need to be implemented to move forward on this.

### Pull request author

**SemVer Label**: Make sure pull request has appropriate SemVer label.
You can set it either by adding a commit whose message contains `semver:patch`, `semver:minor`, or `semver:major`, or by setting the label in GitHub (optionally [using a bot](https://github.com/jupyterlab/jupyterlab/blob/master/CONTRIBUTING.md#tag-issues-with-labels)).

**Changelog entry**: Add a new markdown or RST file in `docs/source/changes/new`. Use the current date and a rough title for the filename, so it is unique. If you want the entry to appear under a sub-header, put it in a subdirectory named after that header. Some common ones are `Bug fixes`, `For developers`, and `User-facing changes`. If your change is small, just make it a single bullet in a list.
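A helper that creates an entry with a unique, date-prefixed filename might look like the following. The slug rule and the `new_changelog_entry` name are assumptions for illustration, not existing tooling.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def new_changelog_entry(root: Path, title: str, text: str,
                        category: Optional[str] = None,
                        today: Optional[date] = None) -> Path:
    """Write `text` to a new, uniquely named entry file and return its path."""
    # Hypothetical slug rule: lowercase, runs of other characters become "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    directory = root / category if category else root  # optional sub-header directory
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{(today or date.today()).isoformat()}-{slug}.md"
    if path.exists():
        raise FileExistsError(path)  # keep filenames unique
    path.write_text(text.rstrip() + "\n")
    return path
```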


**Milestone**: Don't touch this, it will be set automatically.

**Branch base**: Normally, don't touch this. However, if this PR is not meant for the latest release, but only as a backport to an older release, set the base branch to the branch you want to merge it into. For example, if you have a fix only for the next 1.2.x release, but the latest SemVer release is 2.1.2, set the base branch to `1.2.x`.


### Pull request merger

**SemVer Label**: Verify the correct SemVer label is applied to the change.

**Changelog entry**: Verify that the change adds a new changelog entry or edits an existing one.

**Review backports**: After you merge it, review all the backport PRs that were created and merge them if appropriate. Close any that are not relevant. Also, for any backports the bot failed to make automatically, make them manually if they are necessary.

### Release Manager

Find the pull request that releases the version you want and merge it. This will trigger the release of the packages. There should be a separate pull request for each in-progress branch and each type of next release: `alpha`, `beta`, `rc`, and `final`.

Once the release is done, a list of open PRs for any additional merges that need to happen in other repos will be posted to the release PR. Also, if this was a final release and you don't want to make any more releases from this branch, delete the branch.


### Extension Authors

Be sure to allow multiple major versions of a package, like `^2 || ^3`, if they are both compatible. We should document this clearly, so extension authors can release extensions that are compatible with multiple major versions of JupyterLab.
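To make the semantics of `^2 || ^3` concrete: `^N` means `>=N.0.0 <(N+1).0.0`, so the union accepts any 2.x or 3.x release. The sketch below handles only unions of `^N` clauses and stable `X.Y.Z` versions; it is a deliberately simplified stand-in for a real SemVer range parser such as npm's `semver` package, and `satisfies_caret_union` is a hypothetical name.

```python
import re


def satisfies_caret_union(version: str, spec: str) -> bool:
    """Does a stable X.Y.Z `version` satisfy a range like "^2 || ^3"?"""
    match = re.fullmatch(r"(\d+)\.\d+\.\d+", version)
    if match is None:
        raise ValueError(f"only stable X.Y.Z versions are handled: {version!r}")
    allowed = set()
    for clause in spec.split("||"):
        clause = clause.strip()
        if not re.fullmatch(r"\^\d+", clause):
            raise ValueError(f"only '^N' clauses are handled: {clause!r}")
        allowed.add(int(clause[1:]))  # ^N allows every release with major version N
    return int(match.group(1)) in allowed
```

In an extension's `package.json`, this would correspond to a dependency such as `"@jupyterlab/application": "^2 || ^3"`.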


### Bots


**SemVer label**: Whenever a PR gets new commits, scan their messages for the SemVer label names; if any are found, merge them with the existing label, keeping the most breaking one. Also look at special labels like `feature:Bug` (patch) and `feature:Enhancement` (minor).


**SemVer check**: Fail (and don't allow merging on this failure) on any PR that doesn't have one, and only one, of the `semver:patch`, `semver:minor`, `semver:major` labels.
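The label-inference and label-check bots could share a small core like the sketch below. It assumes the bot already has the PR's current labels and commit messages (fetching them from the GitHub API is left out); `infer_semver_label` and `check_semver_labels` are hypothetical names.

```python
import re
from typing import Iterable, Optional

ORDER = ("semver:patch", "semver:minor", "semver:major")  # least to most breaking
ALIASES = {"feature:Bug": "semver:patch", "feature:Enhancement": "semver:minor"}


def infer_semver_label(labels: Iterable[str], commit_messages: Iterable[str]) -> Optional[str]:
    """Most breaking SemVer label implied by existing labels and commit messages."""
    found = {ALIASES.get(label, label) for label in labels} & set(ORDER)
    for message in commit_messages:
        found |= set(re.findall(r"semver:(?:patch|minor|major)", message))
    return max(found, key=ORDER.index) if found else None


def check_semver_labels(labels: Iterable[str]) -> None:
    """Fail unless exactly one of the three SemVer labels is present."""
    present = [label for label in labels if label in ORDER]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {ORDER}, found {present}")
```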

**Create backports**: After a PR is merged into the `master` branch, create backport PRs for it. They should target all other branches of the same increment, according to the "PR Backports" logic above.


**Base check**: If the base branch isn't `master`, verify that the SemVer label corresponds to the increment of the base branch.
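A sketch of the base check, assuming the branch naming from the "Branches" section and reading "corresponds" strictly, so a PR against a `{x}.{y}.x` branch must be `semver:patch` and one against a `{x}.x.0` branch must be `semver:minor`. Both helper names are hypothetical.

```python
import re
from typing import Optional


def increment_of_branch(branch: str) -> Optional[str]:
    """SemVer label a PR must carry to target `branch`; None means no constraint."""
    if re.fullmatch(r"\d+\.\d+\.x", branch):
        return "semver:patch"
    if re.fullmatch(r"\d+\.x\.0", branch):
        return "semver:minor"
    return None  # master (or anything else) is not checked


def base_check(label: str, base_branch: str) -> bool:
    expected = increment_of_branch(base_branch)
    return expected is None or label == expected
```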

**Update/create milestone**: Set the milestone to the next final release of the base branch. Update after every change of base branch and after every commit on the base branch. Create the milestone if it does not exist.

**Changelog Release Check**: On each release PR, fail if there are no new changelog entries in the branch.

**Create Release PR**: After a release succeeds, or when a new version branch is created, create a branch in a fork of this repo for each possible type of next release. On each, run the command that bumps all the JS versions for that release type, without actually publishing them, and also bump the Python version without releasing. Then open a PR to merge that branch into the original branch. We should also run the command that consolidates the changelog entries into a new release file.
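The set of release PRs to open follows from the alpha/beta/rc state machine in the "Assumptions" section. A sketch of computing the candidate version for each release type, assuming the bot knows the last release on the branch and the branch's increment (`"patch"`, `"minor"`, or `"major"`); `release_candidates` is a hypothetical name.

```python
import re
from typing import Dict

_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")
PRE_STAGES = (("alpha", "a"), ("beta", "b"), ("rc", "rc"))


def release_candidates(last: str, increment: str) -> Dict[str, str]:
    """Map each possible next release type to its version, given the last release."""
    m = _RELEASE.match(last)
    if m is None:
        raise ValueError(f"unsupported version: {last!r}")
    x, y, z = int(m[1]), int(m[2]), int(m[3])
    stage, number = m[4], m[5]
    if stage is None:
        # The last release was a final, so the next cycle starts on a new version.
        bumps = {"patch": (x, y, z + 1), "minor": (x, y + 1, 0), "major": (x + 1, 0, 0)}
        x, y, z = bumps[increment]
        first = 0
    else:
        # Mid-cycle: only the current stage and later stages are still possible.
        first = [suffix for _, suffix in PRE_STAGES].index(stage)
    base = f"{x}.{y}.{z}"
    candidates: Dict[str, str] = {}
    for name, suffix in PRE_STAGES[first:]:
        n = int(number) + 1 if suffix == stage else 0
        candidates[name] = f"{base}{suffix}{n}"
    candidates["final"] = base
    return candidates
```

For example, after `1.2.2b3` on the `1.2.x` branch this yields `1.2.2b4`, `1.2.2rc0`, and `1.2.2`; no alpha PR is opened, since alpha cannot follow beta.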

**Test Release PR**: On each release PR, run a special test to verify that the release works. This works by running a Verdaccio server, publishing the npm packages to that server, and then installing the Python package using that server as a proxy registry. Then a number of integration tests are run, like opening a notebook and running some cells. The built packages should be persisted as artifacts on GitHub Actions.

**Do release**: After a release PR is merged, pull in the artifacts from the release PR and publish them to npm and PyPI. Once that succeeds, push tags for that release to the merge commit on GitHub.


**After final release, create new branches**: Once new tags have been pushed for a final release, create new branches based on the "Branches" logic above.
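Following the "Branches" logic, the branches to create depend only on which kind of final release was made. A sketch (`branches_to_create` is a hypothetical name):

```python
from typing import List


def branches_to_create(final_version: str) -> List[str]:
    """New branches to create after the final release `final_version` (X.Y.Z)."""
    x, y, z = (int(part) for part in final_version.split("."))
    if y == 0 and z == 0:
        return [f"{x}.x.0", f"{x}.0.x"]  # major: new minor and patch branches
    if z == 0:
        return [f"{x}.{y}.x"]  # minor: new patch branch
    return []  # patch releases create no branches
```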

**After latest final release, update other repos**: After a final release that is the latest release by SemVer, open a PR against each cookiecutter repo, and other extension repos, to upgrade them. Post links to these on the release PR.

**After release, create conda packages**: After any release, create a PR against the conda forge repo to release that version. It will create it on the same branch as the branch we made the release on in JupyterLab, on some fork of that repo. Post a link to this on the release PR.


## Open questions

1. How do we deal with changelogs properly? Where do we deploy them from?
Contributor


Is there not prior art from matplotlib on this as well?

Member Author

@saulshanabrook saulshanabrook May 2, 2020


I should ask them... Thinking about how the changelogs work is one of the most confusing parts for me.

I'll spell out how I am thinking about it here, and hopefully I am missing something that will make it simpler.

Let's say I have two branches 1.2.x and 1.x.0. I merge a patch into master, and this is backported and merged into those two branches.

Now, if I do a new patch release on 1.2.x next, I assume that changelog entry should show up in that patch release, but not in the next minor release.

However, if I do the next minor release first, I should show the changelog entry for the patch PR in that release! And then if I do a patch release later, it should show up there as well, I believe?


Another changelog question: for each RC or alpha release, do we make a separate changelog entry? For the final release, do we show what has changed since the last pre-release, or since the last final release?


If we tied changelog entries to commit messages or PR descriptions instead of files, we would have more flexibility in building dynamic changelogs for different scenarios. At the most flexible end, I could even imagine an option where you select two releases and it shows you the changelog entries between them.

But this would negatively impact our ability to curate changelogs and merge together multiple PRs into one entry.

@jasongrout you do a lot of changelog work! Maybe you also have opinions here?

Member Author


We had a good conversation about this with @telamonian and @vidartf at the weekly meeting:

- When users go to read the docs and click on the changelog, they should see all changes, regardless of which branch they were made on.
- Doing a "forward port" PR of the changelog entries added in backport releases to the recent releases isn't a crazy way to do this.

I will work on updating this document with these principles in mind. Thank you all for the continued feedback.

Member Author

@saulshanabrook saulshanabrook Jun 24, 2020


Thinking about these changelog issues more, I reluctantly am coming to terms with the fact that we might benefit from a richer semantic model of changelog entries. Ignoring the "Which branch are these in??" question at first, I have started thinking about the changelog as a number of entries, each with:

- PR(s): At least one, but maybe multiple, PRs that this text summarizes
- Description: Either a single sentence or a richer multi-node description (images, paragraphs) of the feature
- Category (optional): A heading to put this under, like "Developer Changes", "User Changes", or "Backwards Incompatible extension changes"

I believe this would cover the breadth of our existing changelog, although I would love @jasongrout's input on this since he has been running point on the more in depth ones lately.

Pairing this information with knowledge of which release(s) a given PR first appeared in (possibly several, because of backports), we can create a diff of entries between some number of source releases and a target release. For example, we could ask "What are all the changes added to 3.0.0, starting from 2.0.0 and 2.2.2?" This would mostly be additions, but there could conceptually be some removals, say for entries that were only targeted at some 2.x release and were not included in the 3.0.0 release.

Given that, we could create some text description, in Markdown or RST. We could be specific about what the release we are comparing against for each release.

Now for our "Changelog" file in the docs, we would basically run that function a number of times and collect the various responses together.
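A minimal sketch of this entry model and the diff, assuming we already know the full set of PR numbers contained in each release (for example, derived from git history); `ChangelogEntry`, `diff_entries`, and `render` are hypothetical names. An entry counts as added when at least one of its PRs is in the target but none is in any source release, and as removed in the opposite case.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping, Optional, Set, Tuple


@dataclass(frozen=True)
class ChangelogEntry:
    prs: Tuple[int, ...]  # at least one PR summarized by this entry
    description: str  # a sentence, or richer markdown
    category: Optional[str] = None  # optional heading, e.g. "User Changes"


def diff_entries(entries: Iterable[ChangelogEntry],
                 contains: Mapping[str, Set[int]],
                 sources: Iterable[str],
                 target: str) -> Tuple[List[ChangelogEntry], List[ChangelogEntry]]:
    """Return (added, removed) entries when going from `sources` to `target`.

    `contains[release]` is the set of PR numbers included in that release.
    """
    before: Set[int] = set().union(*(contains[s] for s in sources))
    after = contains[target]
    added, removed = [], []
    for entry in entries:
        in_before = any(pr in before for pr in entry.prs)
        in_after = any(pr in after for pr in entry.prs)
        if in_after and not in_before:
            added.append(entry)
        elif in_before and not in_after:
            removed.append(entry)
    return added, removed


def render(title: str, added: List[ChangelogEntry], removed: List[ChangelogEntry]) -> str:
    """Format a diff as markdown, grouping added entries by category."""
    groups: Dict[str, List[ChangelogEntry]] = {}
    for entry in added:
        groups.setdefault(entry.category or "", []).append(entry)
    lines = [f"## {title}", ""]
    for category, items in sorted(groups.items()):  # uncategorized ("") first
        if category:
            lines += [f"### {category}", ""]
        for entry in items:
            refs = ", ".join(f"#{pr}" for pr in entry.prs)
            lines.append(f"- {entry.description} ({refs})")
        lines.append("")
    if removed:
        lines += ["### Not carried over from the compared releases", ""]
        lines += [f"- {entry.description}" for entry in removed]
    return "\n".join(lines)
```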

One open question I have is "What is the base release(s) for a given release's changelog?" For example, for the changelog for 3.0.0, we could diff from the last pre-release of 3.0.0. However, what about the first pre-release of 3.0.0? From a git perspective, the last release on that branch (the x.0.0/master branch) would have been 2.0.0. However, showing all changes since 2.0.0 is not what we currently do. What we currently do is probably show all changes since the last release of 2.x before the first pre-release?

So some possible version of rules here (these likely need some massaging):

1. Patch versions, `{x}.{y}.x` branches: For final and pre-release patch versions, base the changes off of the previous pre-release/patch. So for, say, 3.2.1, base it off of the previous release on the 3.2.x branch.
2. Minor versions, `{x}.x.0` branches: Base off of the last release on this branch. Also base off of the last patch release of the previous minor release if that is more recent than the latest release on this branch, i.e. a 3.2.0 release should include changes since 3.1.4, not since 3.1.0, if 3.1.4 was released before 3.2.0.
3. Major versions, `x.0.0` branch: Base off of the last release on this branch, plus the last minor release of the previous major release, plus the last patch release of the last minor release of the previous major release, if any of those were released after the last release on this branch.
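The three rules could be sketched as a function from a version and the chronological release history to its base releases. This follows the rules exactly as written above, which are themselves marked as needing massaging; `base_releases` and its helpers are hypothetical names, and the returned list is meant to serve as the source releases of a diff like the one described earlier.

```python
import re
from typing import Callable, List, NamedTuple, Optional


class Release(NamedTuple):
    x: int
    y: int
    z: int
    final: bool


_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)((?:a|b|rc)\d+)?$")


def parse(version: str) -> Release:
    m = _RELEASE.match(version)
    if m is None:
        raise ValueError(f"unsupported version: {version!r}")
    return Release(int(m[1]), int(m[2]), int(m[3]), m[4] is None)


def _last(earlier: List[str], pred: Callable[[Release], bool]) -> Optional[int]:
    """Index of the most recent release in `earlier` that matches `pred`."""
    for i in range(len(earlier) - 1, -1, -1):
        if pred(parse(earlier[i])):
            return i
    return None


def _if_newer(candidate: Optional[int], reference: Optional[int]) -> List[Optional[int]]:
    """Keep `candidate` only if it was released after `reference`."""
    if candidate is not None and (reference is None or candidate > reference):
        return [candidate]
    return []


def base_releases(version: str, history: List[str]) -> List[str]:
    """Base releases for `version`'s changelog; `history` is chronological."""
    earlier = history[: history.index(version)]
    x, y, z, _ = parse(version)
    if z > 0:
        # Rule 1: previous release on the {x}.{y}.x branch, which starts at {x}.{y}.0.
        picks = [_last(earlier, lambda r: (r.x, r.y) == (x, y) and (r.z > 0 or r.final))]
    elif y > 0:
        # Rule 2: last release on {x}.x.0 (which starts at {x}.0.0), plus the
        # last patch of the previous minor if that came out more recently.
        on_branch = _last(earlier, lambda r: r.x == x and r.z == 0 and (r.y > 0 or r.final))
        prev_patch = _last(earlier, lambda r: (r.x, r.y) == (x, y - 1) and r.z > 0)
        picks = [on_branch] + _if_newer(prev_patch, on_branch)
    else:
        # Rule 3: last release on master, plus the last minor release of the
        # previous major and the last patch of that minor, if either is newer.
        on_branch = _last(earlier, lambda r: r.y == 0 and r.z == 0)
        prev_minor = _last(earlier, lambda r: r.x == x - 1 and r.y > 0 and r.z == 0)
        picks = [on_branch] + _if_newer(prev_minor, on_branch)
        if prev_minor is not None:
            minor_y = parse(earlier[prev_minor]).y
            prev_patch = _last(earlier, lambda r: (r.x, r.y) == (x - 1, minor_y) and r.z > 0)
            picks += _if_newer(prev_patch, on_branch)
    return [earlier[i] for i in picks if i is not None]
```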


## Further work

**Changelog bot**: Bot that helps you auto generate changelog entries from git commits or PR descriptions. Comment `bot:generate-changelog` in the PR to create one from the GitHub issue contents. It will replace/create a changelog file using the title of the PR and the issue number.