Proposal for Pulsar Backend versioning system #64

Open
8 of 10 tasks
Digitalone1 opened this issue Jan 20, 2023 · 7 comments

@Digitalone1

Digitalone1 commented Jan 20, 2023

Since there have been many discussions lately about how to resolve our package version management issues, here is a proposal that I think is the best compromise between

  • being permissive with publishers
  • making version management considerably easier
  • allowing additional release channels to be implemented in the future.

A part of the following post is meant to be copied and pasted in a document as guidelines for publishers.

TLDR: Almost the same system as now plus the created timestamp and a few changes which require a refactoring on database queries.

Main guidelines

  • The Pulsar package system requires the versions of published packages to adhere to the MAJOR.MINOR.PATCH format of the semver specification (which from now on we call the "main tag").
  • The Pulsar backend checks that the MAJOR.MINOR.PATCH format is respected prior to publication; any version that does not follow the point above is considered bad and is rejected.
  • Any version can optionally have an extension label in the format MAJOR.MINOR.PATCH-extension to support alpha/beta/pre-release alternatives.
  • The Pulsar backend does not force publishers to respect the semver rules for extension labels, and no checks are performed on them, because it's too hard for the dev team to follow those rules. The publisher is responsible for releasing a version with the correct extension, if present.
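
A minimal sketch of the check described in these guidelines, written in Python purely for illustration. The pattern and the function name `parse_version` are assumptions, not the backend's actual code. Only the main tag is validated; whatever follows the first hyphen is accepted as-is, since extensions are deliberately left unchecked.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: three dot-separated non-negative integers, then an
# optional "-extension" whose content is intentionally not validated.
MAIN_TAG = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)(?:-(.+))?")


def parse_version(version: str) -> Optional[Tuple[int, int, int, Optional[str]]]:
    """Return (major, minor, patch, extension), or None if the version is rejected."""
    match = MAIN_TAG.fullmatch(version)
    if match is None:
        return None
    major, minor, patch, extension = match.groups()
    return int(major), int(minor), int(patch), extension
```

With this sketch, `parse_version("1.0.0-beta.1")` is accepted with the extension passed through untouched, while `parse_version("1.0")` and `parse_version("v1.0.0")` are rejected.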

Publishing guidelines

  • A publisher can use the existing API to publish versions in the default stable release channel.
  • In the future, additional endpoints will be deployed to also publish in optional supplementary channels (alpha/beta/pre-release). The dev team will explain the guidelines for this when everything is ready.
  • Across all the versions of a single package, tags MUST be UNIQUE. This means all versions of a package have different tags, and if a publisher deletes a version, they can't publish a new one with the same tag as the deleted one.
  • Within the same release channel, two or more versions can have the same main tag MAJOR.MINOR.PATCH as long as their extensions differ. E.g., publishing 1.0.0, 1.0.0-1 and 1.0.0-2 is possible.
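
The uniqueness rule, including the fact that deleted tags stay reserved, can be sketched as follows. This is an in-memory illustration in Python; the class `PackageVersions` and its methods are hypothetical names, and the real backend would enforce the rule through the database rather than in application memory.

```python
from typing import Dict


class PackageVersions:
    """In-memory stand-in for the version rows of a single package."""

    def __init__(self) -> None:
        # Maps each tag to its deleted flag. Deleted tags are kept in the
        # mapping so that they can never be published again.
        self._tags: Dict[str, bool] = {}

    def publish(self, tag: str) -> bool:
        """Record a new tag; return False if the tag was ever used before."""
        if tag in self._tags:
            return False
        self._tags[tag] = False
        return True

    def delete(self, tag: str) -> bool:
        """Mark an existing, non-deleted tag as deleted; the tag stays reserved."""
        if tag not in self._tags or self._tags[tag]:
            return False
        self._tags[tag] = True
        return True
```

Publishing 1.0.0, 1.0.0-1 and 1.0.0-2 succeeds because the full tags differ, while publishing 1.0.0 again fails even after it has been deleted.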

Determining the latest release

  • The Pulsar backend has its own way to determine the latest release of a specific channel, which we recommend publishers understand in order to avoid surprises.
  • Given two versions of the same package in the same channel, the higher is the one whose main tag comes first when MAJOR, MINOR and PATCH are compared numerically in descending order. E.g., 0.10.0 is higher than 0.9.0.
  • The dev team strongly discourages publishing two or more versions with the same main tag, but it is not forbidden. If you do so, read the following point.
  • Given two versions of the same package in the same channel with the same main tag, the higher is the one published more recently on the backend. E.g., if 0.1.2-1 is published today and 0.1.2 tomorrow, 0.1.2 will be considered higher than 0.1.2-1. Publishers need to be aware of this and, since publishing multiple versions with the same main tag is discouraged anyway, the dev team won't accept complaints about this behavior.
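
The ordering described above can be sketched as a sort key: the main tag compared numerically, with the publication time as tie-breaker. This is a Python illustration only; `Version`, `sort_key` and `latest` are hypothetical names, and it assumes every tag has already passed the MAJOR.MINOR.PATCH check.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Tuple


class Version(NamedTuple):
    tag: str           # full tag, e.g. "0.1.2-1"
    created: datetime  # publication time recorded by the backend


def sort_key(version: Version) -> Tuple[int, int, int, datetime]:
    # Drop the extension, then compare the main tag as numbers so that
    # 0.10.0 ranks above 0.9.0 (a plain string comparison would not).
    main_tag = version.tag.split("-", 1)[0]
    major, minor, patch = (int(part) for part in main_tag.split("."))
    return major, minor, patch, version.created


def latest(versions: List[Version]) -> Version:
    """Return the highest version of one package within one channel."""
    return max(versions, key=sort_key)


today = datetime(2023, 1, 20, tzinfo=timezone.utc)
tomorrow = datetime(2023, 1, 21, tzinfo=timezone.utc)
print(latest([Version("0.1.2-1", today), Version("0.1.2", tomorrow)]).tag)
```

Note that the extension never takes part in the comparison: between two versions with the same main tag, only the publication time decides.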

How to migrate to the new versioning system

versions table refactoring

  • The unique constraint in the versions table should be rolled back to (package,semver) rather than (package,semver_vx, ...). This is because we will support different extensions for the same main tag, so we need to ensure all tags are different for a single package.
  • Package handler functions that work on the semver array rather than the plain semver string should be rolled back too.
  • The generated semver_vx columns are kept as they are, since we need them to sort the main tags.
  • Implement created and updated columns for versions the same way they were added to the packages table.
  • When sorting versions in descending order, the created timestamp will be taken into account, so that two versions with the same main tag can be sorted and the most recently released one is the highest.

Drop versions.status column

  • Stop using the status column of the versions table. This column will be dropped along with its enum type. The reason is quite simple: we sort the versions in the database, so we don't need to flag the latest one. It would also be misleading once channels are deployed, because there will be multiple latest versions, one for each channel. The published and deleted statuses are redundant and will be replaced as described in the following point. Optionally, a new status column could be used in the future to signal the state of the version (no issues, bugged, insecure, etc.).
  • Implement a new bool column deleted in the versions table. Its meaning is self-explanatory: true if the version is deleted. Initialize it to true where status is deleted; false is its default.
  • Rewrite all queries that use the versions.status column so that it can be dropped for good. We know how to sort and get the latest version properly.
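
A rough sketch of this migration and of a status-free "latest version" query. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; the production database is a different engine, so the SQL would need adapting. The column names semver_v1, semver_v2 and semver_v3 are assumed spellings of the semver_vx generated columns, and `DEMO_SCHEMA` is a simplified, hypothetical table definition.

```python
import sqlite3
from typing import Optional

# Simplified demo table (assumption): plain integer columns stand in for
# the generated semver_vx columns, and booleans are stored as 0/1.
DEMO_SCHEMA = """
CREATE TABLE versions (
    package   TEXT NOT NULL,
    semver    TEXT NOT NULL,
    semver_v1 INTEGER NOT NULL,
    semver_v2 INTEGER NOT NULL,
    semver_v3 INTEGER NOT NULL,
    status    TEXT NOT NULL,
    created   TEXT NOT NULL,
    deleted   INTEGER NOT NULL DEFAULT 0,
    UNIQUE (package, semver)
);
"""


def migrate_status_to_deleted(conn: sqlite3.Connection) -> None:
    """Initialize the new deleted flag from the old status column."""
    conn.execute("UPDATE versions SET deleted = (status = 'deleted')")


def latest_version(conn: sqlite3.Connection, package: str) -> Optional[str]:
    """Pick the latest non-deleted version by sorting, without reading status."""
    row = conn.execute(
        """
        SELECT semver FROM versions
        WHERE package = ? AND deleted = 0
        ORDER BY semver_v1 DESC, semver_v2 DESC, semver_v3 DESC, created DESC
        LIMIT 1
        """,
        (package,),
    ).fetchone()
    return row[0] if row else None
```

Once every query follows the shape of `latest_version`, nothing reads status any more and the column can be dropped.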

Drop redundant package metadata

  • Drop the JSONB data column in the packages table. At first sight this might look like a disruptive change, but it's not. packages.data is redundant because it's just the package data of the latest version, which we already have in the versions table. Besides, with the introduction of channels, having the package of the latest version in the packages table is misleading (which channel does it belong to?). If we want it, we can join the versions table, and we can do that for any channel.

Manage the duplicated versions for already published packages

  • Although we have resolved the issue of duplicated versions by looking at the created timestamp, there's still a problem for already published versions. When we deploy the created column in the versions table, it is initialized with the time the column was created, so it won't contain the actual publication time of each version. We can work around this by adding an interval to the created timestamp of the versions marked as latest before dropping the status column, so those versions still come out as the latest when other versions share the same main tag.
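
The workaround can be illustrated like this, in Python and on plain dictionaries rather than on the real table. The function name `boost_latest`, the row layout and the 1 ms default interval are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def boost_latest(
    rows: List[Dict], interval: timedelta = timedelta(milliseconds=1)
) -> List[Dict]:
    """Return copies of the rows, shifting 'created' forward for 'latest' rows."""
    boosted = []
    for row in rows:
        copy = dict(row)  # leave the caller's rows untouched
        if copy["status"] == "latest":
            copy["created"] = copy["created"] + interval
        boosted.append(copy)
    return boosted


# Both rows share the timestamp assigned when the column was added.
column_added = datetime(2023, 1, 21, tzinfo=timezone.utc)
rows = [
    {"semver": "0.1.2-1", "status": "published", "created": column_added},
    {"semver": "0.1.2", "status": "latest", "created": column_added},
]
newest = max(boost_latest(rows), key=lambda row: row["created"])
print(newest["semver"])
```

After the shift, sorting by created alone is enough to keep the previously flagged version on top of its duplicates, so the status column is no longer needed for that purpose.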

If you have some doubts or thoughts, feel free to point them out.

@confused-Techie
Member

@Digitalone1 This is beautiful!

I sincerely appreciate the work and thought that went into this. And for this solution I am 100% on board.

This looks fantastic, and while keeping things sane, allows us the most flexibility now and in the future.

Everything here looks awesome, and would be happy to help out wherever possible to get this working properly.


As for your last point about already published versions: once we add the tag to the DB, I'd be more than happy to manually investigate the published versions to find their actual publication date based on the GitHub tag creation date. I believe it was about 60 items that need this care, so ideally it wouldn't take too much time.

@Digitalone1
Author

Digitalone1 commented Jan 21, 2023

List of changes to make on the database. @confused-Techie, mark these steps when you port the new db state to production.

  • Add new created and updated column to versions
ALTER TABLE versions ADD COLUMN created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP;
ALTER TABLE versions ADD COLUMN updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP;
  • Add update timestamp trigger for versions
CREATE TRIGGER trigger_now_on_updated_versions
    BEFORE UPDATE ON versions
    FOR EACH ROW
EXECUTE PROCEDURE now_on_updated_package();
  • Add deleted column to versions
ALTER TABLE versions ADD COLUMN deleted BOOLEAN NOT NULL DEFAULT FALSE;

@Digitalone1
Author

As for your last point about already published versions: once we add the tag to the DB, I'd be more than happy to manually investigate the published versions to find their actual publication date based on the GitHub tag creation date. I believe it was about 60 items that need this care, so ideally it wouldn't take too much time.

It would be awesome to have the real publication date of every version, including those imported from Atom, but I don't think this is possible.

Anyway, I'd like to add an index on the semver_vx generated columns and the created timestamp. This is not mandatory, but it can speed up sorting. At the moment this won't work, though, because rows with equal semver_vx triples will have the same created timestamp (the default at column creation).

If we want this, you will have to change the timestamps manually (even adding just 1 ms is enough). This would also help to retrieve the correct latest version for those that have the same main tag.

If there are too many, you can delete the duplicated non-latest versions and sort only the latest ones.

@confused-Techie
Member

I can certainly try to update the duplicated non-latest versions.

If it does become too much, I'll look at either scripting it or (as long as you don't delete the branch) reusing your previous script to do so.

But first I'll add the DB changes tomorrow so we can begin that step.

And ideally we could script a one-time task that looks through each package, contacts the GitHub API and inserts proper creation dates into everything. But that can always come later.

@Digitalone1
Author

#65 and #66 covered all steps except the last two, which I'd like to handle once the changes applied to the database are stable. So for now let's see how these changes behave on the database (I hope well; at least from the tests I don't see anything wrong).

@Digitalone1
Author

Digitalone1 commented Feb 2, 2023

So, after the merge of #62, #65 and #66, an issue came up.

The issue

  • The response package sent out by the backend does not respect the format of the package short/full.

The problem

  • It's not strictly related to how the merged changes were designed, but to the fact that when we imported the Atom database, version.metadata contained only the packageJSON, while packages.data also contains the repository object and the readme.
  • The latest changes store the repository object and readme in versions.meta, but for already stored records those data are missing, so they have to be imported from packages.data into versions.meta in order to deprecate packages.data.

New versions table design

After a talk on Discord, we agreed to add new columns to make it easy to construct the response objects sent out by the backend:

  • a new readme TEXT column containing the readme of the related version.
  • a new repo_type column of type provider ENUM, which at the moment will contain only git, then we will add others when supported.
  • a new repo_url TEXT column containing the url of the repository.

Besides, the meta JSONB column will contain only the packageJSON, i.e. the object retrieved by vcs provider.packageJSON with tarball info appended to it.
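
To make the split concrete, here is a small Python sketch of how one stored package object could be divided across the new columns. The layout of the input (top-level `readme` and `repository` keys, with `type` and `url` inside the repository object) is an assumption, as is the function name `split_package_data`; the fallback values simply mirror empty-string and git defaults.

```python
from typing import Dict


def split_package_data(data: Dict) -> Dict:
    """Split a packages.data-style object into the proposed versions columns."""
    # Work on a shallow copy so the stored object is not modified.
    meta = dict(data)
    readme = meta.pop("readme", "")
    # Assumed shape: {"type": "git", "url": "..."}; missing values fall back
    # to 'git' and an empty string.
    repository = meta.pop("repository", None) or {}
    return {
        "readme": readme,
        "repo_type": repository.get("type", "git"),
        "repo_url": repository.get("url", ""),
        "meta": meta,  # what remains is treated as the packageJSON
    }
```

Building the response object then becomes a matter of reading four columns instead of digging through a single JSONB blob.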

@Digitalone1
Author

Digitalone1 commented Feb 2, 2023

Apply new columns to versions table

This has to be done after #74 goes into #50, and #50 goes into the main branch.

  • add readme
ALTER TABLE versions ADD COLUMN readme TEXT NOT NULL DEFAULT '';
  • add repo_url
ALTER TABLE versions ADD COLUMN repo_url TEXT NOT NULL DEFAULT '';
  • add repository enum
CREATE TYPE repository AS ENUM('git');
  • add repo_type
ALTER TABLE versions ADD COLUMN repo_type repository NOT NULL DEFAULT 'git';

Tested at https://dbfiddle.uk/CVZk9pwv and https://dbfiddle.uk/QvQNEXEn

2 participants