Build version (build tag?) in RPM #2031

bookwar · 2022-04-21T14:11:11Z

bookwar
Apr 21, 2022

Hi,

I'd like to start a conversation about build-time versioning in RPM

Problem

There are several use cases (mass rebuilds, soname bumps, Packit, Fedora ELN, CentOS Remixes, RHEL rebuilds, ...) where we need to rebuild RPMs without changing their content, neither sources nor spec files. The only change in such cases is in the buildroot environment, which changes the outcome of the build process.

Currently to address that we use some workarounds:

in case of mass-rebuilds or soname bumps in Fedora we create an "empty" commit to the dist-git repository which artificially bumps the release number in the spec file (smth-1.2.3-5.fc36 -> smth-1.2.3-6.fc36),
in case of Fedora ELN we bump the disttag macro for the entire buildroot (smth-1.2.3-5.eln101 -> smth-1.2.3-5.eln102), which allows us exactly one rebuild of any package,
in case of Packit the release field is filled using the current date.

The issues with these workarounds are:

Noise in the changelog. We create empty events in git while nothing changes.
Unnecessary forks. Even when we want to reuse the upstream sources as is, for example to build a Remix, we have to maintain a diverging git tree to implement the rebuild process.
Permissions: rebuild of a package for the soname bump of a dependency requires full edit access to its sources.
Mix of autogenerated and manually written data in the same field (Release)
No support in libraries and high-level tooling.

Strictly speaking the problem comes not from the RPM format itself, but from a build system which enforces the policy that every build must have a unique identifier, and therefore does not allow two builds with the same filename (Name-Version-Release combination).

But since this is a reasonable policy for any build system, we get to the question: can we give an rpm package a unique identifier, which would not only depend on the Version of the sources and Release field of the spec file, but which would also uniquely identify the build environment used to build it?

And while we could resolve it on the level of each individual build infrastructure, it would be much better to have a standard interface shared between different build infra implementations and supported by the format specification.

Possibilities

While we were brainstorming the options, the most obvious solution we came up with was to introduce a numerical build id - a monotonically increasing build number, and then use a macro to inject it into the Release field at build time (similar to the Packit approach above).

Basically we set Release: 5%{disttag}.%{build-id} in a spec file, and on rebuild we get different build-ids in the filename:
smth-1.2.3-5.el9.15172316 or smth-1.2.3-5.el9.15172321 and so on.

In case of Koji that build id would come from the Koji build id. But other build systems could define it differently.

But this kind of solution feels incomplete without proper metadata support in the RPM.

Without a clear specification for comparing RPMs, people would have to parse the Release field of an rpm with fragile regexps and custom scripts on the client side.
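
To make the fragility concrete, here is a minimal Python sketch of the kind of client-side parsing this would force. The regex and the `split_release` helper are hypothetical, and the `<release>.<disttag>.<build id>` layout is only the convention from the example above, not anything RPM enforces.

```python
import re

# Assumed layout: "<release>.<disttag>.<build id>", e.g. "5.el9.15172316".
# This is a guess a client would have to hard-code; RPM does not define it.
RELEASE_RE = re.compile(r"^(?P<release>\d+)\.(?P<dist>[a-z]+\d+)\.(?P<build>\d+)$")


def split_release(release):
    """Return (release, dist, build) or None if the string does not match."""
    match = RELEASE_RE.match(release)
    if match is None:
        return None
    return match.group("release"), match.group("dist"), int(match.group("build"))


print(split_release("5.el9.15172316"))    # matches the assumed layout
print(split_release("5.el9"))             # no build id: the guess fails
print(split_release("5.el9_1.15172316"))  # different dist tag shape: the guess fails
```

Every deviation from the assumed layout needs another special case in the pattern, which is the argument for a dedicated metadata field.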

Questions

Are there any plans to introduce build environment versioning in the RPM?
Maybe there are already metadata fields which can be used for this purpose and we can expose them in the filename?
Is it generally doable and what do you think about the whole idea?

CC @sgallagher

pmatilai · 2022-04-22T10:01:55Z

pmatilai
Apr 22, 2022
Maintainer

A bunch of random initial thoughts on the subject:

BuildID is already overloaded in rpm, any such thing would need a different name. Appending more and more stuff into Release is ugly and problematic for other reasons, but of course adding any new fields to rpm version comparison is a long, painful road.

Just for the purposes of giving build artifacts different filenames you don't need any code changes, just define the %_build_name_fmt macro as something like %%{ARCH}/%%{NAME}-%%{VERSION}-%%{RELEASE}.%%{BUILDTIME}.%%{ARCH}.rpm. This won't allow upgrading between such packages, but that is a whole other can of worms, and buildtime is hardly a solution for that either.
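
A small Python sketch of what such a filename template achieves, and what it does not. The `expand` helper is a toy stand-in for rpm's own format expansion and the tag values are made up; it only illustrates that the filenames differ while the version and release that rpm compares stay identical.

```python
import re


def expand(fmt, tags):
    """Toy stand-in for rpm's header format expansion: replace %{TAG} with tags[TAG]."""
    return re.sub(r"%\{(\w+)\}", lambda m: str(tags[m.group(1)]), fmt)


# In a macro definition the percent signs are doubled (%%{NAME}); this is the
# format string as it looks after that one level of escaping is removed.
FMT = "%{ARCH}/%{NAME}-%{VERSION}-%{RELEASE}.%{BUILDTIME}.%{ARCH}.rpm"

# Hypothetical tag values for two builds of the same sources.
first = {"NAME": "smth", "VERSION": "1.2.3", "RELEASE": "5.fc36",
         "ARCH": "x86_64", "BUILDTIME": 1650535871}
second = dict(first, BUILDTIME=1650622271)  # rebuilt one day later


def evr(tags):
    """The part that takes part in version comparison (Epoch omitted here)."""
    return (tags["VERSION"], tags["RELEASE"])


print(expand(FMT, first))
print(expand(FMT, second))
print(evr(first) == evr(second))  # True: nothing to order the two builds by
```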

The only monotonically increasing value available to rpm is the clock though (which is not as monotonic as time itself, unfortunately), anything else would somehow require storing a previous value somewhere "central" and rpm has no such place available. So it goes back to some outside thing setting a macro that rpm slaps into a tag and forgets. Adding tags is cheap, whether it actually solves an issue is a different question.

And then proper uniqueness for eg build environment requires hashes (eg hash of all installed packages, macros and env variables during build), which in turn are not version comparable. So there seem to be multiple, conflicting goals here, and reproducible builds are another, pre-existing and conflicting goal.
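
The conflict between uniqueness and comparability can be sketched in a few lines of Python. The `environment_digest` function and the package lists are hypothetical; the point is only that a digest of the build environment distinguishes two environments but carries no ordering.

```python
import hashlib
import json


def environment_digest(packages, macros, env):
    """Hash a canonical JSON dump of the inputs that describe a build environment."""
    payload = json.dumps(
        {"packages": sorted(packages), "macros": macros, "env": env},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up buildroot contents; only one dependency differs between the two.
base = environment_digest(
    packages=["gcc-12.0.1-0.16.fc36", "glibc-2.35-4.fc36"],
    macros={"dist": ".fc36"},
    env={"LANG": "C.UTF-8"},
)
bumped = environment_digest(
    packages=["gcc-12.0.1-0.16.fc36", "glibc-2.35-5.fc36"],
    macros={"dist": ".fc36"},
    env={"LANG": "C.UTF-8"},
)

print(base != bumped)  # True: the digest identifies the environment
print(base < bumped)   # plain string order, unrelated to which build is newer
```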

1 reply

DemiMarie Apr 22, 2022

Is something like Nix the answer?

cgwalters · 2022-04-23T12:14:22Z

cgwalters
Apr 23, 2022

numerical build id - a monotonically increasing build number,

That already exists - it's the build time!

A mode where RPM completely ignores the EVR and simply compares build times would help unlock a whole bunch of things, including making it easier to revert to a previous version.

and buildtime is hardly a solution for that either.

Why do you say that?

That's how ostree works for example today - the version number is just for humans, and the code instead just cares about (by default) a monotonically increasing timestamp: https://github.com/ostreedev/ostree/blob/98587a72db9b52eee63b4bfa9c47a77d2e327501/src/libostree/ostree-repo-pull.c#L1668

simo5 · 2022-04-25T12:18:09Z

simo5
Apr 25, 2022

On Sat, 2022-04-23 at 05:14 -0700, Colin Walters wrote:

> > numerical build id - a monotonically increasing build number,

> That already exists - it's the build time! A mode where RPM completely ignores the EVR and simply compares build times would help unlock a whole bunch of things, including making it easier to [revert to a previous version](https://lwn.net/Articles/513346/).

Time could be used, but we need the same build ID for all the RPMs generated by the build for all arches. So we can't just use the build time of the individual RPM.

> > and buildtime is hardly a solution for that either.

> Why do you say that? That's how ostree works for example today - the version number is just for humans, and the code instead just cares about (by default) a monotonically increasing timestamp: https://github.com/ostreedev/ostree/blob/98587a72db9b52eee63b4bfa9c47a77d2e327501/src/libostree/ostree-repo-pull.c#L1668

This could cause issues if two builders have a time offset or package building races and any number of other things. Time *can* be used, when set in stone at SRPM creation time perhaps, but it is not the only or best option, especially for build systems that do not forcibly re-create the SRPM.

Simo.

-- Simo Sorce RHEL Crypto Team Red Hat, Inc

2 replies

cgwalters Apr 25, 2022

Time could be used, but we need the same build ID for all the RPMs generated by the build for all arches.

Sure, but that's basically having koji capture "time of build creation" (which already exists actually of course - the toplevel koji task has a "Start time") and propagate that via the equivalent of SOURCE_DATE_EPOCH to the individual builds.
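
A rough Python sketch of that idea, under the assumption that the build system hands the same value to every per-arch builder through its environment. The `start_build` function is hypothetical and is not Koji's API; it only shows the timestamp being taken once, centrally.

```python
import time


def start_build(arches, clock=time.time):
    """Capture one timestamp when the top-level task starts and hand it to every arch."""
    build_epoch = int(clock())  # taken once by the build system, not by each builder
    return {arch: {"SOURCE_DATE_EPOCH": str(build_epoch)} for arch in arches}


per_arch_env = start_build(["x86_64", "aarch64", "s390x"])
for arch, env in per_arch_env.items():
    print(arch, env["SOURCE_DATE_EPOCH"])

# All per-arch builds see the same value, whatever their local clocks say.
print(len({env["SOURCE_DATE_EPOCH"] for env in per_arch_env.values()}) == 1)
```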

bookwar Apr 26, 2022
Author

I'd say when you apply the logic like this it stops being the Build Time and it becomes "some counter which is provided by the buildsystem". Timestamp can be used as input to set that counter, but how exactly it happens is an implementation detail of that specific buildsystem. And real build time is a thing with strict definition, which build system must not override.

bookwar · 2022-04-25T12:33:21Z

bookwar
Apr 25, 2022
Author

Generally speaking, what kinds of entities do we have in play?

build itself as an outcome of a build action in the build system. It should be uniquely identified within the scope of that build system no matter the content, version, reproducibility whatsoever.
build environment - the set of configuration options, server settings, dependencies, buildroot repos, mock config,.. everything outside the dist-git sources which has impact on the outcome.
build version - some field exposed in a way that it is involved in the calculation of the upgrade path between two rpms.

From the point of a build system each build needs to have a unique key. The meaning of that key is not important, it is like a primary key field in the database, which simply needs to be unique and allows to store the rest of the data.

For example in Jenkins the unique key for a build is a BUILD_TAG = string of jenkins-${JOB_NAME}-${BUILD_NUMBER}.
In case of RHEL the analog would be something like brew-${brew-buildtarget}-${brew-build id}

And I agree with Simo that while build time can be used as an implementation of some part of a build tag, it is not necessarily the best one, and it very much depends on the way how the build system operates.

From the point of reproducible builds it is more interesting to track meaningful build information, thus one needs to encode enough data for the comparison of two builds taken in isolation from their original build systems.

While reproducible build is an interesting topic on its own I think it is not exactly related to the set of problems I listed in the top post.

Now I would maybe reformulate my original question this way: can we provide a generic way for a build system to set the build version at build time, so that it is used in the calculation of the upgrade path? And what form should that build version have?

And I would leave it to the build system to decide what this version should be within the restrictions of its form, same way how we leave it to maintainer to decide what Release version of a package should be. RPM format itself provides a place for this version, and some constraints, but it doesn't set it.

For example if we agree on the form as number, then some build systems may use build time, some - build id and some will set a custom number which user enters via GUI. It should all be allowed as long as the format and upgrade path rules are set on the rpm side.

xsuchy · 2022-04-26T10:56:16Z

xsuchy
Apr 26, 2022

Random thoughts from me:

+1 to the idea in general.
Forget about Koji, Copr. Or OBS. Solve the task on a local level first. I.e. in plain Mock on the command line.
The filename is already long. And 3rd party users (look into Stackoverflow questions) already use foo.rpm removing the VRA from the filename. Not even mention the dist tag.
On the other hand, make sure that the outcome does not go against idea of reproducible builds.
The number should be self-explanatory. I.e. when we use disttag fc36 people know what it is. Compared to ID number from the row from PDC.
Even for automatic rebuilds you may want to know the reason: Rebuild because of new Python. Mass rebuild of all packages (just because). Rebuild because of a new definition of macro X...

3 replies

bookwar Apr 26, 2022
Author

Forget about Koji, Copr. Or OBS. Solve the task on a local level first. I.e. in plain Mock on the command line.

I agree that Mock is a reference implementation for this. But I'd say we need to have build system in mind here. We can not forget about build system, we just need to be build-system-agnostic.

The filename is already long. And 3rd party users (look into Stackoverflow questions) already use foo.rpm removing the VRA from the filename. Not even mention the dist tag.

Do I understand correctly that rpm, dnf, mock and other tooling operate on rpm metadata and don't use filename for anything? So filename only matters for the external user, like a sysadmin who tries to do something with the list of filenames without proper metadata access?

On the other hand, make sure that the outcome does not go against idea of reproducible builds.

I think reproducible builds is all about comparison rules, but to start comparing things you need to build and store them under unique names anyway.

The number should be self-explanatory. I.e. when we use disttag fc36 people know what it is. Compared to ID number from the row from PDC.

I think this topic is in scope of a build system, not RPM metadata. Things like Koji Build Id or "timestamp of the src.rpm build" are self explanatory for that system, but RPM knows nothing about them.

Even for automatic rebuilds you may want to know the reason: Rebuild because of new Python. Mass rebuild of all packages (just because). Rebuild because of a new definition of macro X...

This is interesting topic indeed. We want to build things without changing the dist-git sources, but it doesn't mean we shouldn't track the reasons for the rebuild. Should it be tracked on a build system side or we need a "binary changelog" next to the "spec changelog" in the rpm itself?

pmatilai Apr 27, 2022
Maintainer

Even for automatic rebuilds you may want to know the reason: Rebuild because of new Python. Mass rebuild of all packages (just because). Rebuild because of a new definition of macro X...

Indeed. Such a message is essentially property of that build, so we could add cli switches to communicate the "build message", which just gets tacked on top of the binary changelog. Such messages would of course disappear in future rebuilds, but from the package's POV they are quite irrelevant anyhow.

bookwar Apr 28, 2022
Author

I like the idea. I think that even without a BUILD tag, one can add a default changelog message like "Built on %{BUILDHOST} at %{BUILDTIME}" and then override it with an additional comment, if one is passed from the cli.

We would need to investigate though if we have some comparison tools which rely on changelog when calculating the upgrade info. Things like Satellite, LEAPP, Insights.. if they compare changelogs of two builds and expect older changelog to be included in the new one, then disappearing line of a build changelog could become an issue.

xsuchy · 2022-04-26T12:43:43Z

xsuchy
Apr 26, 2022

Do I understand correctly that rpm, dnf, mock and other tooling operate on rpm metadata and don't use filename for anything?

Right

So filename only matters for the external user, like a sysadmin who tries to do something with the list of filenames without proper metadata access?

Right. And if you overload them, there is a risk that they will ignore the whole thing.

Should it be tracked on a build system side or we need a "binary changelog" next to the "spec changelog" in the rpm itself?

Personally, I would prefer dual changelog. But I have no idea how to implement it :)

bookwar · 2022-04-26T14:36:04Z

bookwar
Apr 26, 2022
Author

So, thinking more about it and adding a compatibility consideration (we want build-versioned rpms to work alongside non-versioned builds, and we want to allow iterative development for both rpm and build system developers), I would come up with the following proposal:

1. Introduce build version (BUILDVERSION, BUILDTAG ?) as an informative-only tag (https://rpm-software-management.github.io/rpm/manual/tags.html)
2. Inject build version into Release tag (on the build system side), thus it will be used in upgrade path calculations even without support on the RPM side
3. Implement support for the upgrade path on the rpm side in compatible way so that NV(R=R+B) and NVRB are ordered the same way
4. Stop injecting build version in the Release tag.

| | Before | Transition | After |
|---|---|---|---|
| Name | smth | smth | smth |
| Version | 1.2.5 | 1.2.5 | 1.2.5 |
| Release | 5.el9 | 5.el9.1517234 | 5.el9 |
| Buildtag | | 1517234 | 1517234 |

This way the build system and rpm can be developed and updated without blocking each other: step 3 can take as much time as it needs, while the build system already implements the BUILDTAG and can be switched to the new functionality when it is ready.
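
The compatibility requirement in step 3 can be checked on paper with a short Python sketch. The `vercmp` function below is a simplified take on rpm's segment-wise comparison (it ignores `~` and `^`), and `cmp_after` is a guess at what a separate build-tag comparison could look like, not an existing rpm feature.

```python
import re


def vercmp(a, b):
    """Simplified rpm-style comparison: split into digit/letter runs, compare pairwise."""
    seg_a = re.findall(r"\d+|[a-zA-Z]+", a)
    seg_b = re.findall(r"\d+|[a-zA-Z]+", b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric segments sort above alphabetic ones
        if x.isdigit():
            x, y = int(x), int(y)
        if x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with more segments wins.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def cmp_transition(p, q):
    """Transition stage: build number appended to Release, compared as one string."""
    return vercmp(f"{p[0]}.{p[1]}", f"{q[0]}.{q[1]}")


def cmp_after(p, q):
    """'After' stage (hypothetical): compare Release first, then the separate build tag."""
    return vercmp(p[0], q[0]) or vercmp(str(p[1]), str(q[1]))


# (Release, Buildtag) pairs; the numbers are made up.
builds = [("5.el9", 1517234), ("5.el9", 1517301), ("6.el9", 1517250)]
for p in builds:
    for q in builds:
        assert cmp_transition(p, q) == cmp_after(p, q)
print("same ordering in both stages for these builds")
```

The agreement is not automatic. Under this simplified comparison, releases of different shape such as ("5.el9", 200) and ("5.el9.1", 100) order one way when the build number is appended and the other way when it is compared separately, so the exact rules would need care.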

Basically it encodes the workarounds we already do in Packit or in Fedora CI for scratch builds, but it adds the path how to get to the point where it stops being a workaround and becomes a proper setup.

9 replies

jwboyer May 17, 2022

It seems like we're heading towards a solution that makes it actively harder for users to understand what is going on by hiding the Build tag but using it during transactions. If you're going to include this in NVR comparisons but hide it from display, you might as well overload Epoch for this purpose. You don't have to add anything to RPM to do that.

simo5 May 17, 2022

I am opposed to invisible build id and epoch is not replacement at all.

jwboyer May 17, 2022

Oh good, Epoch still evokes strong feelings ;)

My suggestion on Epoch was simply to elicit a response but also to highlight that hiding things seems like an anti-pattern. I don't really want to see Epoch overloaded, but I also don't want to see Build fall into that same trap.

bookwar Jun 3, 2022
Author

@jwboyer Sorry, I missed your comment.

It seems like we're heading towards a solution that makes it actively harder for users to understand what is going on by hiding the Build tag but using it during transactions.

I think there is a misunderstanding: I do not propose to hide the build id in the end.

Let's say in the transition stage we have

Name: smth
Version: 1.2.5
Release: 5.el9.{build}
Build: 1517234

And filename = {name}-{version}-{release}.rpm

Then in "After" stage we would set:

Name: smth
Version: 1.2.5
Release: 5.el9
Build: 1517234

Filename = {name}-{version}-{release}.{build}.rpm

So the outcome - visibility for the user - would be exactly the same. We would just treat Build tag differently from the tooling perspective, and advanced tools and libraries will be able to use Build tag instead of parsing release tag and guessing the value from it.

bookwar Jun 3, 2022
Author

Updated version:

| | Before | Transition | After |
|---|---|---|---|
| Name | smth | smth | smth |
| Version | 1.2.5 | 1.2.5 | 1.2.5 |
| Release | 5.el9 | 5.el9.1517234 | 5.el9 |
| Buildtag | | 1517234 | 1517234 |
| Filename | {name}-{version}-{release} | {name}-{version}-{release} | {name}-{version}-{release}.{build} |

jwboyer · 2022-06-03T20:46:22Z

jwboyer
Jun 3, 2022

Is there a reason we couldn't generate the Buildtag value and put it in the file name via a macro until RPM grows support for the tag? Then there's no transition other than moving away from the macro. The concept becomes visible to the users ASAP and you get feedback on it before it's encoded in the RPM format "forever".

1 reply

bookwar Jun 5, 2022
Author

I might be wrong, but I think Filename doesn't participate in the rpm version comparison.

If we only adjust a filename, RPM won't be able to calculate the upgrade path between smth-1.2.3-5.el9.123.rpm and smth-1.2.3-5.el9.315.rpm because they are going to have the same NVR.

jwboyer · 2022-06-05T11:05:28Z

jwboyer
Jun 5, 2022

Apologies, I was unclear. I mean use a macro and temporarily put it in Release so that it shows up in the filename.

sgallagher · 2022-06-06T17:37:59Z

sgallagher
Jun 6, 2022

That is in fact exactly the short-term solution that we have proposed. So I'm glad we're aligned here. 😄

bookwar · 2022-06-07T06:42:47Z

bookwar
Jun 7, 2022
Author

@sgallagher @jwboyer It is possible to do a local version of this on build-system level without any changes to RPM. But I would prefer if we get the initial setup confirmed, which would for me include the definition of Build tag and the "build reason".

This will give us some confirmation that temporary local solution would indeed be temporary, and that it is aligned to where upstream is going.

dralley · 2022-06-07T15:43:57Z

dralley
Jun 7, 2022

can we give an rpm package a unique identifier, which would not only depend on the Version of the sources and Release field of the spec file, but which would also uniquely identify the build environment used to build it?

How do you define unique identifier?

It is commonplace to change RPMs after build-time via signing. Just because two packages have the same NEVRA and build time does not mean they are the same package. If you want something truly unique, you probably want a checksum.

lnussel · 2022-06-13T12:23:31Z

lnussel
Jun 13, 2022

FWIW as automatic rebuilds are a natural part of openSUSE maybe there's some inspiration from how it's done there. Basically the build system maintains its own release value by tracking the so-called checkin counter and rebuild counters. OBS' build system edits the spec file it passes to rpmbuild to set the release string.
IIRC (@mlschroe correct me if it's wrong) whenever the version in the spec file of a package changes in the source repo, the checkin count is increased, starting at 1 and the rebuild counter reset to 1. The value for the rpm is a concatenation of both counters with a dot, ie <CI_CNT>.<B_CNT>. Each automated rebuild increases the rebuild counter.

So the first build of a package foo version 42 would have NVR foo-42-1.1. An auto rebuild without source change foo-42-1.2, then foo-42-1.3 and so on. Version 43 would start with foo-43-1.1 again. If you fix eg a typo in the spec file of version 43 then, it would get foo-43-2.1

That makes life for packagers really easy as one has to basically never care about the release value. A spec file can easily be copied around and reused in different projects. The binaries end up having independent release values.
In that system obviously one cannot make assumptions about the release value of other packages when specifying dependencies.
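
A toy Python model of the two counters, following the worked example above (and sharing its IIRC caveat). The `ReleaseCounter` class is hypothetical and is not OBS code; it only reproduces the sequence foo-42-1.1, foo-42-1.2, foo-42-1.3, foo-43-1.1, foo-43-2.1.

```python
class ReleaseCounter:
    """Toy model of the checkin/rebuild counters described above (not OBS code)."""

    def __init__(self):
        self.version = None
        self.checkin = 0
        self.rebuild = 0

    def source_change(self, version):
        """A commit to the package sources; a new Version restarts the checkin count."""
        if version != self.version:
            self.version = version
            self.checkin = 1
        else:
            self.checkin += 1
        self.rebuild = 1
        return self.nvr()

    def auto_rebuild(self):
        """A rebuild triggered by the build system with unchanged sources."""
        self.rebuild += 1
        return self.nvr()

    def nvr(self, name="foo"):
        return f"{name}-{self.version}-{self.checkin}.{self.rebuild}"


counter = ReleaseCounter()
print(counter.source_change("42"))  # foo-42-1.1
print(counter.auto_rebuild())       # foo-42-1.2
print(counter.auto_rebuild())       # foo-42-1.3
print(counter.source_change("43"))  # foo-43-1.1
print(counter.source_change("43"))  # foo-43-2.1
```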

4 replies

bookwar Jun 17, 2022
Author

Do I understand correctly that both check-in counter and build-counter are tracked by the build system? So there is no way to recover that information for a local build without access to the centralized database?

If I am rebuilding foo-42-1.3.rpm in mock, do I get foo-42-1.1 package in the local environment?

sgallagher Jun 17, 2022

If my understanding is correct, in the local mock environment, you'd just get foo-42-0 for the NVR.

From @mcatanzaro on [email protected]: "[on SUSE] the Release is always set to 0 in the specfile".

lnussel Jun 20, 2022

Do I understand correctly that both check-in counter and build-counter are tracked by the build system? So there is no way to recover that information for a local build without access to the centralized database?

If I am rebuilding foo-42-1.3.rpm in mock, do I get foo-42-1.1 package in the local environment?

If the local build tool doesn't track the release itself you'd get whatever is in the spec file. So if you use a checkout from OBS it's most likely 0 or 1, as that's what packagers usually put. If you rebuild the src.rpm of your example you'd get 1.3.

Conan-Kudo Sep 4, 2024

I think another part missing from this thread is that the build system "burns in" the value of Release into the spec file before running rpmbuild, so it's part of the shipped artifact.

simo5 · 2022-10-11T08:52:57Z

simo5
Oct 11, 2022

On Tue, 2022-04-26 at 05:24 -0700, Aleksandra Fedorova wrote:

> > Forget about Koji, Copr. Or OBS. Solve the task on a local level first. I.e. in plain Mock on the command line.

> I agree that Mock is a reference implementation for this. But I'd say we need to have build system in mind here. We can not forget about build system, we just need to be build-system-agnostic.

> > The filename is already long. And 3rd party users (look into Stackoverflow questions) already use foo.rpm removing the VRA from the filename. Not even mention the dist tag.

> Do I understand correctly that rpm, dnf, mock and other tooling operate on rpm metadata and don't use filename for anything? So filename only matters for the external user, like a sysadmin who tries to do something with the list of filenames without proper metadata access?

> > On the other hand, make sure that the outcome does not go against idea of reproducible builds.

> I think reproducible builds is all about comparison rules, but to start comparing things you need to build and store them under unique names anyway.

> > The number should be self-explanatory. I.e. when we use disttag fc36 people know what it is. Compared to ID number from the row from PDC.

> I think this topic is in scope of a build system, not RPM metadata. Things like Koji Build Id or "timestamp of the src.rpm build" are self explanatory for that system, but RPM knows nothing about them.

> > Even for automatic rebuilds you may want to know the reason: Rebuild because of new Python. Mass rebuild of all packages (just because). Rebuild because of a new definition of macro X...

> This is interesting topic indeed. We want to build things without changing the dist-git sources, but it doesn't mean we shouldn't track the reasons for the rebuild. Should it be tracked on a build system side or we need a "binary changelog" next to the "spec changelog" in the rpm itself?

I think this is going too far and should explicitly be considered out of scope. For many build systems the reason why you are doing a rebuild is fully confined within the builds production, there is no need to note it at the rpm level. For those cases where noting it in the RPM is needed, people should explicitly go and bump the NVR and write a changelog entry.

Simo.

-- Simo Sorce RHEL Crypto Team Red Hat, Inc
