Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thin repositories #798

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Thin repositories #798

wants to merge 2 commits into from

Conversation

bapt
Copy link
Member

@bapt bapt commented Nov 4, 2020

it does not depends actuallt on #797 but I am too lazy to split that in to so will go in after #797 :D

@@ -2595,7 +2595,12 @@ jail_start() {

# May already be set for pkgclean
: ${PACKAGES:=${POUDRIERE_DATA}/packages/${MASTERNAME}}
: ${THIN_PACKAGES:=${POUDRIERE_DATA}/packages/${MASTERNAME}-thin}
if [ ${SMALL_REPO} -eq 1 ]; then
THIN_EXT=-small
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing that the options are minimal and medium, but the extensions are -thin and -small. At a minimum the documentation should mention this. Ideally things should lineup throughout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but minimal and medium are bad names, think and small are nice, but we can't use proper switches for them, I am open to any suggestion for renaming that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should spell out what these are rather than "thin" or "small". Too bad -f/-F is taken as -f[ilter] makes sense to me.

-m all # default
-m listed
-m recursive / runtime # This only excludes build dependencies right?

listed/recursive/runtime don't make me(user) think about what "small" means.
There's also terms like 'leaf' and 'automatic' that describe these.

p.s. I cringe every time we add any new flag as it complicates the interface and makes it harder to clean up the interface later.

src/share/poudriere/bulk.sh Outdated Show resolved Hide resolved
Copy link
Member

@bdrewery bdrewery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this but wonder what the use case is and what it buys us? Is this so a user can provide just like 1 custom package and use FreeBSD repo otherwise?

If so I don't think we have the proper infrastructure to make that work seamlessly. From a user perspective, If FreeBSD upgrades/deletes a dep I overrode I'm in trouble. If FreeBSD upgrades a package I overrode, but haven't upgraded, I am maybe in trouble if I forgot to lock or repo priorities go wrong. If a dep is upgraded by FreeBSD and is no longer compat then my installed overrode package breaks. Then there's the same assumption that the package was built from the same dependencies and versions that FreeBSD provides. It's an easy out-of-sync situation.

What's the situation without this feature? Isn't it just about the same as above?

I think what we really need is strong shlib compat saving/management and/or an alternatives-like system that supports multiple versions of packages being installed. In my fantasy world with unlimited space/bandwidth I'd just have fat packages that store their files in hashed dirs to reduce duplication, symlinked from readable named paths.

@@ -200,6 +200,10 @@ when using
Do not skip dependent ports on findings.
This flag is automatically set when using
.Fl at .
.It Fl m
Build a minimal repository or thin repository which only contains the packages
that has been listed to be built, note that this option is incompatible with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/has/have/
s/built, note/built.\nNote/

@@ -7701,15 +7707,69 @@ sign_pkg() {
fi
}

add_pkg_to_repo() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs quotes around pathed vars

@@ -2595,7 +2595,12 @@ jail_start() {

# May already be set for pkgclean
: ${PACKAGES:=${POUDRIERE_DATA}/packages/${MASTERNAME}}
: ${THIN_PACKAGES:=${POUDRIERE_DATA}/packages/${MASTERNAME}-thin}
if [ ${SMALL_REPO} -eq 1 ]; then
THIN_EXT=-small
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should spell out what these are rather than "thin" or "small". Too bad -f/-F is taken as -f[ilter] makes sense to me.

-m all # default
-m listed
-m recursive / runtime # This only excludes build dependencies right?

listed/recursive/runtime don't make me(user) think about what "small" means.
There's also terms like 'leaf' and 'automatic' that describe these.

p.s. I cringe every time we add any new flag as it complicates the interface and makes it harder to clean up the interface later.

}

build_thin_repo() {
# Try to be as atomic as possible in recreating the new thin repo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've aggregated some comments for future lines here.

I don't think it's important to be atomic here as there is already a mechanism for it on by default. ATOMIC_PACKAGE_REPOSITORY. The directory being acted on here shouldn't be the real repository in that case.

# ls -al /poudriere/data/packages/exp-11amd64-commit-test
total 88
drwxr-xr-x   7 root  wheel  16 May 13 15:01 ./
drwxr-x--x  38 root  wheel  43 Jun  4 14:58 ../
lrwxr-xr-x   1 root  wheel  18 Oct 13  2017 .buildname@ -> .latest/.buildname
lrwxr-xr-x   1 root  wheel  20 Oct 13  2017 .jailversion@ -> .latest/.jailversion
lrwxr-xr-x   1 root  wheel  16 May 13 15:01 .latest@ -> .real_1589407304
drwxr-xr-x   4 root  wheel  10 Apr 22  2020 .real_1587540631/
drwxr-xr-x   4 root  wheel  10 Apr 25  2020 .real_1587845826/
drwxr-xr-x   4 root  wheel  10 May  4  2020 .real_1588616190/
drwxr-xr-x   4 root  wheel  10 May 13 14:58 .real_1589407113/
drwxr-xr-x   4 root  wheel  10 May 13 15:01 .real_1589407304/
lrwxr-xr-x   1 root  wheel  11 Mar  7  2017 All@ -> .latest/All
lrwxr-xr-x   1 root  wheel  14 Mar  7  2017 Latest@ -> .latest/Latest
lrwxr-xr-x   1 root  wheel  19 Mar  7  2017 digests.txz@ -> .latest/digests.txz
lrwxr-xr-x   1 root  wheel  17 Mar 23  2020 meta.conf@ -> .latest/meta.conf
lrwxr-xr-x   1 root  wheel  16 Mar  7  2017 meta.txz@ -> .latest/meta.txz
lrwxr-xr-x   1 root  wheel  23 Mar  7  2017 packagesite.txz@ -> .latest/packagesite.txz

There's not much reason to disable that feature. We disabled on the cluster systems because pkgsync expects a certain hardlink count. That could be reworked.

The way it works is, for example, bulk currently copies (cp -al hardlinks) the .latest into a .building. Then runs build_repo against that directory. Then later in bulk.sh commit_packages is ran which flips everything around so .building becomes the new real/latest directory.

Lots of ways to go about this but say for this feature we keep the All -> .latest symlinks all alone. Add a /thin or /small dir that symlinks into a .real_TS_filtered directory. Then on client systems you add /thin to the pkg repo name. Generating the directory would just be copying (hardlinks) the .building directory to .filter and removing unexpected packages (which sounds like pkgclean which is why I think most of this should be there).

git grep -F .building has some of the relevant places.

Comment on lines +7754 to +7868
if [ ${THIN_REPO} -eq 1 ]; then
msg "Creating thin pkg repository"
packages=${THIN_PACKAGES}
else
msg "Creating pkg repository"
packages=${PACKAGES}
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me as a user. If I do a --thin build it will update packages in both directories but only the 'thin' one has its pkg files updated. There's no clear path to resolve that or to filter an existing repository down to the small/thing (hint: pkgclean).

local target=$2
[ -f ${target}/${pkgname}.${PKG_EXT} ] && return
if [ ${SMALL_REPO} -eq 1 ]; then
for dep in $(injail ${PKG_BIN} info -qd -F /packages/All/${pkgname}.${PKG_EXT}); do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this information in some metadata files that could avoid the need to use pkg here. Related to that I think this functionality makes sense in pkgclean which generates the same metadata. Moving towards that method could wait until later though as I think it brings some risk/complexity to the build itself and I have some pending stuff to push around it all.

Comment on lines +7725 to +7839
while mapfile_read_loop "all_pkgs" \
_pkgname _originspec _rdep _ignore; do
if [ "${_rdep}" = "listed" ] ; then
add_pkg_to_repo ${_pkgname} \
${THIN_PACKAGES}/All.new
fi
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for pkgname in $(listed_pkgnames); do
    add_pkg_to_repo "${pkgname}" "${THIN_PACKAGES}/All.new"

Comment on lines +7742 to +7856
if [ -d "${THIN_PACKAGES}/All" ]; then
mv ${THIN_PACKAGES}/All ${THIN_PACKAGES}/All.old
fi
mv ${THIN_PACKAGES}/All.new ${THIN_PACKAGES}/All
if [ -d "${THIN_PACKAGES}/All" ]; then
rm -rf ${THIN_PACKAGES}/All.old
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted earlier I don't think this careful swap is needed, or if it is we should use a symlink anyway.


build_thin_repo() {
# Try to be as atomic as possible in recreating the new thin repo
mkdir -p ${THIN_PACKAGES}/All.new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this directory is leftover from a previous build? Should rm -rf here probably. But as noted earlier I don't think we need to do this .new and .old thing anyhow.

Comment on lines +57 to +61
-m -- minimal repository, only create a repository with the listed
packages, incompatible with -a.
-M -- medium repository, only create a repository with the listed
packages and their runtime dependencies, incompatible with -a.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My default reaction to new flags in PRs is to suggest or ask if this could be a hook instead. For this particular feature it seems like it could be entirely a hook in pkgclean. We could even provide this as an example hook.

@kevans91
Copy link
Contributor

kevans91 commented Nov 8, 2020

I like this but wonder what the use case is and what it buys us? Is this so a user can provide just like 1 custom package and use FreeBSD repo otherwise?

bapt was mentioning its equivalency to a PPA repo, but the thin repos could also immediately solve some issues without some of the problems described later in your message. e.g., the drm kmod packages have no irrelevant runtime dependencies that could cause those kinds of problems, so we could build them in a thin repo on and for 12.2 users without much hassle.

The medium repos address your concerns in the sense that they encapsulate all of the dependencies that could cause issues at runtime, and they solve brooks' problem that they just want to build and publish exactly what they want + run/lib depends for those to minimize the I/O work that pkg needs to do on some of their slower (emulators?) when searching the database or whatnot.

@bapt
Copy link
Member Author

bapt commented Nov 9, 2020

As for the usage all what @kevans91 said, about could we make it a hook, yes we could, but imho there will be too many users of those feature that it deserves a proper support. Think all maintainers could provide their own packages more easily. kde-devel repo, libreoffice-devel repo etc.

@brooksdavis
Copy link
Contributor

Our (CHERI) use case is very slow emulators (on par with FGPA implementations in the 10-50MHz range) with awful disk I/O. Reading a trivial pkg database can take minutes in some cases (I did an install of a clang/llvm/lld pkg and it took a couple hours). We've got access to some full-SoC FPGA emulations that are going to be even worse (though we probably won't be installing packages on them).

bapt added 2 commits May 6, 2021 16:26
if the new -m argument is passed to the bulk command then a new
repository with the same name as the regular one with -thin appended
is created, it only contains the packages that has been listed to be
built and nothing more.

The repo creation is done on that new repo along with siging

Note that this option is incompatible with -a
Unlike thin repo the small repos also includes the runtime dependencies
and the pkg itself.
@allanjude
Copy link
Member

The release of 12.3 is coming up in about a month. I think this feature was envisioned as the solution to the incompatibility between kmod's built on 12.2, and people running 12.3.

Is there a plan to finish this work?

@michael-o
Copy link

Stumbled upon this from #1025. This would be perfect fit for -b <URL>. When I compare the packages from <URL> and the fetched packages, they are duplicated rather than dropping them after the build or not adding to the final repo. As for PPA: here is another example: http://opensource.wandisco.com/rhel/8/svn-1.14/RPMS/x86_64/
Contains only required runtime packages, rest is pulled from default repo which makes sense.

So I am highly in favor of this.

@michael-o
Copy link

This, unfortunately, didn't make it into 3.4.0 :-(

@bdrewery bdrewery modified the milestones: 3.4.0, 3.5.0 Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants