Support for partial clones
This release adds support for partial clones with mepo clone. A new option, --partial, has three settings:
- off: performs a "normal" git clone
- blobless: performs a blobless clone via git clone --filter=blob:none
- treeless: performs a treeless clone via git clone --filter=tree:0
The default is technically none of these, but that is equivalent to off. The off option mainly exists to allow overriding any .mepoconfig setting (see below).
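To make the mapping concrete, here is a minimal Python sketch of how a --partial value could be translated into git clone arguments. The function names (partial_clone_args, clone_command) are hypothetical and this is not mepo's actual code. It only encodes the three settings listed above, with "no setting given" behaving the same as off.

```python
# Hypothetical sketch: translate a --partial setting into extra `git clone` arguments.
# Function names and the use of None for "option not given" are assumptions,
# not mepo's actual implementation.

FILTERS = {
    "blobless": ["--filter=blob:none"],
    "treeless": ["--filter=tree:0"],
}


def partial_clone_args(partial):
    """Return the extra git clone arguments for a --partial value.

    None (option not given) and "off" both mean a normal, full clone.
    """
    if partial is None or partial == "off":
        return []
    try:
        return list(FILTERS[partial])
    except KeyError:
        raise ValueError(f"unknown --partial setting: {partial!r}") from None


def clone_command(url, dest, partial=None):
    """Build the full git clone command line as a list of arguments."""
    return ["git", "clone", *partial_clone_args(partial), url, dest]


if __name__ == "__main__":
    for setting in (None, "off", "blobless", "treeless"):
        print(setting, clone_command("git@github.com:GEOS-ESM/MAPL.git", "MAPL", setting))
```

Returning an argument list (rather than a single command string) keeps it ready to pass to something like subprocess.run without shell quoting concerns.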
The motivation for this is that clones of GEOS repositories (mainly MAPL) are often extremely slow. For example:
❯ time git clone git@github.com:GEOS-ESM/MAPL.git MAPL-normal
Cloning into 'MAPL-normal'...
remote: Enumerating objects: 704212, done.
remote: Counting objects: 100% (271489/271489), done.
remote: Compressing objects: 100% (6882/6882), done.
remote: Total 704212 (delta 269709), reused 266119 (delta 264584), pack-reused 432723
Receiving objects: 100% (704212/704212), 997.08 MiB | 2.59 MiB/s, done.
Resolving deltas: 100% (694344/694344), done.
noglob git clone git@github.com:GEOS-ESM/MAPL.git MAPL-normal 95.46s user 13.88s system 24% cpu 7:17.37 total
This took over 7 minutes to clone!
However, git supports partial clones, as detailed in this GitHub Blog post. Of the two partial-clone types, blobless clones are fairly safe: they give a much faster initial clone at the cost of slower operations later, because any missing blobs have to be downloaded from the remote when they are needed. As a test:
❯ time git clone --filter=blob:none git@github.com:GEOS-ESM/MAPL.git MAPL-blobless-from-git
Cloning into 'MAPL-blobless-from-git'...
remote: Enumerating objects: 21299, done.
remote: Counting objects: 100% (3171/3171), done.
remote: Compressing objects: 100% (1324/1324), done.
remote: Total 21299 (delta 1972), reused 2989 (delta 1829), pack-reused 18128
Receiving objects: 100% (21299/21299), 20.80 MiB | 2.99 MiB/s, done.
Resolving deltas: 100% (13343/13343), done.
remote: Enumerating objects: 942, done.
remote: Counting objects: 100% (552/552), done.
remote: Compressing objects: 100% (500/500), done.
remote: Total 942 (delta 112), reused 126 (delta 50), pack-reused 390
Receiving objects: 100% (942/942), 1.85 MiB | 136.00 KiB/s, done.
Resolving deltas: 100% (249/249), done.
Updating files: 100% (1064/1064), done.
noglob git clone --filter=blob:none git@github.com:GEOS-ESM/MAPL.git 1.38s user 0.54s system 6% cpu 27.609 total
28 seconds!
Treeless clones are usually even faster than blobless clones, since they filter out not just blobs but whole trees. However, per the blog:
We strongly recommend that developers do not use treeless clones for their daily work. Treeless clones are really only helpful for automated builds when you want to quickly clone, compile a project, then throw away the repository. In environments like GitHub Actions using public runners, you want to minimize your clone time so you can spend your machine time actually building your software! Treeless clones might be an excellent option for those environments.
Because there are CI scenarios where this could be useful, the option is still provided. As for speed:
❯ time git clone --filter=tree:0 git@github.com:GEOS-ESM/MAPL.git MAPL-treeless
Cloning into 'MAPL-treeless'...
remote: Enumerating objects: 6875, done.
remote: Counting objects: 100% (843/843), done.
remote: Compressing objects: 100% (730/730), done.
remote: Total 6875 (delta 124), reused 805 (delta 113), pack-reused 6032
Receiving objects: 100% (6875/6875), 2.24 MiB | 277.00 KiB/s, done.
Resolving deltas: 100% (757/757), done.
remote: Enumerating objects: 106, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 106 (delta 1), reused 19 (delta 0), pack-reused 36
Receiving objects: 100% (106/106), 37.34 KiB | 538.00 KiB/s, done.
Resolving deltas: 100% (3/3), done.
remote: Enumerating objects: 942, done.
remote: Counting objects: 100% (552/552), done.
remote: Compressing objects: 100% (500/500), done.
remote: Total 942 (delta 112), reused 126 (delta 50), pack-reused 390
Receiving objects: 100% (942/942), 1.85 MiB | 2.09 MiB/s, done.
Resolving deltas: 100% (249/249), done.
Updating files: 100% (1064/1064), done.
noglob git clone --filter=tree:0 git@github.com:GEOS-ESM/MAPL.git 0.45s user 0.25s system 4% cpu 15.950 total
16 seconds!
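For a side-by-side view of the three runs, the short Python calculation below converts the "total" times reported by time in the transcripts above into seconds and compares each against the normal clone. The timestamps are copied from those runs; each is a single run on one connection, so treat the ratios as indicative rather than as a benchmark.

```python
# Compare the "total" wall-clock times reported by `time git clone` above.
# The timestamps are copied from the three transcripts; only arithmetic is done here.

TOTALS = {
    "normal": "7:17.37",   # minutes:seconds
    "blobless": "27.609",  # plain seconds
    "treeless": "15.950",
}


def to_seconds(stamp):
    """Convert a `time` total such as "7:17.37" or "27.609" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def speedups(totals, baseline="normal"):
    """Return how many times faster each clone was than the baseline clone."""
    base = to_seconds(totals[baseline])
    return {kind: base / to_seconds(stamp) for kind, stamp in totals.items()}


if __name__ == "__main__":
    for kind, factor in speedups(TOTALS).items():
        print(f"{kind:>8}: {to_seconds(TOTALS[kind]):7.2f} s, {factor:5.1f}x vs normal")
```

On these numbers, the blobless clone is roughly 16x faster than the normal clone and the treeless clone roughly 27x faster.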
Along with this option, we also add a new .mepoconfig setting where one can add:
[clone]
partial = blobless
and blobless clones will be the default.
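The precedence between the config file and the command line can be sketched as follows. This is a hypothetical illustration that assumes .mepoconfig is an INI-style file readable by Python's configparser (the [clone] section above suggests that layout). The function name effective_partial is made up, and mepo's real handling may differ. It shows the intended rule: an explicit --partial, including --partial=off, overrides the [clone] partial default.

```python
# Hypothetical sketch of combining a [clone] partial default from .mepoconfig
# with a --partial command-line value. Assumes .mepoconfig is INI-style and
# readable by configparser; mepo's real logic may differ.
import configparser

VALID = {"off", "blobless", "treeless"}


def effective_partial(cli_value, config_text):
    """Return the partial-clone mode to use, or None for a normal clone.

    A value given on the command line (including "off") always wins over the
    [clone] partial setting from the config file.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    if cli_value is not None:
        value = cli_value
    else:
        # fallback also covers a config with no [clone] section at all
        value = config.get("clone", "partial", fallback=None)
    if value is None or value == "off":
        return None
    if value not in VALID:
        raise ValueError(f"unknown partial setting: {value!r}")
    return value


if __name__ == "__main__":
    mepoconfig = "[clone]\npartial = blobless\n"
    print(effective_partial(None, mepoconfig))    # config default applies
    print(effective_partial("off", mepoconfig))   # --partial=off overrides it
    print(effective_partial("treeless", ""))      # no config file contents at all
```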
From CHANGELOG.md
Added
- Added new --partial option to mepo clone with three settings: off, blobless, and treeless. If you set --partial=blobless, then the clone will not download blobs, by using --filter=blob:none. If you set --partial=treeless, then the clone will not download trees, by using --filter=tree:0. The blobless option is useful for large repos that have a lot of binary files that you don't need. The treeless option is even more aggressive and SHOULD NOT be used unless you know what you are doing. The --partial=off option allows a user to override the default behavior of --partial in .mepoconfig and turn it off for a run of mepo clone.
- Add a new section for .mepoconfig to allow users to set --partial as a default for mepo clone.
What's Changed
- GitFlow: Merge main into develop for hotfix by @mathomp4 in #262
- Add support for partial clones by @mathomp4 in #264
- Protect submodule repos from treeless, add off option by @mathomp4 in #266
- GitFlow: Merge develop into main by @mathomp4 in #265
Full Changelog: v1.51.1...v1.52.0