feat: vacuum unused packages #3467

Open: wants to merge 20 commits into main

suzuki-shunsuke
Member

@suzuki-shunsuke suzuki-shunsuke commented Jan 22, 2025

Close #2942

aqua vacuum [--days (-d) 60]
aqua vacuum --init

aqua vacuum removes packages and their timestamp files if the timestamp is older than the vacuum days.

aqua vacuum --init searches for packages in aqua.yaml, including the global configuration files in $AQUA_GLOBAL_CONFIG, and creates timestamp files for those that are installed.

How does it work?

aqua manages each package's last-used datetime in $(aqua root-dir)/metadata/<package path>/timestamp.txt.
aqua creates or updates these files when packages are installed or executed.
aqua vacuum removes packages and their timestamp files if the timestamp is older than the vacuum days (60 by default).

Timestamp files aren't created until packages are installed or executed.
aqua vacuum doesn't remove packages that have no timestamp file.
As a result, packages that were installed before this feature and never used since would never be removed.

To solve this problem, aqua vacuum --init searches for packages in aqua.yaml, including the global configuration files in $AQUA_GLOBAL_CONFIG, and creates timestamp files for those that are installed.

Compared with #3442

#3442 uses BoltDB to manage last-used dates.
This PR, on the other hand, uses text files.
This pull request is much simpler than #3442 and has much lower overhead.
We don't need to learn BoltDB, and we don't need to maintain complicated asynchronous code.
Complicated code is hard to maintain and prone to bugs.
So simplicity is justice.

This pull request doesn't implement functions to show timestamps because that feature would be used only for debugging.
This pull request implements the function to initialize timestamps to the current datetime.

Performance

$ AQUA_VACUUM_DAYS=5 hyperfine -r 100 -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'

Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
  Time (mean ± σ):      38.7 ms ±   0.9 ms    [User: 6.4 ms, System: 1.9 ms]
  Range (min … max):    37.3 ms …  42.1 ms    100 runs
 
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
  Time (mean ± σ):      41.0 ms ±   1.3 ms    [User: 5.7 ms, System: 1.8 ms]
  Range (min … max):    39.0 ms …  44.3 ms    100 runs
 
Summary
  /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v ran
    1.06 ± 0.04 times faster than /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v

@suzuki-shunsuke
Member Author

Is the command name vacuum good?

There are several options

  • Rename vacuum to clean or something
  • Merge vacuum with remove command

@NikitaCOEUR

NikitaCOEUR commented Jan 22, 2025

About command name

> Is the command name vacuum good?
>
> There are several options
>
> * Rename `vacuum` to `clean` or something
> * Merge `vacuum` with `remove` command

When I was working on the integration, I wanted to use "remove", but it had dependencies in the controller that were too heavy for my level of Go. In particular, it seemed to depend on the aqua.yaml files, and I didn't want "vacuum" to need them: deleting packages shouldn't require a configuration file like aqua.yaml.

If I have X repositories with different YAML configurations, I don't want the command to act only on the packages defined in one of them.

So I wanted vacuum to be completely decoupled from remove. And when I proposed the idea in https://github.com/orgs/aquaproj/discussions/3086, I had thought of vacuum rather than clean, as I felt that clean was too close to remove in connotation. I also thought of Postgres, which implements a VACUUM command. In the end, it seemed more appropriate.

Also, adding timestamp management to the use/download of each package seemed too far removed from the term "remove". I felt that a new feature responsible for remembering what it needs to clean was more coherent.
So, in my opinion:

  1. Add vacuum command.
  2. Full integration with the remove command (with a vacuum parameter to execute this action).

What don't you like about "vacuum"?

Whether or not to use a bbolt DB

I would have loved to have my name on this repository code and I found the use of a bbolt really sexy.
However, aqua is a tool I appreciate, and you're someone I follow on the various projects you do. I'm really impressed by your presence, your ideas, your follow-through and your ability to think on your feet.
I know that you're (almost) the only one working on these projects, so it's vital that you fully master the tool and the code that makes it up, and that you have the final say on how it's done.

But I still think that using bbolt is more appropriate, as it seems less "cobbled-together". Perhaps registry management or other elements could be integrated into it in the future?
That said, the desire to limit overhead as much as possible forces the use of tricks and pushes the code toward complexity, as can be seen in #3442 (asynchronous execution, and even running an independent process to get around the limitation of exec commands in syscall_unix).

In short, you have the last word!

@suzuki-shunsuke
Member Author

Thank you for your comment.

Now I think adding a vacuum command is best.

About the command name, note that I'm not a native English speaker and I'm not so good at English.
So my feeling may be incorrect.

The word "vacuum" brought to mind the noun meaning "a void" or "emptiness."
But it seems "vacuum" is also a verb meaning to clean with a vacuum cleaner, which I feel is suitable.

https://www.ldoceonline.com/dictionary/vacuum

> I had thought of vacuum rather than clean, as I felt that clean was too close to remove in connotation.

Agree. clean is confusing.

About merging, I considered the API.

e.g.

remove --expired [--init-timestamp, --init-vacuum] [--expired-days 60]

But I don't think this is user-friendly.
Both vacuum and remove are commands that remove packages, but their mechanisms are totally different.
So now I think adding a new command is better for both users and maintainers than merging these commands.

> I would have loved to have my name on this repository code and I found the use of a bbolt really sexy.

Yeah, I'm so sorry for creating a new pull request instead of using yours.
I really appreciate and respect your effort, and I wanted to merge your pull request.
Without your effort, I would never have worked on this feature or created this pull request.

I think BoltDB is an awesome project. I was glad to learn about it this time.
Maybe I'll use it in my OSS or at work.
But this time I feel it's unnecessary.
If we need it in the future, we can reconsider it, but not now.


Ideally, I'd like to store metadata in the same directory as each package.

Now:

pkgs/<package path>/<command>
metadata/<package path>/timestamp.txt # Maybe we will add other files like metadata.json

Ideal:

(There is room to improve the file and directory names and structure)

pkgs/<package path>/
  aqua-package.json # A file to identify the package directory. This is useful to list installed packages. We can initialize all packages' timestamps using the current date time.
  metadata/
    timestamp.txt
  package/
    <command>

But this is a breaking change, so we gave up on it.

@suzuki-shunsuke
Member Author

@suzuki-shunsuke suzuki-shunsuke marked this pull request as ready for review January 23, 2025 00:59
@suzuki-shunsuke
Member Author

suzuki-shunsuke commented Jan 23, 2025

v2.43.0-0 is out.

https://github.com/aquaproj/aqua/releases/tag/v2.43.0-0

You can try this version:

aqua upa v2.43.0-0

The documentation is here: aquaproj/aquaproj.github.io#1342
I'll try this for a while (a few days?).

Note:

  1. AQUA_VACUUM_DAYS is optional. Last-used dates are recorded when packages are installed or executed even if AQUA_VACUUM_DAYS is not set.
    The default expiration is 60 days.
  2. I removed the command alias v. Perhaps we'll restore the alias if necessary.
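As a rough illustration of the default described in note 1, the expiration could be resolved like this. The sketch is an assumption rather than aqua's actual code: `vacuumDays` is a hypothetical helper, and rejecting non-positive or non-numeric values is my own choice, since this thread does not say how aqua handles them.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultVacuumDays is the default expiration stated in the note above.
const defaultVacuumDays = 60

// vacuumDays takes the raw value of AQUA_VACUUM_DAYS (empty when unset)
// and returns the expiration in days.
func vacuumDays(raw string) (int, error) {
	if raw == "" {
		return defaultVacuumDays, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("AQUA_VACUUM_DAYS must be an integer: %w", err)
	}
	// Assumption: zero and negative values are treated as invalid.
	if n <= 0 {
		return 0, fmt.Errorf("AQUA_VACUUM_DAYS must be positive, got %d", n)
	}
	return n, nil
}
```

A caller would pass `os.Getenv("AQUA_VACUUM_DAYS")`. How the `--days` flag and the environment variable take precedence over each other is not covered here.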

@suzuki-shunsuke
Member Author

suzuki-shunsuke commented Jan 23, 2025

  • aqua rm --all should remove the metadata directory
  • aqua rm should remove the removed packages' timestamp.txt

https://github.com/aquaproj/aqua/releases/tag/v2.43.0-1

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Command to remove outdated packages/binaries
2 participants