Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide a way to not download tests/, examples and other such common directories #13491

Open
poliorcetics opened this issue Feb 26, 2024 · 8 comments
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-package Command-publish S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@poliorcetics
Copy link
Contributor

Problem

By default cargo includes everything from the root of the package (or almost, see https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=files#the-exclude-and-include-fields)

This is very nice for archiving purposes by crates.io, it allows example scraping from docs.rs and probably other purposes I haven't encountered, but it has an issue: it makes lots of CI downloads heavier than they need to be.

Looking through the dependencies we use at $work I see around a 100MiB of unused files just counting tests/, but there are also .github, benches/, examples/ and probably more I missed, and we have only ~600 deps out of the 138k crates on crates.io (without even counting all the available versions).

We have a cache to avoid spamming crates.io whenever possible so it's not like we request an extra 100MiB on each CI but still, I don't like putting pressure on it if we can avoid it.

As linked above, https://doc.rust-lang.org/cargo/reference/manifest.html?highlight=files#the-exclude-and-include-fields is intended to help with that but it's not automatic: people need to consider and maintain it for each crate (and from the start, else older versions will still have all the unused files)

Proposed Solution

We may need a way to say to cargo only fetch the absolute necessary to build the crate. Two issues with that:

  • What is the "absolute necessary": probably at least src/, Cargo.toml, license files, build.rs ? Some -sys crates may need more
  • Changing that silently in a backward compatible way is probably hard ? I think it would break the hashes so it could not become the default automatically at the very least.

It could take the form of cargo fetch --no-extras (to be bikeshedded) ?

Notes

Note: I'm excluding binary crates here, but they could easily be included too, if only so cargo install downloads less if possible, but I don't think binary crates are downloaded as much as library ones.

I also don't know if the benefits would be worth it, since it could mean one of:

  • Recomputing the set of files on each download
  • Storing two archives of the same crate&version

I have no data saying the bandwidth gains would be enough to offset either the compute or space gains, if anyone knows (the infra team ?) I would be happy to be proven wrong!

@poliorcetics poliorcetics added C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage. labels Feb 26, 2024
@epage epage added Command-publish Command-package S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed S-triage Status: This issue is waiting on initial triage. labels Feb 26, 2024
@epage
Copy link
Contributor

epage commented Feb 26, 2024

For cargo vendor, we also have #13474.

Breaking down the extraneous files you mentioned:

  • .github there is no reason to include and that is on the package for including it.
  • For examples/, that can tie into the documentation and can be important for including (this isn't just for crates.io but for local docs builds)
  • For tests/ and benches/, that is important for crater and people wanting to do similar for crater.

The challenge is knowing what is needed for build or not among the files. A heuristic can work but that can only go so far (much like we provide package.include and package.exclude in case our existing heuristic isn't enough.

There is also the question of "what is essential". For example, explicitly referenced targets are sometimes required just to parse manifests today (see #13456).

There would need to be figured out to get this to work.

@epage
Copy link
Contributor

epage commented Feb 26, 2024

For the most part, we are talking text here. I wonder if we'd be better spending our time focusing on aspects of this, like

@weihanglo
Copy link
Member

I would like to call out that the #[path] attribute, while not widely used, would cause problems if Cargo excludes files without recognizing where they point to. Besides, changing or disable the heuristics might affect other parts in Carog, like #4587, though this doesn't mean that Cargo cannot change the current rule.

@poliorcetics
Copy link
Contributor Author

For cargo vendor, we also have #13474.

Ah I looked for cargo fetch issues, I missed that one. It's not exactly the same problem but very close

The challenge is knowing what is needed for build or not among the files. A heuristic can work but that can only go so far (much like we provide package.include and package.exclude in case our existing heuristic isn't enough.

package.include and .exclude must be set by users and I think at this point experience has proven it's not gonna happen often.

I also agree that crates.io should have everything, tests and examples included, I'm not advocating for dropping that! I would like the ability to not download them but as I said, it's hard to know what to download and what not to.

Does cargo check hashes again once a library has been fetched ? If not if would allow me to just delete the unneeded directories and files on my side without interfering with cargo itself

@epage
Copy link
Contributor

epage commented Feb 26, 2024

I don't think we re-check hashes but i would not rely on that never changing.

@poliorcetics
Copy link
Contributor Author

i would not rely on that never changing.

It should be a line or two of bash, it can be deleted if an update breaks it :)

@poliorcetics
Copy link
Contributor Author

Update: cargo is too well-made and check hashes even in vendored dependencies so my two lines of bash won't work 😞

@poliorcetics
Copy link
Contributor Author

A simpler way to do this then would be to allow not checking hashes when compiling ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-package Command-publish S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

No branches or pull requests

3 participants