Proposal: new mode of operation to exclude crates that don’t make it into the compiled artefact (more precise and accurate, but more work to determine) #46

Open
chris-morgan opened this issue Jun 21, 2021 · 4 comments

Comments

@chris-morgan

It’s very common to have declared dependencies that don’t actually end up in the compiled result. Sometimes this is formalised via optional dependencies and Cargo features, but more often it isn’t; and even when it is, those features are sometimes enabled unnecessarily by accident.

Such unused dependencies are not actually relevant for license compliance purposes and should preferably be removed; but cargo-license can’t currently do this.

(It’s probably even more common to have dependencies that are included in the compiled result but are never activated at runtime, but until they’re removed cargo-license is correct to care about them, and to get them removed you’d need to convince the compiler that they can’t be activated, which is commonly halting problem territory.)

Take this scenario:

  • There is a Crate C.
  • There is a Crate B which depends on Crate C, and exposes two functions, uses_c and does_not_use_c, which do as their names suggest.
  • We are Crate A and depend on Crate B, calling its function does_not_use_c.

In this case, nothing from Crate C will end up in the final compiled artefact, and so we are not bound by Crate C’s license terms. Yet cargo-license will currently still include Crate C in its reckoning.
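The scenario can be sketched in a single file, with modules standing in for the three crates. The module names and function bodies here are illustrative assumptions, and whether Crate C’s code is really dropped from the artefact depends on the compiler and linker removing unreachable code.

```rust
// Hypothetical stand-ins for the three crates in the scenario. In a real
// project each module would be a separate crate with its own license.
mod crate_c {
    pub fn helper() -> &'static str {
        "from crate C"
    }
}

mod crate_b {
    use super::crate_c;

    /// Calls into Crate C, so C's code is reachable through this function.
    pub fn uses_c() -> String {
        format!("B + {}", crate_c::helper())
    }

    /// Never touches Crate C.
    pub fn does_not_use_c() -> String {
        String::from("B only")
    }
}

/// Crate A: only ever calls the function that does not touch Crate C.
pub fn crate_a_entry_point() -> String {
    crate_b::does_not_use_c()
}
```

Nothing reachable from `crate_a_entry_point` refers to `crate_c`, yet a manifest-level view of the dependency graph would still list all three.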

I propose a mode of operation for cargo-license that effectively runs cargo check and then traverses the compiler’s output at that point, eliminating from its reckoning any crate that has no contents present in the compiled artefact.

Implementation details aside, the main potential hitch I see here is macros and other compile-time-only dependencies. Legally, macro output will sometimes be license-bound and sometimes not, depending on what it’s doing and how it does it. I suppose build-dependencies are easy enough to flag in this way (they already are, to a large extent, though this would provide new opportunities for transitive removal of dev-dependencies and build-dependencies), but I’m not sure whether you can track macro stuff through the compilation.

I think my ideal form of output would be dividing the crates into three categories:

  • “definitely used, follow the license”;
  • “potentially used (at build time or via macros), follow the license unless you investigate carefully”; and
  • “definitely not used on this target, don’t worry about the license for this target”.
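One way to picture the three categories is as an enum plus a classification rule. This is only a sketch: `LicenseRelevance`, `CrateFacts`, and `classify` are hypothetical names, not part of cargo-license, and the two boolean facts are assumed to be available from some earlier analysis.

```rust
/// Hypothetical classification mirroring the three proposed categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LicenseRelevance {
    /// "definitely used, follow the license"
    DefinitelyUsed,
    /// "potentially used (at build time or via macros)"
    PotentiallyUsed,
    /// "definitely not used on this target"
    DefinitelyUnused,
}

/// Hypothetical facts a tool would need to gather about each crate.
#[derive(Debug, Clone, Copy)]
pub struct CrateFacts {
    /// Some of the crate's code is present in the compiled artefact.
    pub present_in_artefact: bool,
    /// The crate runs at build time (build script, proc-macro, or a
    /// dependency of one).
    pub build_time_or_macro: bool,
}

/// Presence in the artefact wins; build-time involvement is the fallback.
pub fn classify(facts: CrateFacts) -> LicenseRelevance {
    if facts.present_in_artefact {
        LicenseRelevance::DefinitelyUsed
    } else if facts.build_time_or_macro {
        LicenseRelevance::PotentiallyUsed
    } else {
        LicenseRelevance::DefinitelyUnused
    }
}
```

The middle category is deliberately conservative: anything that ran at build time is flagged for manual review rather than silently dropped.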

This issue is related to #42, which was closed by #43, which took things as far as possible without compiling code. The step I’m proposing would, I expect, be a rather large increase in scope, but I think it’s worthwhile.

@WyvernIXTL

I found that cargo tree -e normal (cargo tree -e normal --color never --prefix none -f "{p}" --no-dedupe) only prints crates included in the final build.

@chris-morgan
Author

@WyvernIXTL That’s not correct. It only excludes things like build and dev dependencies, not crates that just don’t make it into the final build artefact. If you’re not sure about the distinction I’m drawing, please read my proposal again, I believe it’s pretty clear.

@WyvernIXTL

@chris-morgan
I don't know whether I am right, but I do know that cargo tree omits some dependencies I do not use, in this case serde:

# bat Cargo.toml
[package]
name = "serialize-bincode"
version = "0.1.0"
edition = "2021"

[dependencies]
bincode = "=2.0.0-rc.3"
lz4_flex = "0.11.3"

# cargo license
(MIT OR Apache-2.0) AND Unicode-DFS-2016 (1): unicode-ident
Apache-2.0 OR MIT (7): cfg-if, proc-macro2, quote, serde, serde_derive, static_assertions, syn
MIT (5): bincode, bincode_derive, lz4_flex, twox-hash, virtue
N/A (1): serialize-bincode

# cargo tree
serialize-bincode v0.1.0
├── bincode v2.0.0-rc.3
│   └── bincode_derive v2.0.0-rc.3 (proc-macro)
│       └── virtue v0.0.13
└── lz4_flex v0.11.3
    └── twox-hash v1.6.3
        ├── cfg-if v1.0.0
        └── static_assertions v1.1.0
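To make the gap between the two outputs explicit, here is a small sketch that computes which crates `cargo license` reported but `cargo tree` did not. The crate names are copied from the output above; the function name `reported_but_not_in_tree` is a hypothetical helper, not part of either tool.

```rust
use std::collections::BTreeSet;

/// Crate names from the `cargo license` output quoted above.
const LICENSE_OUTPUT: &[&str] = &[
    "unicode-ident", "cfg-if", "proc-macro2", "quote", "serde", "serde_derive",
    "static_assertions", "syn", "bincode", "bincode_derive", "lz4_flex",
    "twox-hash", "virtue", "serialize-bincode",
];

/// Crate names from the `cargo tree` output quoted above.
const TREE_OUTPUT: &[&str] = &[
    "serialize-bincode", "bincode", "bincode_derive", "virtue", "lz4_flex",
    "twox-hash", "cfg-if", "static_assertions",
];

/// Returns the names that appear in `reported` but not in `tree`, sorted.
pub fn reported_but_not_in_tree<'a>(reported: &[&'a str], tree: &[&'a str]) -> Vec<&'a str> {
    let tree: BTreeSet<&str> = tree.iter().copied().collect();
    let reported: BTreeSet<&str> = reported.iter().copied().collect();
    reported.difference(&tree).copied().collect()
}
```

Calling `reported_but_not_in_tree(LICENSE_OUTPUT, TREE_OUTPUT)` yields six names: proc-macro2, quote, serde, serde_derive, syn, and unicode-ident.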

Moreover, this comment leads me to believe that there is a real distinction:

When there is a weak dependency (like bitvec?/std), the dependency resolver has to assume that bitvec might be enabled. There is a separate pass that then does feature resolution, which may decide that in the end it isn't enabled. But the lock file is built based on the first pass (the dependency resolver).

rust-lang/cargo#11444 (comment)
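The two passes described in that quote can be illustrated with a toy model. This is a heavily simplified sketch and not how Cargo is actually implemented: it only looks at a flat list of enabled feature entries, and the function names `may_be_enabled` and `actually_enabled` are hypothetical. The only thing taken from Cargo is the manifest syntax: `dep:name` and `name/feature` enable an optional dependency, while `name?/feature` does not enable it by itself.

```rust
use std::collections::BTreeSet;

/// How a single feature entry refers to an optional dependency.
enum Entry<'a> {
    /// `dep:name` or `name/feature`: enables the dependency.
    Strong(&'a str),
    /// `name?/feature`: only applies if something else enables `name`.
    Weak(&'a str),
    /// A plain feature name; ignored in this toy model.
    Other,
}

fn parse(entry: &str) -> Entry<'_> {
    if let Some(name) = entry.strip_prefix("dep:") {
        Entry::Strong(name)
    } else if let Some((name, _feature)) = entry.split_once("?/") {
        Entry::Weak(name)
    } else if let Some((name, _feature)) = entry.split_once('/') {
        Entry::Strong(name)
    } else {
        Entry::Other
    }
}

/// First pass (over-approximation): any dependency that is mentioned at all,
/// strongly or weakly, is assumed to be possibly enabled.
pub fn may_be_enabled(entries: &[&str]) -> BTreeSet<String> {
    entries
        .iter()
        .filter_map(|e| match parse(e) {
            Entry::Strong(name) | Entry::Weak(name) => Some(name.to_string()),
            Entry::Other => None,
        })
        .collect()
}

/// Second pass: only dependencies with at least one strong reference count.
pub fn actually_enabled(entries: &[&str]) -> BTreeSet<String> {
    entries
        .iter()
        .filter_map(|e| match parse(e) {
            Entry::Strong(name) => Some(name.to_string()),
            Entry::Weak(_) | Entry::Other => None,
        })
        .collect()
}
```

With only `bitvec?/std` enabled, the first pass lists bitvec and the second does not, which matches the gap between what the lock file contains and what is finally built.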

@chris-morgan
Author

Huh, I’d just assumed feature selection was already being done when #42 was filed, or that #43 also fixed that; but I didn’t check it. I suppose (again without checking it) that only handles platform selection. No idea how this stuff is actually structured or implemented, but I’m guessing cargo-license is doing stuff itself rather than using the tools that now are built into Cargo (but which I don’t think were back then).

I retract my “which took things as far as possible without compiling code” from my initial description. There’s apparently still feature resolution to go, which should be comparatively straightforward. What I proposed is still a long way beyond that, a whole ’nother kettle of fish for difficulty, involving reachability analysis of the compiled artefact, or similar.
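As a rough illustration of what such a reachability analysis means, here is a sketch over a hand-written reference graph. Everything in it is an assumption made for illustration: the `crate_name::item` symbol naming, the graph itself, and the function names. Extracting a real graph from compiler output is the hard part and is not attempted here.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns the crates owning at least one symbol reachable from `roots`.
/// Symbols are written `crate_name::item`; `calls` maps each symbol to the
/// symbols it references.
pub fn reachable_crates(calls: &BTreeMap<&str, Vec<&str>>, roots: &[&str]) -> BTreeSet<String> {
    let mut seen: BTreeSet<&str> = BTreeSet::new();
    let mut queue: VecDeque<&str> = roots.iter().copied().collect();

    // Breadth-first walk over the reference graph.
    while let Some(symbol) = queue.pop_front() {
        if !seen.insert(symbol) {
            continue; // already visited
        }
        if let Some(callees) = calls.get(symbol) {
            queue.extend(callees.iter().copied());
        }
    }

    // Keep only the crate-name prefix of every reachable symbol.
    seen.iter()
        .filter_map(|symbol| symbol.split("::").next())
        .map(String::from)
        .collect()
}

/// Hypothetical graph for the Crate A / B / C scenario from the proposal.
pub fn example_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("crate_a::main", vec!["crate_b::does_not_use_c"]),
        ("crate_b::uses_c", vec!["crate_c::helper"]),
        ("crate_b::does_not_use_c", vec![]),
        ("crate_c::helper", vec![]),
    ])
}
```

Starting from `crate_a::main`, the walk reaches crate_a and crate_b but never crate_c, which is the result the proposal wants cargo-license to be able to report.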

2 participants