license-scanner does not provide legal advice and it is not a lawyer. Licenses are identified exclusively by automated means, without any step of human verification, and thus the verdict is subject to bugs in the software and incomplete heuristics which might yield false positives.
license-scanner aims to provide best-effort automated license scanning. Regardless of how well it performs, its accuracy should not be relied upon as the ultimate verdict for legal purposes. You should seek independent legal advice for any licensing questions that may arise from using this tool.
license-scanner is a source code license scanner based on file contents (e.g. LICENSE files, license headers, copyright notices) and project metadata. The Usage section provides explanations on what it does and how to use it.
Before starting to work on this project we recommend reading the Implementation section.
Parity intends to use license-scanner primarily for Rust projects, so the following files are supported:
- Cargo.toml
- Cargo.lock
Should more files be relevant in future Rust versions, logically they should be supported as well.
You are welcome to open a request ticket suggesting other files that would make sense for us to support going forward, even if they are not Rust-related.
Requirements:
- cargo
- Node.js LTS
- yarn
  - If it is not already bundled with Node.js, install it with
    npm install -g yarn
yarn install
yarn build
# use `scan` for scanning
yarn start -- scan /directory/or/file [...more/directories/or/files]
# Mark the end of variadic options with a `--` or another non-variadic `--xxx` option
yarn start -- scan --exclude target/debug target/release -- /directory/or/file
yarn start -- scan --exclude target/debug target/release --log-level debug /directory/or/file
# after the scan is complete, optionally dump it to CSV
yarn start -- dump /directory/or/file /output.csv
If a single file is provided, the scan will be performed exclusively for that file.
If a directory is provided, it will be scanned recursively. Should license-scanner find any of the supported project metadata files, it will detect and download all of the dependencies they list. After downloading a dependency, license-scanner will scan its code non-recursively, i.e. the search will cover the target directory's dependencies but not dependencies of dependencies.
The scan results are saved to a db.json file directly in this repository. You can further tweak those results through --start-lines-excludes and --detection-overrides.
Consider the following directory structure:
/directory
├── LICENSE-MIT
├── Cargo.toml
After scanning that directory with yarn start -- scan /directory, a db.json file will be created in the root of this repository with the following structure:
{
  "scanResult": {
    "/directory": {
      "LICENSE-MIT": {
        "license": { "id": "MIT" }
      },
      "foo-0.1 file: src/main.rs": {
        "license": { "id": "GPL-3.0-only" }
      }
    }
  }
}
- Each scanned directory is registered as an item in .scanResult where its key is the absolute path (in this case, /directory)
- Each file within the directory is registered as an item in .scanResult["/directory"] where its key (ID) is the path of the file relative to the directory
- Each file in a crate (crates are found through Cargo.toml) is registered as an item in .scanResult["/directory"] where its key (ID) is a combination of the crate's versioned identifier plus the path of the file relative to the crate's directory. In the example above we found a src/main.rs file inside of a crate named foo which has version 0.1.
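The ID rules above can be sketched in TypeScript. The type names and the crateFileId and lookup helpers are hypothetical and exist only for illustration; they are not taken from the license-scanner source. The only thing assumed is the db.json shape and the ID format shown in the example.

```typescript
// Hypothetical types mirroring the db.json example above.
type ScanResultItem = { license: { id: string } };
type Database = { scanResult: Record<string, Record<string, ScanResultItem>> };

// Hypothetical helper: builds the ID of a file that belongs to a crate,
// following the "<crate>-<version> file: <relative path>" format.
function crateFileId(crate: string, version: string, relativePath: string): string {
  return `${crate}-${version} file: ${relativePath}`;
}

// Hypothetical helper: looks up a result by scanned directory and ID.
function lookup(db: Database, directory: string, id: string): ScanResultItem | undefined {
  return db.scanResult[directory]?.[id];
}

const db: Database = {
  scanResult: {
    "/directory": {
      "LICENSE-MIT": { license: { id: "MIT" } },
      "foo-0.1 file: src/main.rs": { license: { id: "GPL-3.0-only" } },
    },
  },
};

const mainRs = lookup(db, "/directory", crateFileId("foo", "0.1", "src/main.rs"));
console.log(mainRs?.license.id); // GPL-3.0-only
```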
IDs (the keys used for each object) are useful in case you want to override the detection for a given file using --detection-overrides.
The --detection-overrides option takes as its argument a configuration file specifying Override Rules (example), which can be used to override the automatic detection. Use it as:
scan --detection-overrides configuration.json
Each Override Rule object should have the following fields:
The "id" field defines the ID of the result you wish to override (IDs are formatted according to the rules explained in the Walkthrough section).
For example, if you want to override the results for the file crates/metrics/analyze.rs:
{
"id": "crates/metrics/analyze.rs",
"result": { "license": "Apache-2.0" }
}
As another example, if you wish to override the results for:
- Crate: adder
- Crate version: 0.2
- File: src/main.rs
The following rule should be used:
{
"id": "adder-0.2 file: src/main.rs",
"result": { "license": "MIT" }
}
The "id" field is mutually exclusive with "starts_with": specify one of them, not both.
Instead of overriding the detection for a single ID, the "starts_with" field defines the start of IDs (IDs are formatted according to the rules explained in the Walkthrough section) whose results should be overridden; that is, any ID starting with this field's value will be overridden by the specified "result". This is usually useful for making an override apply to a whole directory or crate.
For example, if you want to override the results for the whole docs/ directory:
{
"starts_with": "docs/",
"result": { "license": "CC-BY-1.0" }
}
As another example, if you wish to override the results for the whole crate messenger whose version is 0.1:
{
"starts_with": "messenger-0.1 file:",
"result": { "license": "MIT" }
}
The "starts_with" field is mutually exclusive with "id": specify one of them, not both.
The "result" field is the result which will be assigned to items matching the expression provided through "id" or "starts_with". The provided value completely replaces the automatic detection's result for the matched items. Use "result": null to omit the file from the results completely, or provide a ScanResultItem as a replacement.
Provide a reference file whose contents will be compared against the contents of the file matched to the ID you're overriding. The program will stop the scan if the content being provided for comparison does not match the content found during the scan.
This field provides a way of avoiding the problem of a file being changed over time without your knowledge and thus possibly making the result incorrect.
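As an illustration of how "id" and "starts_with" rules select results, here is a minimal TypeScript sketch. The OverrideRule type and the ruleMatches and findOverride functions are hypothetical names, the "result" payload simply mirrors the rule examples above, and "first matching rule wins" is an assumption rather than documented behaviour. The reference-file comparison is left out.

```typescript
// Hypothetical types; the "result" payload mirrors the rule examples above.
type OverrideResult = { license: string } | null;
type OverrideRule =
  | { id: string; result: OverrideResult }
  | { starts_with: string; result: OverrideResult };

// An "id" rule matches one exact ID; a "starts_with" rule matches any ID with that prefix.
function ruleMatches(rule: OverrideRule, id: string): boolean {
  return "id" in rule ? rule.id === id : id.startsWith(rule.starts_with);
}

// Returns the first matching rule (assumed precedence), or undefined
// when the automatic detection should stand.
function findOverride(rules: OverrideRule[], id: string): OverrideRule | undefined {
  return rules.find((rule) => ruleMatches(rule, id));
}

const rules: OverrideRule[] = [
  { id: "adder-0.2 file: src/main.rs", result: { license: "MIT" } },
  { starts_with: "docs/", result: { license: "CC-BY-1.0" } },
  // "result": null omits matching files from the results.
  { starts_with: "messenger-0.1 file:", result: null },
];

console.log(findOverride(rules, "docs/guide.md"));
console.log(findOverride(rules, "src/lib.rs"));
```

Modelling the rule as a union of two shapes lets the type checker reject a rule that sets both "id" and "starts_with" in an object literal, which matches the exclusivity described above.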
The --start-lines-excludes option takes as its argument a plain-text file which specifies lines to be excluded from the top of the file during the text normalization step (example). It is useful for removing "Copyright (c) Foo Bar" boilerplate at the start of licenses, which might otherwise make the detector misrecognize them. Use it as:
scan --start-lines-excludes excludes.txt
For instance, if you see lots of licenses starting with the following template:
Copyright (c) Foo Bar, 2019
All right reserved.
[actual license here]
It will be helpful to provide a file through --start-lines-excludes with the following contents:
Copyright (c) Foo Bar, 2019
All right reserved.
Doing so will remove the specified boilerplate lines from the top of the licenses so that the detector will be able to get to the actual license's text cleanly.
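The effect described above can be approximated with a short TypeScript sketch. The stripStartLines function is a hypothetical name, and exact per-line matching after trimming whitespace is an assumption; the real normalization step may compare lines differently.

```typescript
// Hypothetical helper: drops lines from the top of `text` for as long as
// each one (trimmed) appears in `excludes`. Matching is exact, by assumption.
function stripStartLines(text: string, excludes: string[]): string {
  const excluded = new Set(
    excludes.map((line) => line.trim()).filter((line) => line !== ""),
  );
  const lines = text.split("\n");
  // Index of the first line that is NOT excluded boilerplate.
  const firstKept = lines.findIndex((line) => !excluded.has(line.trim()));
  return firstKept === -1 ? "" : lines.slice(firstKept).join("\n");
}

const license = [
  "Copyright (c) Foo Bar, 2019",
  "All right reserved.",
  "[actual license here]",
].join("\n");

const excludes = ["Copyright (c) Foo Bar, 2019", "All right reserved."];

console.log(stripStartLines(license, excludes)); // [actual license here]
```

Only leading lines are removed: an excluded line that appears further down, after the first kept line, is left in place.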
If --ensure-licenses or --ensure-any-license is configured, the scan will make sure that all scanned files are licensed.
- With --ensure-licenses, every file needs to be licensed with one of the provided licenses.
- With --ensure-any-license, every file needs to be licensed with any license.
Examples:
yarn start -- scan --ensure-licenses Apache-2.0 GPL-3.0-only -- /directory/or/file
yarn start -- scan --ensure-any-license /directory/or/file
These options conflict with each other, so only one of them should be specified. By default, no licensing is enforced.
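The three enforcement modes (none, any license, one of a list) can be sketched as follows. The EnsureMode type and the ensureMode and violations functions are hypothetical, and licenses are represented as plain identifier strings for brevity; this is an illustration of the rules above, not the tool's actual implementation.

```typescript
// Hypothetical model of the enforcement modes described above.
type EnsureMode =
  | { kind: "none" }
  | { kind: "any" }
  | { kind: "oneOf"; licenses: string[] };

// The two options conflict, so specifying both is rejected.
function ensureMode(ensureLicenses: string[] | undefined, ensureAnyLicense: boolean): EnsureMode {
  if (ensureLicenses !== undefined && ensureAnyLicense) {
    throw new Error("--ensure-licenses and --ensure-any-license cannot be combined");
  }
  if (ensureLicenses !== undefined) return { kind: "oneOf", licenses: ensureLicenses };
  return ensureAnyLicense ? { kind: "any" } : { kind: "none" };
}

// Returns the IDs of files that break the selected mode.
// `detected` maps a file ID to its license ID, or undefined when none was found.
function violations(detected: Record<string, string | undefined>, mode: EnsureMode): string[] {
  return Object.keys(detected).filter((id) => {
    const license = detected[id];
    if (mode.kind === "none") return false;
    if (license === undefined) return true;
    return mode.kind === "oneOf" && mode.licenses.indexOf(license) === -1;
  });
}

const detected: Record<string, string | undefined> = {
  "src/main.rs": "Apache-2.0",
  "src/lib.rs": "MIT",
  "build.rs": undefined,
};

console.log(violations(detected, ensureMode(["Apache-2.0", "GPL-3.0-only"], false)));
console.log(violations(detected, ensureMode(undefined, true)));
```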
If --ensure-product is configured, the scan will make sure that any product referenced in a license header is the correct product and not the result of a copy-paste error.
For example, this fragment references the Substrate product:
// Substrate is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
Examples:
yarn start -- scan --ensure-product Polkadot -- /directory/or/file
It treats a different product reference as an error, but it allows a generic "this program".
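A rough TypeScript sketch of such a check is shown below. The referencedProduct and productIsAcceptable functions are hypothetical, and the "<Product> is free software" pattern is an assumption taken from the fragment above; the actual detection logic in license-scanner may look for different wording.

```typescript
// Hypothetical helper: returns the product named in a
// "<Product> is free software" header line, if there is one.
function referencedProduct(header: string): string | undefined {
  // Skip leading comment markers and whitespace, then capture the product name.
  const match = /^[\s\/*#]*(.+?) is free software\b/m.exec(header);
  return match ? match[1] : undefined;
}

// A header passes when it names no product, uses the generic
// "this program", or names the expected product.
function productIsAcceptable(header: string, expected: string): boolean {
  const product = referencedProduct(header);
  if (product === undefined) return true;
  return product.toLowerCase() === "this program" || product === expected;
}

const header = [
  "// Substrate is free software: you can redistribute it and/or modify",
  "// it under the terms of the GNU General Public License as published by",
  "// the Free Software Foundation, either version 3 of the License, or",
  "// (at your option) any later version.",
].join("\n");

console.log(referencedProduct(header)); // Substrate
console.log(productIsAcceptable(header, "Polkadot")); // false
```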
--exclude can be used to exclude files or directories from the scan.
- Most useful in combination with --ensure-licenses.
- The excluded path can be absolute or relative.
--file-extensions restricts the scan to files with the specified extensions.
Examples:
yarn start -- scan --ensure-licenses Apache-2.0 --file-extensions '.rs' -- /directory/or/file
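How exclusions and extension filters might combine can be sketched as a single predicate. The shouldScan function is hypothetical; it assumes paths are already normalized to the same form (all absolute or all relative to the same base), that exclusions take priority, and that an empty extension list means no extension filtering. None of this is confirmed by the tool's source.

```typescript
// Hypothetical predicate combining --exclude and --file-extensions.
function shouldScan(path: string, excludes: string[], extensions: string[]): boolean {
  const excluded = excludes.some((exclude) => {
    const base = exclude.replace(/\/+$/, ""); // ignore trailing slashes
    // Match the excluded path itself or anything nested below it.
    return path === base || path.startsWith(base + "/");
  });
  if (excluded) return false;
  // An empty list is assumed to mean "no extension filter".
  return extensions.length === 0 || extensions.some((ext) => path.endsWith(ext));
}

const excludes = ["/repo/target/debug", "/repo/target/release/"];

console.log(shouldScan("/repo/src/main.rs", excludes, [".rs"])); // true
console.log(shouldScan("/repo/target/debug/build.rs", excludes, [".rs"])); // false
console.log(shouldScan("/repo/README.md", excludes, [".rs"])); // false
```

Comparing against base + "/" rather than the bare prefix keeps a sibling such as /repo/target/debugger.rs from being excluded by accident.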
scan is the entrypoint for this project. Instead of being coupled to main, all the scan-related code is purposefully designed as a library so that it can be used easily on other projects (we plan to use license-scanner for the CLA bot).
As documented in the Usage section, license-scanner also scans crates, and that is where rust-crate-scanner comes into play (we use it to detect crates from lockfiles).
Note that you do not need to manually compile rust-crate-scanner before running the CLI because cargo run will automatically (re)compile the project if necessary.