implement go-licenses/v2 #67
Changes from 5 commits
259755a
a474079
4738d12
534b3ec
6f6e05b
09280f3
4e907c7
0fa3e13
f4dbbf5
fdd91dd
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
go-licenses | ||
dist | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
|
||
.PHONY: test | ||
test: | ||
go test $$(go list ./... | grep -v /NOTICES/) | ||
|
||
.PHONY: test-debug | ||
test-debug: | ||
go test -v $$(go list ./... | grep -v /NOTICES/) | ||
|
||
.PHONY: build-linux | ||
build-linux: clean | ||
mkdir -p dist/linux | ||
GO111MODULE=on \ | ||
CGO_ENABLED=0 \ | ||
GOOS=linux \ | ||
GOARCH=amd64 \ | ||
go build -tags netgo -ldflags '-extldflags "-static"' -o dist/linux/go-licenses github.com/google/go-licenses/v2 | ||
|
||
.PHONY: build-darwin | ||
build-darwin: clean | ||
mkdir -p dist/darwin | ||
GO111MODULE=on \ | ||
CGO_ENABLED=0 \ | ||
GOOS=darwin \ | ||
GOARCH=amd64 \ | ||
go build -tags netgo -ldflags '-extldflags "-static"' -o dist/darwin/go-licenses github.com/google/go-licenses/v2 | ||
|
||
.PHONY: dist | ||
dist: dist-linux dist-darwin | ||
|
||
.PHONY: dist-linux | ||
dist-linux: build-linux | ||
mkdir -p dist/linux | ||
cp -r NOTICES dist/linux/ | ||
cp -r third_party/google/licenseclassifier/licenses dist/linux/ | ||
tar -C dist/linux -czf dist/go-licenses-linux.tar.gz \ | ||
go-licenses \ | ||
licenses \ | ||
NOTICES | ||
|
||
.PHONY: dist-darwin | ||
dist-darwin: build-darwin | ||
mkdir -p dist/darwin | ||
cp -r NOTICES dist/darwin/ | ||
cp -r third_party/google/licenseclassifier/licenses dist/darwin/ | ||
tar -C dist/darwin -czf dist/go-licenses-darwin.tar.gz \ | ||
go-licenses \ | ||
licenses \ | ||
NOTICES | ||
|
||
.PHONY: install | ||
install: dist-linux | ||
cp dist/go-licenses-linux.tar.gz ~/bin/ | ||
cd ~/bin && tar xvf go-licenses-linux.tar.gz && rm -rf NOTICES && rm go-licenses-linux.tar.gz | ||
|
||
.PHONY: clean | ||
clean: | ||
rm -rf dist/linux | ||
|
||
.PHONY: upload | ||
upload: | ||
gsutil cp dist/go-licenses-linux.tar.gz gs://gongyuan-dev/licenses/go-licenses.tar.gz | ||
|
||
.PHONY: csv | ||
csv: dist-linux | ||
dist/linux/go-licenses csv -v 4 | ||
|
||
.PHONY: save | ||
save: | ||
dist/linux/go-licenses save -v 4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,219 @@ | ||
# go-licenses | ||
|
||
## **THIS IS STILL UNDER DEVELOPMENT** | ||
|
||
A tool that automates the license management workflow for the direct and transitive dependencies of a Go modules project. | ||
|
||
## Install | ||
|
||
Download the released package and install it to your PATH: | ||
TODO: update URL after release. | ||
|
||
```bash | ||
curl -LO download-url/go-licenses-linux.tar.gz | ||
tar xvf go-licenses-linux.tar.gz | ||
sudo mv go-licenses/* /usr/local/bin/ | ||
# or move the content to anywhere in PATH | ||
``` | ||
|
||
## Output Example | ||
|
||
<!-- TODO: update NOTICES folder of this repo. --> | ||
<!-- [NOTICES folder](./NOTICES) is an example of generated NOTICES for go-licenses tool itself. --> | ||
|
||
Examples used in Kubeflow Pipelines: | ||
|
||
* [go-licenses.yaml (config file)](https://github.com/kubeflow/pipelines/blob/master/v2/go-licenses.yaml) | ||
* [license_info.csv (generated)](https://github.com/kubeflow/pipelines/blob/master/v2/third_party/license_info.csv) | ||
* [NOTICES/licenses.txt (generated)](https://github.com/kubeflow/pipelines/blob/master/v2/third_party/NOTICES/licenses.txt) | ||
|
||
## Usage | ||
|
||
### One-off License Update | ||
|
||
1. Check out the version of the repo you need license info for: | ||
|
||
```bash | ||
git clone <go-mod-repo-you-need-license-info> | ||
cd <go-mod-repo-you-need-license-info> | ||
git checkout <version> | ||
``` | ||
|
||
1. Write down a minimal config file specifying your module name and which binary to analyze: | ||
|
||
```yaml | ||
module: | ||
go: | ||
module: github.com/google/go-licenses/v2 | ||
path: . | ||
binary: | ||
path: dist/linux/go-licenses | ||
``` | ||
|
||
1. Get dependencies from go modules and generate a `license_info.csv` file of their licenses: | ||
|
||
```bash | ||
go-licenses csv | ||
``` | ||
|
||
The csv file has three columns: `dependency`, `license download url` and inferred `license type`. | ||
|
||
Note, the format is consistent with [google/go-licenses](https://github.com/google/go-licenses). | ||
|
||
1. The tool may fail to identify: | ||
|
||
* Download url of a license: they will be left out in the csv. | ||
* SPDX ID of a license: they will be named `Unknown` in the csv. | ||
|
||
Please check them manually and update your `go-licenses.yaml` config to fix them; refer to [the example](./go-licenses.yaml). After fixing your config, re-run the tool to regenerate the list: | ||
|
||
```bash | ||
go-licenses csv | ||
``` | ||
|
||
Iterate until you have resolved all license issues. | ||
|
||
1. Download notices, licenses and source folders that should be distributed along with the built binary: | ||
|
||
```bash | ||
go-licenses save | ||
``` | ||
|
||
Notices and licenses will be concatenated to a single file called `NOTICES/licenses.txt`. | ||
Source code folders will be copied to `NOTICES/<module/import/path>`. | ||
|
||
Notices folder location can be configured in [the go-licenses.yaml example](./go-licenses.yaml). | ||
|
||
Some licenses will be rejected based on their [license type](https://github.com/google/licenseclassifier/blob/df6aa8a2788bdf5ac382148c2453a407a29819b8/license_type.go#L341). | ||
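
As a rough illustration of the manual review in step 4, the sketch below scans csv text for rows that still need attention. It assumes three columns in the order dependency, license URL, license type, no header row, and that a missing URL appears as an empty field; the function name `unresolved` and the sample rows are hypothetical and not part of this tool.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// unresolved returns the dependencies whose license URL is empty or whose
// license type is "Unknown".
// Assumed layout (not confirmed by the tool): dependency, URL, type; no header.
func unresolved(csvText string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(csvText))
	r.FieldsPerRecord = 3 // fail early on rows that do not have three columns
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	var out []string
	for _, row := range rows {
		dep, url, typ := row[0], row[1], row[2]
		if url == "" || typ == "Unknown" {
			out = append(out, dep)
		}
	}
	return out, nil
}

func main() {
	// Illustrative rows only; the module paths are made up.
	sample := "github.com/example/a,https://github.com/example/a/blob/v1.0.0/LICENSE,MIT\n" +
		"github.com/example/b,,Apache-2.0\n" +
		"github.com/example/c,https://github.com/example/c/blob/v0.2.0/LICENSE,Unknown\n"
	deps, err := unresolved(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range deps {
		fmt.Println("needs manual review:", d)
	}
}
```

Each reported dependency would then get an override entry in `go-licenses.yaml` before re-running `go-licenses csv`.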
|
||
### Integrating in CI | ||
|
||
Typically, I think we should check `license_info.csv` into source control and | ||
download license contents when releasing. | ||
|
||
An early idea for CI is to run a simple script: | ||
|
||
1. Clones the repo and runs `go-licenses csv`. | ||
1. Verifies that the generated `license_info.csv` matches the version checked into the repo. | ||
|
||
We might worry about flakiness, because various dependencies could be down | ||
temporarily. A simpler alternative is to have the script: | ||
|
||
1. Check whether `go.mod` has been updated without the license files being updated. | ||
1. If so, fail and tell the author to update the license files. | ||
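
A minimal sketch of that simpler check, assuming the CI job already has the list of changed file paths (for example from `git diff --name-only`) and that `license_info.csv` is the license file to watch. The function name `licensesStale` is hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// licensesStale reports whether a change set touches go.mod without also
// touching license_info.csv. File names are matched by base name, so the
// files may live in any directory of the repo.
func licensesStale(changed []string) bool {
	goModChanged, licensesChanged := false, false
	for _, f := range changed {
		switch path.Base(f) {
		case "go.mod":
			goModChanged = true
		case "license_info.csv":
			licensesChanged = true
		}
	}
	return goModChanged && !licensesChanged
}

func main() {
	// Illustrative change set; in CI this would come from the diff of the PR.
	changed := []string{"go.mod", "cmd/root.go"}
	if licensesStale(changed) {
		fmt.Println("go.mod changed: please re-run go-licenses csv and commit license_info.csv")
	}
}
```

This avoids network access in CI entirely, at the cost of not verifying that the committed csv is actually correct.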
|
||
## Implementation Details | ||
|
||
Rough idea of steps in the two commands. | ||
|
||
`go-licenses csv` does the following to generate the `license_info.csv`: | ||
|
||
1. Load the `go-licenses.yaml` config file, which can contain: | ||
* module name | ||
* built binary local path | ||
* module license overrides (path excludes or directly assign result license) | ||
1. List all direct and transitive dependencies with `go version -m <binary-path>`. When a binary is built with Go modules, information about the modules used is embedded in the binary; we parse the go CLI output to get the full list. | ||
1. Scan licenses and report problems: | ||
1. Use <github.com/google/licenseclassifier/v2> to detect licenses from all files of each dependency. | ||
1. Report an error if, for example, no license is found for a dependency. | ||
1. Get license public URLs: | ||
1. Get a dependency's github repo by fetching meta info like `curl 'https://k8s.io/client-go?go-get=1'`. | ||
1. Get dependency's version info from go modules metadata. | ||
1. Combine github repo, version and license file path to a public github URL to the license file. | ||
1. Generate CSV output with module name, license URL and license type. | ||
1. Report dependencies the tool failed to deal with during the process. | ||
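
The dependency listing and URL steps above can be sketched roughly as follows. This is not the code in this PR: `parseDeps` and `licenseURL` are hypothetical helpers, the sample output is illustrative, and the sketch ignores replaced modules (`=>` lines), pseudo-versions and `+incompatible` versions, which would need extra handling before a version can be used as a git ref.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// module is a dependency recorded in a Go binary.
type module struct {
	Path    string
	Version string
}

// parseDeps extracts the "dep" lines from `go version -m <binary>` output.
// Each such line has the form: dep <module path> <version> [<hash>].
func parseDeps(out string) []module {
	var mods []module
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == "dep" {
			mods = append(mods, module{Path: fields[1], Version: fields[2]})
		}
	}
	return mods
}

// licenseURL combines a GitHub repo URL, a version tag and a file path
// into a versioned public URL of the license file.
func licenseURL(repo, version, file string) string {
	return fmt.Sprintf("%s/blob/%s/%s", strings.TrimSuffix(repo, "/"), version, file)
}

func main() {
	// Illustrative output; the hash is a placeholder.
	out := "dist/linux/go-licenses: go1.16\n" +
		"\tpath\tgithub.com/google/go-licenses/v2\n" +
		"\tmod\tgithub.com/google/go-licenses/v2\t(devel)\n" +
		"\tdep\tgithub.com/spf13/cobra\tv1.1.3\th1:placeholder=\n"
	for _, m := range parseDeps(out) {
		// Assumes the module path maps directly to a GitHub repo.
		fmt.Println(m.Path, licenseURL("https://"+m.Path, m.Version, "LICENSE"))
	}
}
```

For vanity import paths such as `k8s.io/client-go`, the repo URL would come from the `go-get=1` meta lookup described above rather than from the module path itself.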
|
||
`go-licenses save` does the following: | ||
|
||
1. Read from `license_info.csv` generated in `go-licenses csv`. | ||
1. Call [github.com/google/licenseclassifier](https://github.com/google/licenseclassifier) to get license type. | ||
1. Three types of reactions to license type: | ||
* Download its notice and license for all types. | ||
* Copy source folder for types that require redistribution of source code. | ||
* Reject according to <https://github.com/google/licenseclassifier/blob/df6aa8a2788bdf5ac382148c2453a407a29819b8/license_type.go#L341>. | ||
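
The three reactions could be expressed as a small decision function like the one below. The category names and their mapping to actions are assumptions for illustration (modeled on how google/go-licenses v1 treats restricted, reciprocal and forbidden licenses); they are not taken from this PR and may not match the exact strings returned by licenseclassifier.

```go
package main

import (
	"fmt"
	"strings"
)

// action describes what the save step does for one dependency.
type action string

const (
	saveNotice          action = "save notice and license"
	saveNoticeAndSource action = "save notice, license and source"
	reject              action = "reject"
)

// decide maps a license type category to an action.
// The category names here are illustrative labels, compared case-insensitively.
func decide(licenseType string) action {
	switch strings.ToLower(licenseType) {
	case "forbidden":
		return reject
	case "restricted", "reciprocal":
		// Assumed to be the types that require redistributing source code.
		return saveNoticeAndSource
	default:
		return saveNotice
	}
}

func main() {
	for _, t := range []string{"notice", "reciprocal", "forbidden"} {
		fmt.Printf("%s: %s\n", t, decide(t))
	}
}
```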
|
||
## Credits | ||
|
||
go-licenses/v2 is greatly inspired by | ||
|
||
* [github.com/google/go-licenses](https://github.com/google/go-licenses) for the commands and compliance workflow | ||
* [github.com/mitchellh/golicense](https://github.com/mitchellh/golicense) for getting modules from binary | ||
* [github.com/uw-labs/lichen](https://github.com/uw-labs/lichen) for the vendored code to extract structured data from `go version -m` result. | ||
|
||
## Comparison with similar tools | ||
|
||
<!-- TODO(Bobgy): update this to a table --> | ||
|
||
* go-licenses/v2 was greatly inspired by [github.com/google/go-licenses](https://github.com/google/go-licenses), with the differences: | ||
* go-licenses/v2 works better with go modules. | ||
* no need to vendor dependencies. | ||
We should be careful about this - when dealing with reciprocal licenses it's often safer to vendor the dependencies to ensure compliance. See #28 for more discussion.

Thanks for the context. I believe the concern does not apply: I am specifically saying that using go vendor is not necessary, but when running the save command the full source folder is still copied, like go-licenses v1.
|
||
* discovers versioned license URLs. | ||
* go-licenses/v2 scans all dependency files to find multiple licenses if any, while go-licenses detects by file name heuristics in local source folders and only finds one license per dependency. | ||
* go-licenses/v2 supports a manually maintained config file `go-licenses.yaml`, so that manually verified license information can be reused across periodic license updates. | ||
* go-licenses/v2 was mostly written before I learned [github.com/github/licensed](https://github.com/github/licensed) is a thing. | ||
* Similar to google/go-licenses, github/licensed only uses heuristics to find licenses and assumes one license per repo. | ||
* github/licensed uses a different library for detecting and classifying licenses. | ||
* go-licenses/v2 is a rewrite of [kubeflow/testing/go-license-tools](https://github.com/kubeflow/testing/tree/master/py/kubeflow/testing/go-license-tools) in go, with many improvements: | ||
* better & more robust github repo resolution ratio | ||
* better license classification rate using google/licenseclassifier/v2 (it especially handles BSD-2-Clause and BSD-3-Clause significantly better than GitHub license API). | ||
* automates licenses that require distributing source code with it (copied from local module src cache) | ||
* simpler process e2e (instead of too many intermediate steps and config files) | ||
* rewritten in go, so it's easier to redistribute the binary than python | ||
|
||
## Roadmap | ||
|
||
General directions to improve this tool: | ||
|
||
* Provide behavior that is backward compatible with google/go-licenses v1. | ||
* Ask for more usage & feedback and improve robustness of the tool. | ||
|
||
## TODOs | ||
|
||
### Features | ||
|
||
#### P0 | ||
|
||
* [ ] Use cobra to support providing the same information via argument or config. | ||
* [ ] Implement "check" command | ||
|
||
#### P1 | ||
|
||
* [ ] Support installation using go get. | ||
* [ ] Support modules with +incompatible in their versions, ref: <https://golang.org/ref/mod#incompatible-versions>. | ||
* [ ] Refactor & improve test coverage. | ||
|
||
#### P2 | ||
|
||
* [ ] Support auto inclusion of licenses in headers by recording start line and end line of a license detection. | ||
* [ ] Find better default locations of generated files. | ||
* [ ] Improve logging format & consistency. | ||
* [ ] Tutorial for integration in CI/CD. | ||
|
||
## License Workflow Design Overview | ||
|
||
This section introduces the full workflow for complying with open source licenses. | ||
In each workflow stage, we list several options and what this tool prefers. | ||
|
||
1. List dependencies - Options | ||
* (Preferred) List dependencies in a go binary | ||
* List all go module dependencies | ||
|
||
1. Detect licenses for a dependency | ||
* Files to consider - options: | ||
* (Preferred) Scan every file | ||
* Only look into common license file names like LICENSE, LICENSE.txt, COPYING, etc. | ||
* License classifier - options: | ||
* (Preferred) [google/licenseclassifier/v2](https://github.com/google/licenseclassifier/tree/main/v2) | ||
* [licensee](https://github.com/licensee/licensee) | ||
* GitHub license API | ||
* many other options | ||
* Manual configs to overcome what we cannot automate | ||
* (not supported yet) allowlist for licenses | ||
* (supported) override manually examined licenses | ||
* (supported) exclude self-owned proprietary dependencies | ||
* (supported) pin config to dependency version to avoid stale configs | ||
|
||
1. Comply with license requirements by redistributing: | ||
* attribution/copyright notice | ||
* licenses in full text | ||
* dependency source code for licenses that require it |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
// Copyright 2021 Google LLC | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package cmd | ||
|
||
const defaultLicenseDictLocation = "license_dict.csv" | ||
const defaultLicenseInfoLocation = "license_info.csv" |
I'd recommend `go install` instead if you can. Something like google/licenseclassifier#38 might help with including license information as part of installation.

P1 to wait for upstream improvement
I also wanted to do this. Glad to see a PR has been sent on upstream that would enable this.