This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

Check compatibility of license of each entry with the original dataset license #9

Open
tvercaut opened this issue Oct 14, 2018 · 5 comments


@tvercaut
Member

As per #1 CC-BY is chosen as the default licence for the model zoo entries. However, this might not be compatible with the licence of the training dataset that was used to compute the weights.

OASIS for example has a permissive CC-BY licence (https://www.oasis-brains.org/#access) but has additional citation requirements which are currently not quite met in https://github.com/NifTK/NiftyNetModelZoo/tree/5-reorganising-with-lfs/OASIS

We need to check each entry individually.

  • What does the BRATS license say?
  • The VISCERAL paper mentions a "license agreement that assured the use of the data in its given environment and for its research purpose". We currently do not mention a non-commercial restriction.
  • etc.
@wyli
Member

wyli commented Oct 16, 2018

For OASIS there's an additional license file included in the .tar.gz;
for BRATS, it's a few volumes extracted from the original set. I have contacted Spyros, and he agreed that we can host these volumes with a citation to the original papers.
I'll double check the other downloadables...

@tvercaut
Member Author

Thanks. Note that it's not only about the data but also about the pre-trained weights as these might be considered derived work. Not 100% sure about it but would be worth looking into.

Re OASIS, for clarity, we could copy (or point to) the OASIS licence in a README file (in line with the discussion in #6).

@fepegar
Collaborator

fepegar commented Oct 16, 2018

@tvercaut, do you have any reference that explains what licenses are needed for machine learning models?

@tvercaut
Member Author

tvercaut commented Oct 16, 2018

That is a complex question, and in many cases the answer might depend on the licences under which the training data was released. You will need someone with an actual law background to help navigate these questions, I am afraid.

Even when the training data consists of photographs from, say, ImageNet, Flickr, etc., there are copyright questions. Whether pre-trained weights derived from them fall under "fair use" (not convinced, but see e.g. https://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/), or whether they fall under "databases/fact compilations" (never really looked into these), or whether I am just fantasising (very plausible, but I don't think this has been tested in court yet) is a great question. You will find many reddit and similar discussions on the topic, e.g.:

In short, we won't have a clear-cut answer unless the licence of the original dataset helps us out...

@fepegar
Collaborator

fepegar commented Oct 16, 2018

Thanks, Tom! I'll take a look.


3 participants