Much Richer Sample Clinical Data #38
Comments
Wow
Not sure, but perhaps @ChristopherWilks will know? We could figshare it.
@gwaygenomics here are the following attributes I care about:
Good to see Snaptron and/or its data could be useful for you folks. As for maintenance, the project as a whole will be maintained with respect to the expression and junction data for the various data sources (SRA, GTEx, TCGA). I'll defer to @nellore and @lcolladotor regarding updates to the metadata, specifically TCGA (not surprisingly, this was the hardest metadata to integrate, and a non-trivial effort was put into it).
Thanks for the quick response @ChristopherWilks - yes, I certainly understand that collecting that metadata was no small feat! I also agree with the 3 points @dhimmel raises. It would be great to incorporate this data into Cognoma if possible!
Hi, Sorry for the delay. We had to discuss internally some of your concerns with @jtleek, @nellore and others. Basically, our answer is no, we won't use a CC0 license due to the lack of attribution. We are providing support to https://jhubiostatistics.shinyapps.io/recount/ according to the needs of that project. So we'll most likely fix issues with the data in So basically, you can use the TCGA metadata we cleaned at your own risk if you cite our work. Currently the manuscript is available as a pre-print but it is in press right now and will appear published soon. Best, PS The related GitHub repositories to
There's also the
@lcolladotor thanks for the detailed information. The links are really helpful, and it's great to see that all your work is on GitHub.
Very understandable. Have you considered releasing your data under a Creative Commons Attribution (CC BY) or Open Data Commons Attribution (ODC-BY) License? Both of these meet the Open Definition and allow reuse as long as attribution is provided. As it stands, without a license, reusing your datasets is potentially copyright infringement. The situation is complicated, since the data is likely not subject to copyright in the United States, but may be subject to copyright in Europe. Additionally, fair use may protect reuse, but that varies by jurisdiction and is a subjective and non-deterministic judgement. Therefore, an open license can help others know with greater certainty what reuse is permissible. I know most researchers aren't excited to divert precious time to understanding legalities. But I bring it up because with a little effort up front, I think we can save a lot of headaches down the road and help advance the data sciences.
Hi, Thank you for your detailed reply. We have decided to use the CC BY 4.0 license for the data in the In more detail:
In relation to the TCGA metadata, the code for creating it is in https://github.com/leekgroup/recount-website. The resulting table is downloadable via https://jhubiostatistics.shinyapps.io/recount/, specifically at http://duffel.rail.bio/recount/TCGA/TCGA.tsv. An R-formatted version can be downloaded with:
## Install if needed
source("https://bioconductor.org/biocLite.R")
biocLite('recount')
library('recount')
metadata <- all_metadata('TCGA')
## Some quick info about it
> dim(metadata)
[1] 11284 864
Best,
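For anyone working outside R, the TSV link in the comment above could in principle be read directly. The following is only a minimal sketch using the Python standard library: the helper names (`fetch_text`, `parse_tsv`, `columns_matching`) and the sample column names are made up for illustration, and whether the URL is still reachable or the file parses cleanly this way has not been verified.

```python
import csv
import io
import urllib.request

# URL quoted in the comment above; its current availability is an assumption.
TCGA_TSV_URL = "http://duffel.rail.bio/recount/TCGA/TCGA.tsv"


def fetch_text(url, timeout=60):
    """Download a text resource and return it as a string (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_tsv(text):
    """Parse tab-separated text into (header, rows); rows are lists of strings."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def columns_matching(header, keyword):
    """Return the column names that contain `keyword`, case-insensitively."""
    needle = keyword.lower()
    return [name for name in header if needle in name.lower()]


# Tiny in-memory stand-in so the sketch runs offline.
# These column names are hypothetical, not the real TCGA metadata columns.
sample = (
    "sample_id\tdrug_name\ttherapy_type\n"
    "S1\tagentA\tchemotherapy\n"
    "S2\tagentB\timmunotherapy\n"
)
header, rows = parse_tsv(sample)
print(len(rows), len(header))
print(columns_matching(header, "therapy"))
```

To try it on the real table, one would call `parse_tsv(fetch_text(TCGA_TSV_URL))` and compare `len(rows)` and `len(header)` with the `dim(metadata)` output above. The counts may not match exactly, since R and this parser can treat row names and malformed lines differently.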
@lcolladotor thanks! I know licensing is a hassle. Really appreciate that you took the time to unambiguously and openly license recount2.
Stumbled upon Snaptron today and eventually found my way to this resource.
There are many curated variables measured on each sample (in samples.tsv), including treatment (both the specific therapeutic agent and the class of therapy, e.g. chemotherapy, immunotherapy, etc.). I know that @yigalron was very interested in this particular data...