---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# UCSCXenaTools <img src='man/figures/logo.png' align="right" height="200" alt="logo"/>
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/UCSCXenaTools)](https://cran.r-project.org/package=UCSCXenaTools)
[![lifecycle](https://img.shields.io/badge/lifecycle-stable-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![R-CMD-check](https://github.com/ropensci/UCSCXenaTools/actions/workflows/main.yml/badge.svg)](https://github.com/ropensci/UCSCXenaTools/actions/workflows/main.yml)
[![](https://cranlogs.r-pkg.org/badges/grand-total/UCSCXenaTools?color=orange)](https://cran.r-project.org/package=UCSCXenaTools)
[![rOpenSci](https://badges.ropensci.org/315_status.svg)](https://github.com/ropensci/software-review/issues/315)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.01627/status.svg)](https://doi.org/10.21105/joss.01627)
<!-- badges: end -->
**UCSCXenaTools** is an R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
Public omics data from UCSC Xena are supported through [**multiple turn-key Xena Hubs**](https://xenabrowser.net/datapages/), which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.
**Who is the target audience and what are the scientific applications of this package?**
* Target Audience: cancer and clinical researchers, bioinformaticians
* Applications: genomic and clinical analyses
## Table of Contents
* [Installation](#installation)
* [Data Hub List](#data-hub-list)
* [Basic usage](#basic-usage)
* [Citation](#citation)
* [How to contribute](#how-to-contribute)
* [Acknowledgment](#acknowledgment)
## Installation
Install the stable release from r-universe/CRAN with:
```{r, eval=FALSE}
install.packages('UCSCXenaTools', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org'))
#install.packages("UCSCXenaTools")
```
You can also install the development version of **UCSCXenaTools** from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")
```
To build the vignettes locally, add two options:
```{r, eval=FALSE}
remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)
```
## Data Hub List
All datasets are available at <https://xenabrowser.net/datapages/>.
Currently, **UCSCXenaTools** supports the following data hubs of UCSC Xena.
* UCSC Public Hub: <https://ucscpublic.xenahubs.net/>
* TCGA Hub: <https://tcga.xenahubs.net/>
* GDC Xena Hub (new): <https://gdc.xenahubs.net/>
* GDC v18.0 Xena Hub (old): <https://gdcV18.xenahubs.net/>
* ICGC Xena Hub: <https://icgc.xenahubs.net/>
* Pan-Cancer Atlas Hub: <https://pancanatlas.xenahubs.net/>
* UCSC Toil RNAseq Recompute Compendium Hub: <https://toil.xenahubs.net/>
* PCAWG Xena Hub: <https://pcawg.xenahubs.net/>
* ATAC-seq Hub: <https://atacseq.xenahubs.net/>
* Single Cell Xena Hub: <https://singlecellnew.xenahubs.net/> (**Disabled by UCSCXena**)
* Kids First Xena Hub: <https://kidsfirst.xenahubs.net/>
* Treehouse Xena Hub: <https://xena.treehouse.gi.ucsc.edu:443/>
Users can manually update the dataset list to the newest version from UCSC Xena with the `XenaDataUpdate()` function, then restart R and run `library(UCSCXenaTools)` again.
If the URL of a data hub changes or a new data hub comes online, please let me know by emailing <[email protected]> or [opening an issue on GitHub](https://github.com/ropensci/UCSCXenaTools/issues).
## Basic usage
Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is a five-step workflow: `generate`, `filter`, `query`, `download` and `prepare`, implemented by the `XenaGenerate`, `XenaFilter`, `XenaQuery`, `XenaDownload` and `XenaPrepare` functions, respectively. The functions are straightforward to use and combine well with other packages such as `dplyr`.
To show the basic usage of **UCSCXenaTools**, we will download the clinical data of LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub. Users can learn more about **UCSCXenaTools** by running `browseVignettes("UCSCXenaTools")` to read the vignettes.
### XenaData data.frame
**UCSCXenaTools** uses `XenaData`, a `data.frame` bundled with the package that records information about all datasets in the UCSC Xena data hubs, to generate instances of the `XenaHub` class.
You can load `XenaData` after loading `UCSCXenaTools` into R.
```{r}
library(UCSCXenaTools)
data(XenaData)
head(XenaData)
```
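Because `XenaData` is an ordinary `data.frame`, it can be explored with the usual tools before any download. The sketch below is illustrative only: it relies on the `XenaHostNames` column and the `"tcgaHub"` value used later in this README, and it assumes `dplyr` is installed.

```{r}
library(UCSCXenaTools)
data(XenaData)

# Count how many datasets are recorded for each data hub
hub_counts <- dplyr::count(XenaData, XenaHostNames, sort = TRUE)
hub_counts

# Keep only the rows that belong to the TCGA hub
tcga_rows <- XenaData[XenaData$XenaHostNames == "tcgaHub", ]
nrow(tcga_rows)
```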
### Workflow
Select datasets.
```{r}
# The filter arguments of XenaFilter() accept regular expressions
XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo
df_todo
```
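To make the two filtering steps concrete, here is a minimal base R sketch of the same regular expressions applied to a few hand-picked dataset names. It assumes the matching behaves like `grepl()`; the vector `dataset_names` is an illustrative sample, not the full hub listing, and the variable names are hypothetical.

```{r}
# Illustrative dataset names (a small sample, not the full TCGA hub listing)
dataset_names <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.BRCA.sampleMap/BRCA_clinicalMatrix",
  "TCGA.LUAD.sampleMap/HiSeqV2"
)

# Step 1: names that contain "clinical"
is_clinical <- grepl("clinical", dataset_names)

# Step 2: names that contain any of the three cohort codes ("|" means "or")
is_lung <- grepl("LUAD|LUSC|LUNG", dataset_names)

# Chaining two filters keeps only the names that pass both
selected <- dataset_names[is_clinical & is_lung]
selected
```

Only the three lung clinical matrices pass both patterns: the BRCA entry fails the cohort pattern and the `HiSeqV2` entry fails the `clinical` pattern.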
Query and download.
```{r}
XenaQuery(df_todo) %>%
XenaDownload() -> xe_download
```
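The steps so far can be wrapped in a small helper that saves the files to a directory of your choice instead of a temporary one. This is a sketch under assumptions: the helper name `download_lung_clinical` is hypothetical, and it assumes `XenaDownload()` accepts a `destdir` argument; check `?XenaDownload` for the arguments available in your installed version.

```{r, eval = FALSE}
library(UCSCXenaTools)

# Hypothetical helper: select, query and download the lung clinical datasets.
# Assumes XenaDownload() has a `destdir` argument (see ?XenaDownload).
download_lung_clinical <- function(destdir) {
  # Create the target directory if it does not exist yet
  dir.create(destdir, showWarnings = FALSE, recursive = TRUE)

  XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
    XenaFilter(filterDatasets = "clinical") %>%
    XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") %>%
    XenaQuery() %>%
    XenaDownload(destdir = destdir)
}

# Usage (requires internet access):
# xe_download <- download_lung_clinical("xena_data")
```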
Load the downloaded data into R for analysis.
```{r}
cli = XenaPrepare(xe_download)
class(cli)
names(cli)
```
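When several files are prepared at once, the result can be handled like any named list of tables. The sketch below uses a hand-made stand-in, `cli_demo`, with made-up names and values (not real TCGA data), to show two common first checks.

```{r}
# Stand-in for a prepared result; names and values are made up for illustration
cli_demo <- list(
  LUAD_demo = data.frame(sampleID = c("s1", "s2"), age = c(61, 70)),
  LUSC_demo = data.frame(sampleID = c("s3", "s4", "s5"), age = c(58, 66, 73))
)

# Number of rows (samples) in each table
n_samples <- vapply(cli_demo, nrow, integer(1))
n_samples

# Pull out one table by name and list its columns
luad <- cli_demo[["LUAD_demo"]]
colnames(luad)
```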
## More to read
- [Introduction and basic usage of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/)
- [UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis](https://shixiangwang.github.io/home/en/post/ucscxenatools-201908/)
- [Obtain RNAseq Values for a Specific Gene in Xena Database](https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/)
- [UCSC Xena Access APIs in UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/)
## Citation
If you use this package, please cite the following paper.
```
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
# For BibTex
@article{Wang2019UCSCXenaTools,
journal = {Journal of Open Source Software},
doi = {10.21105/joss.01627},
issn = {2475-9066},
number = {40},
publisher = {The Open Journal},
title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
url = {https://dx.doi.org/10.21105/joss.01627},
volume = {4},
author = {Wang, Shixiang and Liu, Xuesong},
pages = {1627},
date = {2019-08-05},
year = {2019},
month = {8},
day = {5},
}
```
Please cite UCSC Xena with the following paper.
```
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data
visualization and interpretation." BioRxiv (2019): 326470.
```
## How to contribute
To contribute, please follow these steps:
* Clone project from GitHub
* Open `UCSCXenaTools.Rproj` with RStudio
* Modify source code
* Run `devtools::check()`, and fix all errors, warnings and notes
* Create a pull request
## Acknowledgment
This package is based on [XenaR](https://github.com/mtmorgan/XenaR). Thanks to [Martin Morgan](https://github.com/mtmorgan) for his work.
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)