Update v0.5.0 (#19)
* Update v0.5.0

* Fix vignette typo.
zmccaw-insitro authored Apr 4, 2024
1 parent b71de44 commit 48f18ac
Showing 16 changed files with 253 additions and 39 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: AllelicSeries
Title: Allelic Series Test
Version: 0.0.4.1
Version: 0.0.5.0
Authors@R:
c(person(given = "Zachary",
family = "McCaw",
@@ -14,7 +14,7 @@ Authors@R:
comment = c(ORCID = "0000-0002-1748-625X")),
person(given = "insitro", role = c("cph"))
)
Description: Implementation of gene-level rare variant association tests targeting allelic series: genes where increasingly deleterious mutations have increasingly large phenotypic effects. The COding-variant Allelic Series Test (COAST) operates on the benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs) within a gene. COAST uses a set of adjustable weights that tailor the test towards rejecting the null hypothesis for genes where the average magnitude of effect increases monotonically from BMVs to DMVs to PTVs. See McCaw ZR, O’Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2022) "An allelic series rare variant association test for candidate gene discovery" <doi:10.1101/2022.12.23.521658>.
Description: Implementation of gene-level rare variant association tests targeting allelic series: genes where increasingly deleterious mutations have increasingly large phenotypic effects. The COding-variant Allelic Series Test (COAST) operates on the benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs) within a gene. COAST uses a set of adjustable weights that tailor the test towards rejecting the null hypothesis for genes where the average magnitude of effect increases monotonically from BMVs to DMVs to PTVs. See McCaw ZR, O’Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2023) "An allelic series rare variant association test for candidate gene discovery" <doi:10.1016/j.ajhg.2023.07.001>.
License: BSD_3_clause + file LICENSE
Encoding: UTF-8
Imports:
@@ -25,7 +25,7 @@ LinkingTo:
Rcpp,
RcppArmadillo
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.0
RoxygenNote: 7.2.3
Suggests:
knitr,
rmarkdown,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -5,6 +5,7 @@ export(ASKAT)
export(Aggregator)
export(COAST)
export(Comparator)
export(CountAlleles)
export(DGP)
export(OLS)
importFrom(Rcpp,sourceCpp)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -0,0 +1,6 @@
## Version 0.5.0

* Added option (`min_mac`) to filter the variant set to only include those variants having at least a minimum minor allele count (10 is recommended).
* Added a function (`CountAlleles`) to count the number of alleles of each variant category present in the genotype matrix. Also allows for counting the number of carriers of each type of allele.
* By default, `COAST` now reports the number of alleles of each variant category that contributed to the test.
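
As an illustration of the `min_mac` filter and the per-category allele counts described in these notes, here is a minimal base-R sketch on a toy genotype matrix. It follows the filtering logic shown in this commit's `CountAlleles`, but it does not call the package; the names `toy_geno`, `toy_anno`, and `count_by_category` are hypothetical and exist only for this example.

```r
# Toy data (hypothetical): 4 subjects x 5 variants, additive coding.
toy_geno <- matrix(
  c(1, 0, 0, 0, 2,
    0, 1, 0, 0, 1,
    1, 0, 0, 1, 0,
    0, 0, 1, 0, 1),
  nrow = 4, byrow = TRUE
)

# Annotations per variant: 0 = BMV, 1 = DMV, 2 = PTV.
toy_anno <- c(0, 0, 1, 1, 2)

# Count alleles per category after an optional minor allele count filter.
count_by_category <- function(anno, geno, min_mac = 0) {
  if (min_mac > 0) {
    mac <- colSums(geno)      # Minor allele count of each variant.
    keep <- mac >= min_mac    # Retain variants meeting the threshold.
    anno <- anno[keep]
    geno <- geno[, keep, drop = FALSE]
  }
  c(
    n_bmv = sum(geno[, anno == 0]),
    n_dmv = sum(geno[, anno == 1]),
    n_ptv = sum(geno[, anno == 2])
  )
}

counts_all <- count_by_category(toy_anno, toy_geno)
counts_mac2 <- count_by_category(toy_anno, toy_geno, min_mac = 2)
print(counts_all)
print(counts_mac2)
```

With `min_mac = 2`, only the first and last toy variants survive the filter, so the DMV count drops to zero while the PTV count is unchanged.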

39 changes: 38 additions & 1 deletion R/allelic_series_test.R
@@ -1,5 +1,5 @@
# Purpose: Allelic series test.
# Updated: 2023-10-04
# Updated: 2024-04-03

# Default weights.
DEFAULT_WEIGHTS <- c(1, 2, 3)
@@ -14,6 +14,7 @@ DEFAULT_WEIGHTS <- c(1, 2, 3)
#' @param indicator Convert raw counts to indicators? Default: FALSE.
#' @param method Method for aggregating across categories:
#' {"none", "max", "sum"}. Default: "none".
#' @param min_mac Minimum minor allele count for inclusion. Default: 0.
#' @param weights Annotation category weights.
#' @return (n x 3) Numeric matrix without weighting, (n x 1) numeric matrix
#' with weighting.
@@ -24,8 +25,18 @@ Aggregator <- function(
drop_empty = TRUE,
indicator = FALSE,
method = "none",
min_mac = 0,
weights = DEFAULT_WEIGHTS
) {

# Minor allele count filtering.
if (min_mac > 0) {
mac <- apply(geno, 2, sum)
keep <- (mac >= min_mac)

anno <- anno[keep]
geno <- geno[, keep, drop = FALSE]
}

# Sum to categories.
bmv <- apply(geno[, anno == 0, drop = FALSE], 1, sum)
@@ -95,6 +106,7 @@ Aggregator <- function(
#' @param is_pheno_binary Is the phenotype binary? Default: FALSE.
#' @param method Method for aggregating across categories: {"none", "max",
#' "sum"}. Default: "none".
#' @param min_mac Minimum minor allele count for inclusion. Default: 0.
#' @param score_test Run a score test? If FALSE, performs a Wald test.
#' @param weights (3 x 1) annotation category weights.
#' @return Numeric p-value.
@@ -120,6 +132,7 @@ ASBT <- function(
indicator = FALSE,
is_pheno_binary = FALSE,
method = "none",
min_mac = 0,
score_test = FALSE,
weights = DEFAULT_WEIGHTS
) {
@@ -152,6 +165,7 @@ ASBT <- function(
drop_empty = TRUE,
indicator = indicator,
method = method,
min_mac = min_mac,
weights = weights
)

@@ -191,6 +205,7 @@ ASBT <- function(
#' Default: TRUE. Ignored if phenotype is binary.
#' @param covar (n x p) covariate matrix. Defaults to an (n x 1) intercept.
#' @param is_pheno_binary Is the phenotype binary? Default: FALSE.
#' @param min_mac Minimum minor allele count for inclusion. Default: 0.
#' @param return_null_model Return the null model in addition to the p-value?
#' Useful if running additional SKAT tests. Default: FALSE.
#' @param weights (3 x 1) annotation category weights.
@@ -216,6 +231,7 @@ ASKAT <- function(
apply_int = TRUE,
covar = NULL,
is_pheno_binary = FALSE,
min_mac = 0,
return_null_model = FALSE,
weights = DEFAULT_WEIGHTS
) {
@@ -241,6 +257,15 @@ ASKAT <- function(
weights = weights
)

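# Minor allele count filtering.
# Keep variants with at least min_mac minor alleles, as documented;
# variants with no minor alleles are always dropped.
if (min_mac >= 0) {
mac <- apply(geno, 2, sum)
keep <- (mac >= min_mac) & (mac > 0)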

anno <- anno[keep]
geno <- geno[, keep, drop = FALSE]
}

# Alternate allele frequencies.
aaf <- apply(geno, 2, mean) / 2

@@ -320,6 +345,8 @@ ASKAT <- function(
#' @param include_orig_skato_ptv Include the original version of SKAT-O applied
#' to PTV variants only in the omnibus test? Default: FALSE.
#' @param is_pheno_binary Is the phenotype binary? Default: FALSE.
#' @param min_mac Minimum minor allele count for inclusion. Default: 0.
#' @param return_counts Include minor allele counts in output? Default: TRUE.
#' @param return_omni_only Return only the omnibus p-value? Default: FALSE.
#' @param score_test Use a score test for burden analysis? If FALSE, uses a
#' Wald test.
@@ -348,6 +375,8 @@ COAST <- function(
include_orig_skato_all = FALSE,
include_orig_skato_ptv = FALSE,
is_pheno_binary = FALSE,
min_mac = 0,
return_counts = TRUE,
return_omni_only = FALSE,
score_test = FALSE,
weights = DEFAULT_WEIGHTS
@@ -381,6 +410,7 @@ COAST <- function(
geno = geno,
pheno = pheno,
apply_int = apply_int,
min_mac = min_mac,
is_pheno_binary = is_pheno_binary,
score_test = score_test,
...
@@ -427,6 +457,7 @@ COAST <- function(
geno = geno,
pheno = pheno,
is_pheno_binary = is_pheno_binary,
min_mac = min_mac,
return_null_model = TRUE,
weights = weights
)
@@ -463,5 +494,11 @@ COAST <- function(
} else {
out <- c(p_val, p_omni = p_omni)
}

if (return_counts) {
counts <- CountAlleles(anno = anno, geno = geno, min_mac = min_mac)
out <- c(counts, out)
}

return(out)
}
51 changes: 50 additions & 1 deletion R/utilities.R
@@ -1,5 +1,54 @@
# Purpose: Utility functions.
# Updated: 2023-04-25
# Updated: 2024-04-03

#' Count Alleles
#'
#' Count the number of non-zero alleles bearing each variant annotation.
#'
#' @param anno (snps x 1) annotation vector with values in c(0, 1, 2).
#' @param geno (n x snps) genotype matrix.
#' @param count_carriers If true, counts the number of carriers rather than
#' the number of alleles.
#' @param min_mac Minimum minor allele count for inclusion. Default: 0.
#' @return 3 x 1 numeric vector with the counts of BMVs, DMVs, and PTVs.
#' @export
CountAlleles <- function(
anno,
geno,
count_carriers = FALSE,
min_mac = 0
) {

# Minor allele count filtering.
if (min_mac > 0) {
mac <- apply(geno, 2, sum)
keep <- (mac >= min_mac)

anno <- anno[keep]
geno <- geno[, keep, drop = FALSE]
}

# Category counts.
if (count_carriers) {

# Count carriers.
n_bmv <- sum(apply(geno[, anno == 0, drop = FALSE], 1, sum) > 0)
n_dmv <- sum(apply(geno[, anno == 1, drop = FALSE], 1, sum) > 0)
n_ptv <- sum(apply(geno[, anno == 2, drop = FALSE], 1, sum) > 0)

} else {

# Count alleles.
n_bmv <- sum(geno[, anno == 0])
n_dmv <- sum(geno[, anno == 1])
n_ptv <- sum(geno[, anno == 2])

}

# Output.
out <- c(n_bmv = n_bmv, n_dmv = n_dmv, n_ptv = n_ptv)
return(out)
}


#' Linear Association Test
69 changes: 48 additions & 21 deletions README.md
@@ -1,6 +1,20 @@
# Allelic Series

Implementation of gene-level rare variant association tests targeting allelic series: genes where increasingly deleterious mutations have increasingly large phenotypic effects. The COding-variant Allelic Series Test (COAST) operates on the benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs) within a gene. COAST uses a set of adjustable weights that tailor the test towards rejecting the null hypothesis for genes where the average magnitude of effect increases monotonically from BMVs to DMVs to PTVs. See McCaw ZR, O’Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2023) "An allelic-series rare-variant association test for candidate-gene discovery" [doi:10.1016/j.ajhg.2023.07.001](https://www.cell.com/ajhg/fulltext/S0002-9297(23)00241-0).
This package implements gene-level rare variant association tests
targeting allelic series: genes where increasingly deleterious mutations
have increasingly large phenotypic effects. The main COding-variant
Allelic Series Test (COAST) operates on the benign missense variants
(BMVs), deleterious missense variants (DMVs), and protein truncating
variants (PTVs) within a gene. COAST uses a set of adjustable weights
that tailor the test towards rejecting the null for genes where the
average magnitude of phenotypic effect increases monotonically from BMVs
to DMVs to PTVs. Such genes are of candidate therapeutic interest due to
the existence of a dose-response relationship between gene functionality
and phenotypic impact. See McCaw ZR, O’Dushlaine C, Somineni H, Bereket
M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. (2023) “An
allelic-series rare-variant association test for candidate-gene
discovery”
[doi:10.1016/j.ajhg.2023.07.001](https://www.cell.com/ajhg/fulltext/S0002-9297(23)00241-0).

# Installation

@@ -28,11 +42,13 @@ The example `data` are a list with the following components:

- `anno`: An `snps` by 1 annotation vector coded as 0 for benign
missense variants (BMVs), 1 for deleterious missense variants (DMVs),
and 2 for protein truncating variants (PTVs).
and 2 for protein truncating variants (PTVs). Note that the values of
(0, 1, 2) simply identify different categories of variants; `weights`
other than these can be set when performing the association test.

- `covar`: An `n` by 6 covariate matrix including an intercept `int`,
and covariates representing `age`, `sex`, and 3 genetic PCs (`pc1`
`pc3`).
and covariates representing `age`, `sex`, and 3 genetic PCs (`pc1`,
`pc2`, `pc3`).

- `geno`: An `n` by `snps` genotype matrix with additive coding and
minor allele frequencies between 0.5% and 1.0%.
@@ -67,24 +83,29 @@ genotype matrix, and the phenotype vector.
included manually, if desired.

- `weights` encodes the relative importance of BMVs, DMVs, and PTVs. The
example weights of `c(1, 2, 3)` target a genetic architecture where effect sizes increase with increasing deleteriousness:
BMVs have an effect of 1, DMVs have an effect of 2, and PTVs have an effect of 3. Weights of
`c(1, 1, 1)` target instead a genetic architecture where all variant
types have equivalent expected magnitudes.
example weights of `c(1, 2, 3)` target a genetic architecture where
effect sizes increase with increasing deleteriousness: BMVs have an
effect of 1, DMVs have an effect of 2, and PTVs have an effect of 3.
Weights of `c(1, 1, 1)` target instead a genetic architecture where
all variant types have equivalent expected magnitudes.
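
As a rough illustration of how `weights` can enter a burden-style score, the sketch below forms a weighted sum of per-category allele counts for each subject. This is a toy under stated assumptions, not the package's internal code; `cat_counts` and `weighted_burden` are hypothetical names introduced for the example.

```r
# Hypothetical per-subject allele counts by category (rows = subjects).
cat_counts <- cbind(
  bmv = c(1, 0, 2),
  dmv = c(0, 1, 0),
  ptv = c(0, 0, 1)
)

# Weighted sum across categories: one burden score per subject.
weighted_burden <- function(counts, weights) {
  as.numeric(counts %*% weights)
}

# Allelic series weights: PTVs count three times as much as BMVs.
burden_series <- weighted_burden(cat_counts, c(1, 2, 3))

# Equal weights: every variant category contributes the same.
burden_equal <- weighted_burden(cat_counts, c(1, 1, 1))

print(burden_series)
print(burden_equal)
```

Under the `c(1, 2, 3)` weights, the third toy subject (two BMVs and one PTV) receives a larger score than under equal weights, which is the sense in which the weights tilt the test toward an allelic series.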

``` r
show(results)
```

## p_count p_ind p_max_count p_max_ind p_sum_count
## 7.707024e-29 6.745269e-06 4.299938e-13 6.228669e-07 6.756953e-18
## p_sum_ind p_allelic_skat p_omni
## 1.894500e-06 8.507977e-08 9.248429e-28
## n_bmv n_dmv n_ptv p_count p_ind
## 2.870000e+02 1.620000e+02 6.100000e+01 3.112702e-26 1.322084e-09
## p_max_count p_max_ind p_sum_count p_sum_ind p_allelic_skat
## 3.076876e-10 5.374363e-09 1.661854e-20 2.554417e-11 2.658137e-07
## p_omni
## 3.735235e-25

By default, the output of `COAST` is a vector of p-values, corresponding
to the different components of the allelic series test and the overall
omnibus test (`p_omni`). To return the omnibus p-value only, specify
`return_omni_only = TRUE` when calling `COAST`.
By default, the output of `COAST` includes counts for the number of
alleles of each variant class that contributed to the test, and a vector
of p-values, corresponding to the different components of the allelic
series test. The final, overall p-value is given by `p_omni`. To return
the omnibus p-value only, specify `return_omni_only = TRUE` when calling
`COAST`.
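
Because the result is a named numeric vector, individual entries can be extracted by name. The sketch below hard-codes a few of the values printed above into a hypothetical `example_results` vector rather than calling `COAST`, so it runs without the package installed.

```r
# Hypothetical stand-in for part of the COAST output shown above.
example_results <- c(
  n_bmv = 287, n_dmv = 162, n_ptv = 61,
  p_count = 3.112702e-26, p_omni = 3.735235e-25
)

# Allele counts for the three variant categories.
allele_counts <- example_results[c("n_bmv", "n_dmv", "n_ptv")]

# All p-values, selected by the "p_" name prefix.
p_values <- example_results[grepl("^p_", names(example_results))]

# The omnibus p-value as a bare number.
p_omni <- unname(example_results["p_omni"])

print(allele_counts)
print(p_omni)
```

Selecting by name rather than by position keeps downstream code working whether or not the count entries are present at the front of the vector.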

## Robust omnibus test

@@ -112,11 +133,17 @@ results <- COAST(
show(results)
```

## p_count p_ind p_max_count p_max_ind p_sum_count
## 7.707024e-29 6.745269e-06 4.299938e-13 6.228669e-07 6.756953e-18
## p_sum_ind p_allelic_skat p_orig_skat_all p_orig_skat_ptv p_omni
## 1.894500e-06 8.507977e-08 1.250426e-05 2.237988e-13 9.248429e-28
## n_bmv n_dmv n_ptv p_count p_ind
## 2.870000e+02 1.620000e+02 6.100000e+01 3.112702e-26 1.322084e-09
## p_max_count p_max_ind p_sum_count p_sum_ind p_allelic_skat
## 3.076876e-10 5.374363e-09 1.661854e-20 2.554417e-11 2.658137e-07
## p_orig_skat_all p_orig_skat_ptv p_omni
## 1.548324e-05 6.632119e-08 3.735235e-25

## Loading genotypes

The [genio](https://CRAN.R-project.org/package=genio) and [rbgen](https://enkre.net/cgi-bin/code/bgen/wiki?name=rbgen) packages may be used to load PLINK and BGEN genotypes in R, respectively. Moreover, [PLINK](https://www.cog-genomics.org/plink/2.0/) enables conversion between the file types.
The [genio](https://CRAN.R-project.org/package=genio) and
[rbgen](https://enkre.net/cgi-bin/code/bgen/wiki?name=rbgen) packages
may be used to load PLINK and BGEN genotypes in R, respectively.
Moreover, [PLINK](https://www.cog-genomics.org/plink/2.0/) enables
conversion between these file types.
3 changes: 3 additions & 0 deletions man/ASBT.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions man/ASKAT.Rd

3 changes: 3 additions & 0 deletions man/Aggregator.Rd

2 changes: 1 addition & 1 deletion man/AllelicSeries-package.Rd
