Carrier statistic is a statistical framework to prioritize disease-related rare variants by integrating gene expression data.
Rscript step1_carrier_stat.R \
--genotype=GENOTYPE_PREFIX \
--variants=VARIANTS_PREFIX \
--rna=RNA_PREFIX \
--gene=GENE_FILE \
--variants_gene_pair=VARIANTS_GENE_PAIR_FILE \
--outfile=OUTFILE_PREFIX
where the inputs are:
GENOTYPE_PREFIX (required): The prefix for genotype files. This prefix should correspond to GENOTYPE_PREFIX_case.txt for the case group and GENOTYPE_PREFIX_ctrl.txt for the control group.
VARIANTS_PREFIX (required): The prefix for variant information files accompanying the genotype files. This prefix should correspond to VARIANTS_PREFIX_case.txt for the case group and VARIANTS_PREFIX_ctrl.txt for the control group.
RNA_PREFIX (required): The prefix for gene expression data files. This prefix should correspond to RNA_PREFIX_case.txt for the case group and RNA_PREFIX_ctrl.txt for the control group.
GENE_FILE (required): The full path to the gene information file accompanying the gene expression data files.
VARIANTS_GENE_PAIR_FILE (required): The full path to the variant-gene pair information file.
OUTFILE_PREFIX (required): The prefix for output carrier statistic files. Two files will be generated: OUTFILE_PREFIX_case.txt for the case group and OUTFILE_PREFIX_ctrl.txt for the control group.
cd carrier-stat
Rscript ./step1_carrier_stat.R \
--genotype=./example/genotype \
--variants=./example/variants \
--rna=./example/rna \
--gene=./example/gene.txt \
--variants_gene_pair=./example/variants_gene_pair.txt \
--outfile=./example/carrier_stat
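Because each prefix expands to a _case.txt and a _ctrl.txt file, a missing file is easy to overlook before a run. The following is a minimal sketch of a pre-flight check, not part of carrier-stat itself; the function name check_step1_inputs is hypothetical and the check only tests that the files exist.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with carrier-stat): report missing Step 1 inputs.
# Usage: check_step1_inputs GENOTYPE_PREFIX VARIANTS_PREFIX RNA_PREFIX GENE_FILE VARIANTS_GENE_PAIR_FILE
# Example: check_step1_inputs ./example/genotype ./example/variants ./example/rna \
#              ./example/gene.txt ./example/variants_gene_pair.txt
check_step1_inputs() {
    missing=0
    # Each of the three prefixes expands to a case file and a control file.
    for prefix in "$1" "$2" "$3"; do
        for group in case ctrl; do
            f="${prefix}_${group}.txt"
            if [ ! -f "$f" ]; then
                echo "missing: $f" >&2
                missing=$((missing + 1))
            fi
        done
    done
    # The gene and variant-gene pair files are passed as full paths.
    for f in "$4" "$5"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

The function returns a non-zero status when any file is absent, so it can guard the Rscript call with &&.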
Genotype file (GENOTYPE_PREFIX_case.txt and GENOTYPE_PREFIX_ctrl.txt): Allelic dosage file (number of ALT alleles; only 0/1/2 are supported) without a header line, one row per sample and one column per variant. The number of columns (i.e., the number of variants) must be equal to the number of rows in the variant information file. The number of rows (i.e., the number of samples) must be equal to the number of columns in the gene expression data file.
Variant information file (VARIANTS_PREFIX_case.txt and VARIANTS_PREFIX_ctrl.txt): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele). The number of rows (i.e., the number of variants) must be equal to the number of columns in the genotype file.
Gene expression data file (RNA_PREFIX_case.txt and RNA_PREFIX_ctrl.txt): RNA read count file without a header line, one row per gene and one column per sample. The number of columns (i.e., the number of samples) must be equal to the number of rows in the genotype file. The number of rows (i.e., the number of genes) must be equal to the number of rows in the gene information file.
Gene information file (GENE_FILE): A text file with a header line (CHROM: chromosome; MINBP: start position of the gene; MAXBP: end position of the gene; GENE: gene name). The number of rows (i.e., the number of genes) must be equal to the number of rows in the gene expression data file.
Variant-gene pair information file (VARIANTS_GENE_PAIR_FILE): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele; GENE: gene name).
Output carrier statistic file (OUTFILE_PREFIX_case.txt and OUTFILE_PREFIX_ctrl.txt): A text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele; GENE: gene name; n_carrier: number of samples carrying the variant; carrier_stat: carrier statistic value).
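The dimension requirements above can be checked before running Step 1. The sketch below is an illustration only and rests on two assumptions not stated here: that the files are whitespace-delimited, and that header lines should be excluded when counting rows in the variant and gene information files. The helper names count_rows, count_cols and check_dimensions are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with carrier-stat); assumes whitespace-delimited files.

# Count non-empty rows; pass 1 as the second argument to skip a header line.
count_rows() {
    awk -v skip="${2:-0}" 'NR > skip && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Count fields on the first line (prints 0 for an empty file).
count_cols() {
    awk 'NR == 1 { print NF; found = 1; exit } END { if (!found) print 0 }' "$1"
}

# Usage: check_dimensions GENOTYPE_PREFIX VARIANTS_PREFIX RNA_PREFIX GENE_FILE GROUP
# GROUP is "case" or "ctrl".
check_dimensions() {
    geno="${1}_${5}.txt"
    vars="${2}_${5}.txt"
    rna="${3}_${5}.txt"
    gene="$4"
    status=0
    # Genotype columns (variants) must match variant information rows.
    if [ "$(count_cols "$geno")" -ne "$(count_rows "$vars" 1)" ]; then
        echo "$5: genotype columns differ from variant information rows" >&2
        status=1
    fi
    # Genotype rows (samples) must match expression columns (samples).
    if [ "$(count_rows "$geno")" -ne "$(count_cols "$rna")" ]; then
        echo "$5: genotype rows differ from expression columns" >&2
        status=1
    fi
    # Expression rows (genes) must match gene information rows.
    if [ "$(count_rows "$rna")" -ne "$(count_rows "$gene" 1)" ]; then
        echo "$5: expression rows differ from gene information rows" >&2
        status=1
    fi
    return "$status"
}
```

Running check_dimensions once with case and once with ctrl covers both groups; each mismatch is reported on standard error.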
Rscript step2_analysis.R \
--carrier_stat=CARRIER_STAT_PREFIX \
--outfile=OUTFILE_PREFIX \
--fdr_thre=FDR_THRESHOLD
where the inputs are:
CARRIER_STAT_PREFIX (required): The prefix for carrier statistic files output from Step 1. This prefix should correspond to CARRIER_STAT_PREFIX_case.txt for the case group and CARRIER_STAT_PREFIX_ctrl.txt for the control group.
OUTFILE_PREFIX (required): The prefix for files containing significant variant-gene pairs. Two files will be generated: OUTFILE_PREFIX_downregulated_fdr_FDR_THRESHOLD.txt for significant variant-gene pairs with negative carrier statistics and OUTFILE_PREFIX_upregulated_fdr_FDR_THRESHOLD.txt for significant variant-gene pairs with positive carrier statistics.
FDR_THRESHOLD (optional): FDR cutoff. Default is 0.05.
cd carrier-stat
Rscript ./step2_analysis.R \
--carrier_stat=./example/carrier_stat \
--outfile=./example/carrier_stat \
--fdr_thre=0.2
Output files containing significant variant-gene pairs (OUTFILE_PREFIX_downregulated_fdr_FDR_THRESHOLD.txt and OUTFILE_PREFIX_upregulated_fdr_FDR_THRESHOLD.txt): Each is a text file with a header line (CHROM: chromosome; POS: position; ID: variant name; REF: reference allele; ALT: alternative allele; GENE: gene name; n_carrier: number of samples carrying the variant; carrier_stat: carrier statistic value; fdr: FDR).
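Since the output columns are named in the header line, they can be selected by name rather than by position. The following sketch assumes the output is whitespace-delimited and that the header uses exactly the names listed above; the function name filter_pairs and the minimum-carrier cutoff are hypothetical and only illustrate one way to inspect the results.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with carrier-stat); assumes a whitespace-delimited
# file whose header contains ID, GENE, n_carrier, carrier_stat and fdr.
# Usage: filter_pairs RESULT_FILE MIN_CARRIER
# Prints ID, GENE, carrier_stat and fdr for pairs with at least MIN_CARRIER carriers.
filter_pairs() {
    awk -v min="$2" '
        # Map each header name to its column index, then skip the header.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Keep rows whose carrier count reaches the cutoff.
        $(col["n_carrier"]) + 0 >= min + 0 {
            print $(col["ID"]), $(col["GENE"]), $(col["carrier_stat"]), $(col["fdr"])
        }
    ' "$1"
}
```

Looking columns up by name keeps the filter working if the column order ever differs from the order listed above.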