NOTE: UNDER CONSTRUCTION, the actual software is NOT released yet
seek for circular RNA in transcriptome (identifies deferentially expressed circRNAs between two samples)
Version: v1.0.0.b
Last Modified: 2018-10-29
Authors: Bioinformatics Lab, University of Louisville, Kentucky Biomedical Research Infrastructure Network (KBRIN)
Different Junction Counts (linear junction counts and circular junction counts) used in the circular RNA expression level estimation.
Percent Back-spliced In (PBI) calculation (circular junction counts with respect to all junction counts)
circular RNA derived from exon 2 of Cdp gene, Rat, Hippocampus, Somata and Neuropil. This circular RNA is highly expressed in Neuropil (31.9%) than Somata (11.9%).
- STAR Aligner: v2.5.2b
1 Download seekCRIT
git clone https://github.com/UofLBioinformatics/seekCRIT.git
cd seekCRIT
2 Install required packages (Some prerequisite packages might require admin access rights. Please contact your system admin to install such packages.)
pip3 install -r Prerequisites.txt
3 Install seekCRIT
python setup.py install
4 testing seekCRIT with testrun.sh
In order to run testrun.sh:
- Dowload genome sequence in FASTA format for Rattus_norvegicus genome ( It can be downloaded from UCSC Genome Browser)
- Dowload gtf annotation for Rattus_norvegicus genome.
- Run testrun.sh to test the installation and the dependencies of seekCRIT and also it will test the installation with CRTL12.fastq and IR12.fastq by specifying the path for FASTA and gtf files:
./testrun.sh gtf/Rattus_norvegicus.Ensembl.rn6.r84.gtf fasta/rn6.fa
usage: seekCRIT.py [-h] -s1 S1 -s2 S2 -gtf GTF -o OUTDIR -t {SE,PE}
--genomeIndex GENOMEINDEX -fa FASTA -ref REFSEQ
[--threadNumber numThreads]
[--aligner aligner]
[--deltaPSI DELTAPSI] [--highConfidence HIGHCONFIDENCE]
[--libType {fr-unstranded,fr-firststrand,fr-secondstrand}]
[--keepTemp {Y,N}]
Identifying and Characterizing Differentially Spliced circular RNAs between
two samples
Required arguments:
===================
-s1 S1, --sample1 S1 fastq files for sample_1. Replicates are separated by
comma. Paired-end reads are separated by colon.
e.g.,s1-1.fastq,s1-2.fastq for single-end read. s1-1.R
1.fastq:s1-1.R2.fastq,s1-2.R1.fastq:s1-2.R2.fastq for
single-end read
-s2 S2, --sample2 S2 fastq files for sample_2. Replicates are separated by
comma. Paired-end reads are separated by colon.
e.g.,s2-1.fastq,s2-2.fastq for single-end read. s2-1.R
1.fastq:s2-1.R2.fastq,s2-2.R1.fastq:s2-2.R2.fastq for
single-end read
-gtf GTF, --gtf GTF The gtf annotation file. e.g., hg38.gtf
-o OUTDIR, --output OUTDIR
Output directory
-t {SE,PE}, --readType {SE,PE}
Read type. SE for Single-end read, PE for Paired-end read
--genomeIndex GENOMEINDEX
Genome indexes for the aligner
-fa FASTA, --fasta FASTA
Genome sequence. e.g., hg38.fa
-ref REFSEQ, --refseq REFSEQ
Transcriptome in refseq format. e.g., hg38.ref.txt
optional arguments:
====================
-h, --help show this help message and exit
--threadNumber numberOfThreadsk
Number of threads for multi-threading feature [default = 4]
--aligner aligner aligner to use(for now it supports only STAR but we are working on it to support more aligners)
--deltaPSI DELTAPSI Delta PSI cutoff. i.e., significant event must show
bigger deltaPSI than this cutoff [default = 0.05]
--highConfidence HIGHCONFIDENCE
Minimum number of circular junction counts required [default = 1]
--libType {fr-unstranded,fr-firststrand,fr-secondstrand}
library type used by Tophat aligner [default ='fr-unstranded']
--keepTemp {Y,N} Keep temp files or not [default='Y']
python3 seekCRIT.py -o PEtest -t PE -fa fa/hg19.fa -ref ref/hg19.ref.txt --genomeIndex /media/bio/data/STARIndex/hg19 -s1 testData/231ESRP.25K.rep-1.R1.fastq:testData/231ESRP.25K.rep-1.R2.fastq,testData/231ESRP.25K.rep-2.R1.fastq:testData/231ESRP.25K.rep-2.R2.fastq -s2 testData/231EV.25K.rep-1.R1.fastq:testData/231EV.25K.rep-1.R2.fastq,testData/231EV.25K.rep-2.R1.fastq:testData/231EV.25K.rep-2.R2.fastq -gtf testData/test.gtf --threadNumber 12
python3 seekCRIT.py -o SEtest -t SE -fa fa/hg19.fa -ref ref/hg19.ref.txt --genomeIndex /media/bio/data/STARIndex/hg19 -s1 testData/231ESRP.25K.rep-1.R1.fastq,testData/231ESRP.25K.rep-1.R2.fastq,testData/231ESRP.25K.rep-2.R1.fastq,testData/231ESRP.25K.rep-2.R2.fastq -s2 testData/231EV.25K.rep-1.R1.fastq,testData/231EV.25K.rep-1.R2.fastq,testData/231EV.25K.rep-2.R1.fastq,testData/231EV.25K.rep-2.R2.fastq -gtf testData/test.gtf --threadNumber 12
- Transcriptome should be in refseq format below (see more details in the example ):
Field | Description |
---|---|
geneName | Name of gene |
isoform_name | name of isoform |
chrom | chromosme |
strand | strand (+/-) |
txStart | Transcription start position |
txEnd | Transcription end position |
cdsStart | Coding region end |
exonCount | Number of exons |
exonStarts | Exon start positions |
exonEnds | Exon end positions |
- It is not obligatory to provide REFSEQ file, we made script (GTFtoREFSEQ) to convert from gtf to refseq that is used in the main code if no refseq file is provided.
See details in the example file
Field | Description |
---|---|
chrom | chromosome |
circRNA_start | circular RNA 5' end position |
circRNA_end | circular RNA 3' end position |
strand | DNA strand (+/-) |
exonCount | number of exons included in the circular RNA transcript |
exonSizes | size of exons included in the circular RNA transcript |
exonOffsets | offsets of exons included in the circular RNA transcript |
circType | circRNA, ciRNA, ccRNA |
geneName | name of gene |
isoformName | name of isoform |
exonIndexOrIntronIndex | Index (start from 1) of exon (for circRNA) or intron (for ciRNA) in given isoform |
FlankingIntrons | Left intron/Right intron |
CircularJunctionCount_Sample_1 | read count of the circular junction in sample # 1 |
LinearJunctionCount_Sample_1 | read count of the linear junction in sample # 1 |
CircularJunctionCount_Sample_2 | read count of the circular junction in sample # 2 |
LinearJunctionCount_Sample_2 | read count of the linear junction in sample # 1 |
PBI_Sample_1 | Percent Backsplicing Index for sample # 1 |
PBI_Sample_2 | Percent Backsplicing Index for sample # 2 |
deltaPBI(PBI_1-PBI_2) | difference between PBI values of two samples |
pValue | pValue |
FDR | FDR |
To calculate the significane of differentially expressed circular RNAs, use the criteria:
-
At least 5% changes in percent back-spliced in (PBI) or |deltaPBI|>=5%
-
FDR < 0.05
Copyright (C) 2017 . See the LICENSE file for license rights and limitations (MIT).