Supporting additional species
Clipper ships with pre-parsed annotation files for the following species (genome assemblies), which are included upon installation:
- hg19
- GRCh38
- ce10
- dm3
- mm9
- mm10
If you are using Clipper with an unlisted assembly, I hope this page can serve as a guide to creating your own annotations. Before installing, add <NEWSPECIES>.AS.STRUCTURE.COMPILED.gff to data/, and add <NEWSPECIES>_genes.bed and <NEWSPECIES>_exons.bed to data/regions/. <NEWSPECIES> is the name of your species, which you specify when running Clipper (e.g. clipper --species hg18 would correspond to the new annotations hg18.AS.STRUCTURE.COMPILED.gff, hg18_genes.bed, and hg18_exons.bed). The examples below show the entries for one gene, DDX3X.
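The file placement described above can be scripted. The following is a minimal Python sketch, not part of Clipper: the helper names annotation_targets and stage_annotations are hypothetical, and the location of the Clipper data directory in your source checkout is an assumption, so it is passed in explicitly.

```python
import shutil
from pathlib import Path


def annotation_targets(data_dir, species):
    """Map each annotation file for `species` to its destination directory.

    `data_dir` is assumed to be the Clipper `data` directory in your source
    checkout (hypothetical helper; pass the real path explicitly).
    """
    data_dir = Path(data_dir)
    regions = data_dir / "regions"
    return {
        f"{species}.AS.STRUCTURE.COMPILED.gff": data_dir,  # GFF goes in data/
        f"{species}_genes.bed": regions,                   # BEDs go in data/regions/
        f"{species}_exons.bed": regions,
    }


def stage_annotations(src_dir, data_dir, species):
    """Copy the three annotation files from `src_dir` into place before re-installing."""
    copied = []
    for name, dest in annotation_targets(data_dir, species).items():
        dest.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy(Path(src_dir) / name, dest / name)))
    return copied
```

For example, stage_annotations("my_annotations", "clipper/data", "hg18") would copy the hg18 files into place, after which Clipper is re-installed.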
<NEWSPECIES>_genes.bed
This BED file contains the genomic coordinates of each gene. Its name column (column 4) holds the gene ID, which must match the gene identifiers used in the other two required files:
chrX 41333283 41364472 ENSG00000215301.10 0 +
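Both BED files use the same six-column layout (chrom, start, end, gene ID, score, strand). As a sketch, a small parser can catch malformed rows before Clipper sees them; the function name parse_bed6 and the specific checks are assumptions for illustration, not Clipper's own validation.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int    # BED start: 0-based, inclusive
    end: int      # BED end: 0-based, exclusive
    gene_id: str  # must match the IDs in the other two annotation files
    score: str
    strand: str


def parse_bed6(line):
    """Parse one row of a genes/exons BED file (hypothetical helper)."""
    fields = line.split()  # accepts tab- or space-separated columns
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    chrom, start, end, gene_id, score, strand = fields
    record = BedRecord(chrom, int(start), int(end), gene_id, score, strand)
    if record.start >= record.end:
        raise ValueError(f"start must be less than end: {line!r}")
    if record.strand not in ("+", "-"):
        raise ValueError(f"strand must be + or -: {line!r}")
    return record
```

Parsing the DDX3X row gives a gene span of end - start = 31189 bases.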
<NEWSPECIES>_exons.bed
This BED file contains the exon coordinates of each gene. As in the genes file, the name column holds the gene ID, which must match the gene identifiers used in the other two required files. Overlapping exons from distinct transcripts should be merged, giving a non-overlapping list of representative exons per gene:
chrX 41333283 41334297 ENSG00000215301.10 0 +
chrX 41334590 41336738 ENSG00000215301.10 0 +
chrX 41337407 41339604 ENSG00000215301.10 0 +
chrX 41339909 41344128 ENSG00000215301.10 0 +
chrX 41344190 41344564 ENSG00000215301.10 0 +
chrX 41345179 41345548 ENSG00000215301.10 0 +
chrX 41346228 41346622 ENSG00000215301.10 0 +
chrX 41346858 41351668 ENSG00000215301.10 0 +
chrX 41357832 41358000 ENSG00000215301.10 0 +
chrX 41364273 41364472 ENSG00000215301.10 0 +
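Merging overlapping exons per gene is a standard interval-merge: sort each gene's intervals by start, then extend the current interval while the next one overlaps it. The sketch below assumes transcript-level exons are already available as (chrom, start, end, gene_id, strand) tuples in BED coordinates; the name merge_exons is hypothetical. It also merges book-ended exons (one ending exactly where the next starts), which is what `bedtools merge` does by default; whether Clipper needs that is an assumption.

```python
from collections import defaultdict


def merge_exons(exons):
    """Collapse overlapping exons into non-overlapping intervals per gene.

    `exons`: iterable of (chrom, start, end, gene_id, strand), BED coordinates
    (0-based, half-open), e.g. one row per transcript exon.
    Returns BED6 rows (chrom, start, end, gene_id, 0, strand) sorted by gene.
    """
    by_gene = defaultdict(list)
    for chrom, start, end, gene_id, strand in exons:
        by_gene[(gene_id, chrom, strand)].append((start, end))

    rows = []
    for (gene_id, chrom, strand), intervals in sorted(by_gene.items()):
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:  # overlaps or touches the previous interval
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        rows.extend((chrom, s, e, gene_id, 0, strand) for s, e in merged)
    return rows
```

Grouping by gene ID keeps exons of neighbouring or overlapping genes separate, so each gene keeps its own representative exon list.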
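<NEWSPECIES>.AS.STRUCTURE.COMPILED.gff
This GFF-formatted file is modified to provide, for each gene, an ID, the mRNA length (sum of the lengths of its merged exons) and the pre-mRNA length (genomic length of the gene). It can be generated from the genes and exons files above. Note that GFF coordinates are 1-based, so the gene start is one greater than the BED start: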
chrX AS_STRUCTURE gene 41333284 41364472 . + . gene_id=ENSG00000215301.10;mrna_length=15892;premrna_length=31189
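In the DDX3X line, mrna_length = 15892 equals the summed lengths of the ten merged exons, and premrna_length = 31189 equals the gene span from the genes file. A minimal sketch of generating such a line follows; the function name as_structure_line is hypothetical, and the column layout is inferred from this single example line, so any further fields or requirements Clipper may have are not covered.

```python
def as_structure_line(gene, exons):
    """Build one AS.STRUCTURE.COMPILED.gff gene line (hypothetical helper).

    gene:  (chrom, start, end, gene_id, strand) from <species>_genes.bed
    exons: merged, non-overlapping (start, end) exons for that gene,
           in BED coordinates (0-based, half-open)
    """
    chrom, start, end, gene_id, strand = gene
    mrna_length = sum(e_end - e_start for e_start, e_end in exons)  # spliced length
    premrna_length = end - start                                     # genomic span
    attributes = (
        f"gene_id={gene_id};mrna_length={mrna_length};"
        f"premrna_length={premrna_length}"
    )
    # GFF is 1-based and end-inclusive, so only the BED start shifts by +1.
    fields = [chrom, "AS_STRUCTURE", "gene", str(start + 1), str(end),
              ".", strand, ".", attributes]
    return "\t".join(fields)  # GFF columns are tab-separated
```

Feeding in the DDX3X gene and exon coordinates from the two BED examples reproduces the GFF line above, with tabs between columns.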
Once these files are in their respective directories, re-install Clipper and the new species should be available to use.
Vishal Koparde (https://github.com/kopardev) has kindly developed a helpful tool that autogenerates Clipper reference data:
https://github.com/kopardev/clipperhelper
Alternatively, the create_region_bedfiles script performs the GTF -> BED/AS.STRUCTURE transformation, and its output can be used as reference files for Clipper: