Extract 3'UTR, 5'UTR, CDS, Promoter, Genes from Gencode files
- R >= 3.2.1
- GenomicFeatures
./create_regions_from_gencode.R <path_to_GFF/GTF> <path_to_output_dir>
This creates exons.bed, 3UTR.bed, 5UTR.bed, genes.bed, and cds.bed in <output_dir>.
- Download the GFF/GTF (GRCh37, v25, comprehensive, CHR) from gencodegenes.org:
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
&& gunzip gencode.v25.annotation.gff3.gz
- Create regions:
./create_regions_from_gencode.R gencode.v25.annotation.gff3 /path/to/GRCh37/annotation
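To illustrate the coordinate handling behind a BED export, here is a minimal Python sketch that turns GFF3 records of one feature type into BED6 rows. It is not what create_regions_from_gencode.R does internally (that script relies on GenomicFeatures). The function name `gff3_to_bed` and the choice of `gene_id` as the BED name are assumptions made for this example. The key point is that GFF3 coordinates are 1-based and inclusive, while BED is 0-based and half-open, so only the start shifts by one.

```python
def gff3_to_bed(lines, feature_type="gene"):
    """Yield BED6 tuples for GFF3 records of one feature type.

    Hypothetical helper for illustration; not part of the repository.
    """
    for line in lines:
        # Skip blank lines and GFF3 header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A valid GFF3 record has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        chrom, strand = cols[0], cols[6]
        start, end = int(cols[3]), int(cols[4])
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("gene_id", attrs.get("ID", "."))
        # GFF3 is 1-based inclusive; BED is 0-based half-open.
        yield (chrom, start - 1, end, name, ".", strand)


if __name__ == "__main__":
    record = "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID=g1;gene_id=g1\n"
    for row in gff3_to_bed([record]):
        print("\t".join(str(v) for v in row))
```

A sketch like this can be handy for spot-checking a few lines of genes.bed against the input annotation.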
To extract first and last exons, we use the GenePred format, which simplifies the process.
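As a rough illustration of why GenePred is convenient here: each transcript is one row, with comma-separated exon starts and ends in columns 9 and 10, listed in ascending genomic order. The sketch below assumes the standard 10-column gtfToGenePred output and picks the first or last exon in transcript order, which means reversing the choice on the minus strand. The function names are hypothetical, and genepred_to_bed.py may work differently.

```python
def parse_genepred(line):
    """Parse one GenePred row into (name, chrom, strand, exons)."""
    f = line.rstrip("\n").split("\t")
    name, chrom, strand = f[0], f[1], f[2]
    # Exon lists carry a trailing comma, so drop empty items.
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    return name, chrom, strand, list(zip(starts, ends))


def terminal_exon(line, which="first"):
    """Return a BED6 tuple for the first or last exon of a transcript.

    Assumption: "first" and "last" are meant in transcript (5' to 3') order.
    """
    name, chrom, strand, exons = parse_genepred(line)
    # Exons are stored left to right on the genome, so the transcript's
    # first exon is the leftmost on "+" and the rightmost on "-".
    take_leftmost = (which == "first") == (strand == "+")
    start, end = exons[0] if take_leftmost else exons[-1]
    # GenePred coordinates are already 0-based half-open, like BED.
    return (chrom, start, end, name, ".", strand)


if __name__ == "__main__":
    row = "tx1\tchr1\t-\t100\t500\t150\t450\t3\t100,250,400,\t200,300,500,\n"
    print(terminal_exon(row, "first"))
    print(terminal_exon(row, "last"))
```

Because GenePred and BED share the same coordinate convention, no offset correction is needed when writing the output.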
- Download gtfToGenePred
- Convert GTF to GenePred:
gtfToGenePred gencode.v25.annotation.gtf gencode.v25.annotation.genepred
- Extract first exons:
python genepred_to_bed.py --first_exon gencode.v25.annotation.genepred
- Extract last exons:
python genepred_to_bed.py --last_exon gencode.v25.annotation.genepred
or probably this: