Question about the Support of Yeast #13

ypriverol · 2017-10-30T09:38:50Z

Can you explain in detail why we cannot map to taxonomies such as Yeast or E. coli? This issue can be used to start the discussion with ENSEMBL and explain to them the problem we are facing. We have more than 10 Yeast projects that we would like to be able to map to ENSEMBL.

cschlaffner · 2017-10-31T20:19:34Z

Hi @ypriverol

I have checked the two species you mentioned in your question. We can rewrite the GTF and FASTA parsers, but the chromosome/plasmid IDs and the additional specifications below are required first.

Chromosome IDs (chromosome names such as 1,2,3,4,... in human) are very different in different species. I would need a list of all chromosome names for the primary assembly and for patches, haplotypes, etc., with the primary assembly highlighted. This is the most important requirement for enabling PoGo to map additional species.

Additional specifications:

GTF:

gene line holds gene_id in the description column as - gene_id "gene_id";
transcript line holds gene_id and transcript_id in the description column as - gene_id "gene_id"; transcript_id "transcript_id";
CDS line holds gene_id, transcript_id, and exon_id in the description column as - gene_id "gene_id"; transcript_id "transcript_id"; exon_id "exon_id";
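
To make the GTF requirements above concrete, here is a minimal Python sketch (not PoGo's actual parser) that reads the attribute column of one GTF record and reports which of the required keys are missing. The function names, the `REQUIRED` table and the example IDs are hypothetical and only for illustration.

```python
import re

# Matches `key "value"` pairs in the ninth (attribute) column of a GTF record.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

# Attributes expected per feature type, as listed in this comment.
REQUIRED = {
    "gene": ("gene_id",),
    "transcript": ("gene_id", "transcript_id"),
    "CDS": ("gene_id", "transcript_id", "exon_id"),
}

def parse_gtf_line(line):
    """Return (feature, attributes) for one record, or None for comments/blank lines."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    return fields[2], dict(ATTR_RE.findall(fields[8]))

def missing_attributes(feature, attrs):
    """List the required keys that a record of this feature type does not carry."""
    return [key for key in REQUIRED.get(feature, ()) if key not in attrs]

# Made-up record with placeholder IDs, not taken from a real annotation file.
record = 'IV\tsrc\tCDS\t100\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1"; exon_id "E1";\n'
feature, attrs = parse_gtf_line(record)
print(feature, attrs, missing_attributes(feature, attrs))
```

Unquoted attributes (for example numeric ones) are ignored by this regular expression, which is acceptable here because only the three ID attributes matter.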

FASTA:

every fasta header contains gene_id and transcript_id as - gene:gene_id transcript:transcript_id
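
As an illustration of the FASTA requirement, the following sketch extracts the two IDs from a header that follows the `gene:gene_id transcript:transcript_id` pattern. It is an assumption about how such a check could look, not PoGo's code; `parse_fasta_header` and the example header are hypothetical.

```python
import re

# The word boundary keeps keys such as "gene_biotype:" from matching "gene:".
GENE_RE = re.compile(r"\bgene:(\S+)")
TRANSCRIPT_RE = re.compile(r"\btranscript:(\S+)")

def parse_fasta_header(header):
    """Return (gene_id, transcript_id), or None if either tag is absent."""
    gene = GENE_RE.search(header)
    transcript = TRANSCRIPT_RE.search(header)
    if gene is None or transcript is None:
        return None
    return gene.group(1), transcript.group(1)

# Made-up header with placeholder IDs.
print(parse_fasta_header(">P1 pep gene:G1 transcript:T1 gene_biotype:protein_coding"))
```

A header that returns `None` here would not satisfy the structure described above and would need a dedicated parser.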

Currently PoGo supports GENCODE annotation. However, GENCODE does not follow the structure in the fasta file as described above. I will start the discussion with GENCODE to enable novel mapping for annotation purposes.

It would be great if you could ask ENSEMBL for confirmation of the above specifications for all species in Ensembl Genomes, Ensembl Bacteria, Ensembl Protists, Ensembl Fungi, Ensembl Plants, Ensembl Metazoa and Ensembl (vertebrates). Also a full list of all primary assembly and patch/haplotype etc. chromosome names for all species in Ensembl and the sub Ensembl sites is required.

ypriverol · 2017-11-01T10:03:39Z

Thanks @cschlaffner for your quick reply. To move this forward and to understand it better, I have some questions:

Chromosome IDs (chromosome names such as 1,2,3,4,... in human) are very different in different species.

Can you put an example here?

gene line holds gene_id in the description column as - gene_id "gene_id";

transcript line holds gene_id and transcript_id in the description column as - gene_id "gene_id"; transcript_id "transcript_id";

CDS line holds gene_id, transcript_id, and exon_id in the description column as - gene_id "gene_id"; transcript_id "transcript_id"; exon_id "exon_id";

Is this not the case in the GTF files for these species? Can you give an example?

Regards
Yasset

cschlaffner · 2017-11-07T20:02:05Z

@ypriverol

Chromosome IDs in different species:

Human: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, KI270713.1, KI270711.1, GL000195.1, GL000219.1, GL000216.2, ...
Yeast: I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, Mito
E.coli: Chromosome, pHUSEC2011-1, pHUSEC2011-2, pHUSEC2011-3
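
A small sketch of why these names cannot be hard-coded: the same label can mean different things per species, so any lookup has to be keyed by species. The table below uses only names from the lists above (the human scaffolds are left out because that list is truncated); `CHROMOSOMES` and `chromosome_index` are hypothetical names, not part of PoGo.

```python
# Ordered primary-assembly names per species, copied from the lists above.
CHROMOSOMES = {
    "human": [str(n) for n in range(1, 23)] + ["X", "Y", "MT"],
    "yeast": ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X",
              "XI", "XII", "XIII", "XIV", "XV", "XVI", "Mito"],
    "ecoli": ["Chromosome", "pHUSEC2011-1", "pHUSEC2011-2", "pHUSEC2011-3"],
}

# Precompute name -> position so lookups do not scan the list each time.
INDEX = {sp: {name: i for i, name in enumerate(names)}
         for sp, names in CHROMOSOMES.items()}

def chromosome_index(species, name):
    """Return the sort position of a chromosome name within one species."""
    try:
        return INDEX[species][name]
    except KeyError:
        raise KeyError(f"unknown chromosome {name!r} for species {species!r}") from None

# "X" is the sex chromosome in human but chromosome ten in yeast.
print(chromosome_index("human", "X"), chromosome_index("yeast", "X"))
```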

gene line holds gene_id in the description column as - gene_id "gene_id";

transcript line holds gene_id and transcript_id in the description column as - gene_id "gene_id"; transcript_id "transcript_id";

CDS line holds gene_id, transcript_id, and exon_id in the description column as - gene_id "gene_id"; transcript_id "transcript_id"; exon_id "exon_id";

As for the GTF structure: I have seen that exon_id "exon_id" is variable and sometimes jumps from the CDS line to the exon line and vice versa, specifically between Ensembl and GENCODE GTF files.

Also, I just need confirmation from Ensembl that gene_id and transcript_id are used as described for all species in Ensembl, without exception. If Ensembl ensures that structure, e.g. through their internal release code, then I do not have to download all GTF files and parse through all of them.
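
Until such confirmation is available, a rough way to see where exon_id ends up in a given file is to tally the feature types whose attribute column carries it. This is only a sketch under the assumption of standard nine-column GTF lines; `exon_id_features` is a hypothetical helper and the sample lines are made up.

```python
from collections import Counter

def exon_id_features(lines):
    """Count, per feature type, the records whose attributes include exon_id."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed or blank lines
        if 'exon_id "' in fields[8]:
            counts[fields[2]] += 1
    return counts

# Made-up records: exon_id sits on the exon line only.
sample = [
    '#!comment\n',
    'I\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_id "E1";\n',
    'I\tsrc\tCDS\t1\t50\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n',
]
print(exon_id_features(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a full annotation file can be scanned without loading it into memory.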

ypriverol added the question label Oct 30, 2017

beaferbl mentioned this issue Jul 31, 2023

Using PoGo with Stringtie GTF #19

Open
