Question about the Support of Yeast #13
Hi @ypriverol, I have checked the two species you mentioned in your question. We can rewrite the GTF and FASTA parsers, but the chromosome/plasmid IDs and some additional specifications are required. Chromosome IDs (chromosome names such as 1, 2, 3, 4, ... in human) differ widely between species. I would need a list of all chromosome names for the primary assembly and for patches, haplotypes, etc., with the primary assembly highlighted. That is the most important requirement for enabling PoGo to map additional species.
Additional specifications:
GTF:
FASTA:
Currently PoGo supports the GENCODE annotation. However, GENCODE does not follow the FASTA file structure described above. I will start a discussion with GENCODE to enable novel mapping for annotation purposes. It would be great if you could ask Ensembl to confirm the above specifications for all species in Ensembl Genomes, Ensembl Bacteria, Ensembl Protists, Ensembl Fungi, Ensembl Plants, Ensembl Metazoa and Ensembl (vertebrates). A full list of all primary assembly and patch/haplotype etc. chromosome names for all species in Ensembl and its sub-sites is also required.
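One way to gather the chromosome names requested above, without waiting for a curated list, is to read them from the files themselves. The following is a minimal sketch, not PoGo code: it assumes a tab-separated GTF whose first column is the sequence name and FASTA headers that start with `>`. The function names `gtf_seqnames` and `fasta_header_ids` and the toy input are hypothetical and are not taken from a real Ensembl or GENCODE release.

```python
import io


def gtf_seqnames(handle):
    """Return the distinct values of GTF column 1, in order of first appearance."""
    names = []
    seen = set()
    for line in handle:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        seqname = line.split("\t", 1)[0]
        if seqname not in seen:
            seen.add(seqname)
            names.append(seqname)
    return names


def fasta_header_ids(handle):
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


# Toy input for illustration only (hypothetical records).
toy_gtf = io.StringIO(
    "#!comment line\n"
    "I\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id \"G1\";\n"
    "I\tsrc\texon\t1\t100\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";\n"
    "Mito\tsrc\tgene\t5\t50\t.\t-\t.\tgene_id \"G2\";\n"
)
toy_fasta = io.StringIO(">P1 some description\nMKV\n>P2\nMAA\n")

print(gtf_seqnames(toy_gtf))
print(fasta_header_ids(toy_fasta))
```

Running this over the GTF files of each species of interest would give a per-species list of sequence names; it cannot tell which names belong to the primary assembly and which to patches or haplotypes, so that distinction would still have to come from Ensembl.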
Thanks @cschlaffner for your quick reply. To move this forward and to understand it better, I have some questions:
Can you put an example here?
Is this not the case in the GTF files for these species? Can you give an example? Regards
Chromosome IDs in different species:
As for the GTF structure: I have seen that the position of the exon_id attribute is variable; it sometimes jumps from the CDS line to the exon line and vice versa, specifically between Ensembl and GENCODE GTF files. I also need confirmation from Ensembl that gene_id and transcript_id are used as described for all species in Ensembl, without exception. If Ensembl ensures that structure, e.g. through their internal release code, then I do not have to download all the GTF files and parse through every one of them.
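Until such confirmation is available, the attribute usage described above can at least be spot-checked per file. The sketch below is an assumption-laden illustration rather than PoGo's parser: it assumes the usual nine-column, tab-separated GTF layout with `key "value";` pairs in the last column, and it also accepts unquoted values. The names `parse_attributes` and `attribute_usage` and the toy records are hypothetical.

```python
import io
import re
from collections import defaultdict

# Matches `key "value"` or `key value` pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\w+)\s+(?:"([^"]*)"|([^\s;]+))')


def parse_attributes(field):
    """Parse a GTF attribute column into a dict (first occurrence of a key wins)."""
    attrs = {}
    for match in ATTR_RE.finditer(field):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        attrs.setdefault(key, value)
    return attrs


def attribute_usage(handle, keys=("gene_id", "transcript_id", "exon_id")):
    """Count, per feature type (column 3), how many lines carry each attribute key."""
    usage = defaultdict(lambda: {"lines": 0, **{k: 0 for k in keys}})
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        row = usage[cols[2]]
        row["lines"] += 1
        for key in keys:
            if key in attrs:
                row[key] += 1
    return dict(usage)


# Toy records for illustration only (hypothetical IDs).
toy_gtf = io.StringIO(
    'I\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number 1; exon_id "E1";\n'
    'I\tsrc\tCDS\t10\t90\t.\t+\t0\tgene_id "G1"; transcript_id "T1"; exon_number 1;\n'
)
for feature, counts in attribute_usage(toy_gtf).items():
    print(feature, counts)
```

If, for a given file, the `lines` count for `exon` or `CDS` differs from the `gene_id` or `transcript_id` count, or `exon_id` appears on a different feature type than expected, that file does not follow the assumed structure.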
Hi @cschlaffner:
Can you explain in detail why we cannot do the mapping for taxonomies like yeast or E. coli? This issue can be used to trigger the discussion with Ensembl and to explain to them the problem we are facing. We have more than 10 yeast projects that we would like to be able to map to Ensembl.