ENSEMBL ID version conversion #24
Comments
@lgeistlinger Good idea on the new issue.
Lines 201 to 219 in 4357b80
Lines 272 to 290 in 4357b80
Here's my fix. If you are happy with it, then I will create a PR. Other solutions might involve:
Thanks. Can you provide an example where the mapping results in such versioned ENSEMBL gene IDs? If that's caused by outdated mappings in the corresponding org.db package, then it is worth fixing it directly there instead of working around it downstream.
In my commit, I was removing the ENSEMBL versioning information before doing the mapping with AnnotationDbi. I don't think the org.db packages use the versioning information (which is the issue), but I could be wrong. Is that what you mean? For me, the version info is introduced well before my R pipeline. In this instance specifically, I was using salmon/GENCODE for quantification.
I see we are talking here about providing versioned IDs to the ID mapping. Well, although I can see that this might be handy to have, I think in this case, it's best to leave it up to the user to provide valid (here: unversioned) gene IDs that are compatible with mapping via |
Ok, thanks for the response @lgeistlinger. When I have some extra time, I will get some feedback from the AnnotationDbi repository and link back to this issue.
It might even be worth understanding why your GENCODE reference includes versioned gene IDs in the first place.
You got me curious @lgeistlinger. I definitely had to google some of this, so let me know if you have some insight. ENSEMBL IDs contain a version (ENS***.Version), so that when things change, the older references can be preserved. GENCODE is a project to create super accurate mouse/human genetic data from ENSEMBL, so they should have the versioning info. My question is: why don't the OrgDbs contain the versioning information? Is it just because OrgDbs primarily map to the Entrez IDs?
I think it reflects the scope of the two different applications (read mapping vs. gene ID mapping). For read mapping, different versions of a gene ID can come with updates to the genomic coordinates / chromosomal location of the gene (e.g. when a novel transcript is annotated to the gene). This, in turn, can result in a different read count for that gene, e.g. with more reads falling onto the updated coordinates. For gene ID mapping, however, the version does not matter: when mapping from ENSEMBL IDs to gene symbols, for example, ENSG00000002919 maps to SNX11, and thus so do ENSG00000002919.1, ENSG00000002919.2, ..., ENSG00000002919.14. Therefore AnnotationDbi also doesn't care about the versions. At least this is how I understand it.
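To make the point about gene ID mapping concrete, here is a minimal base R sketch. The lookup vector `symbol_by_id` is a toy stand-in for an unversioned OrgDb mapping (it is not AnnotationDbi's API), and `strip_version` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix from ENSEMBL IDs.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Toy stand-in for an unversioned ID-to-symbol mapping (assumption, not a real OrgDb).
symbol_by_id <- c(ENSG00000002919 = "SNX11")

versioned <- c("ENSG00000002919.1", "ENSG00000002919.2", "ENSG00000002919.14")

# Versioned IDs are not keys of the table, so a direct lookup yields NA.
direct <- unname(symbol_by_id[versioned])

# After stripping the version, every ID resolves to the same symbol.
mapped <- unname(symbol_by_id[strip_version(versioned)])
```

IDs that carry no version suffix pass through `strip_version` unchanged, so applying it to an already unversioned vector should be harmless.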
Consider adding functionality to EnrichmentBrowser::idMap so that it automatically validates/converts ENSEMBL IDs from id.version to id (e.g. ENSG00000002919.14 to ENSG00000002919). Try to conserve id.version by adding another column to rowData. This is really more of an issue with AnnotationDbi, but it couldn't hurt.
Originally posted by @grabearummc in #23 (comment)
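A rough sketch of what this request could look like, using a plain data.frame as a stand-in for rowData. The function name `split_ensembl_version` and both column names are hypothetical; this is not how EnrichmentBrowser::idMap is implemented.

```r
# Hypothetical helper: split versioned ENSEMBL IDs into an unversioned ID
# plus the original ID, so that nothing is lost before mapping.
split_ensembl_version <- function(ids) {
  data.frame(
    ENSEMBL = sub("\\.[0-9]+$", "", ids),  # unversioned ID, usable as a mapping key
    ENSEMBL.VERSIONED = ids,               # original ID, conserved for reference
    stringsAsFactors = FALSE
  )
}

row_data <- split_ensembl_version(c("ENSG00000002919.14", "ENSG00000002919"))
```

With a SummarizedExperiment, the same two columns could presumably be attached to the object's rowData before calling idMap, so the versioned IDs stay available after the mapping.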