pberghei_eg_gene geneset have all genes missing as compared to PlasmoDB #110

Open
Rohit-Satyam opened this issue Aug 17, 2024 · 6 comments

Rohit-Satyam commented Aug 17, 2024

I made a strange observation. The PlasmoDB gene database lists 5254 genes, but querying BioMart returns only 4903. Several characterized genes, such as "PBANKA_0100600" and "PBANKA_0102900", are missing from the BioMart result, while genes such as PBANKA_000970 and PBANKA_000980, which are absent from PlasmoDB and UniProt, are present. I am using biomaRt 2.60.1. Can this be fixed?

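# Connect to the Ensembl Genomes protists mart and select the P. berghei dataset
mart <- "protists_mart"
gset <- "pberghei_eg_gene"
ensembl_mart <- biomaRt::useEnsemblGenomes(biomart = mart, dataset = gset)
# Retrieve every gene ID in the dataset
gene_names <- biomaRt::getBM(attributes = "ensembl_gene_id", mart = ensembl_mart)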

Various unexpected gene IDs also appear in place of Ensembl gene IDs in the first 65 rows.

image

The version is also missing and only PBANKA01 is written.

> ensembl_mart@version
[1] ""

Edit 1: I just realised that none of the gene IDs map between the two sources.

jVenn_chart (3)
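
A mismatch like this can be checked with a plain base-R set comparison. The sketch below is illustrative only: the two vectors reuse the four IDs quoted in this issue rather than the full ID lists, and `compare_id_sets` and `id_digit_count` are hypothetical helpers, not part of biomaRt. It also counts the digits after the `PBANKA_` prefix, since the IDs quoted above differ in length between the two sources.

```r
# Illustrative IDs only: these vectors reuse the four IDs quoted in this issue;
# real input would be the full ID lists exported from each source.
plasmodb_ids <- c("PBANKA_0100600", "PBANKA_0102900")
biomart_ids  <- c("PBANKA_000970", "PBANKA_000980")

# Hypothetical helper (not part of biomaRt): summarise the overlap of two ID sets.
compare_id_sets <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  list(
    shared = intersect(a, b),  # IDs present in both sets
    only_a = setdiff(a, b),    # IDs present only in the first set
    only_b = setdiff(b, a)     # IDs present only in the second set
  )
}

# Hypothetical helper: count the characters that follow the "PBANKA_" prefix.
id_digit_count <- function(ids) {
  nchar(sub("^PBANKA_", "", ids))
}

overlap <- compare_id_sets(plasmodb_ids, biomart_ids)
length(overlap$shared)               # number of IDs present in both sources
table(id_digit_count(plasmodb_ids))  # suffix-length distribution, PlasmoDB IDs
table(id_digit_count(biomart_ids))   # suffix-length distribution, BioMart IDs
```

If the suffix lengths differ consistently across the full lists, that might point to two different ID schemes rather than different gene content, but that is an assumption to verify against the full data and not something these four IDs can establish.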

@Rohit-Satyam changed the title from "pberghei_eg_gene genesets have many essential genes missing as compared to PlasmoDB" to "pberghei_eg_gene genesets have all genes missing as compared to PlasmoDB" on Aug 18, 2024
@Rohit-Satyam changed the title from "pberghei_eg_gene genesets have all genes missing as compared to PlasmoDB" to "pberghei_eg_gene geneset have all genes missing as compared to PlasmoDB" on Aug 18, 2024
@Rohit-Satyam
Author

@grimbough Could you please treat this as a priority? I have heard at parasite meetings, and read in the disclaimer on the website, that the database might cease to exist after 14th September!

@Rohit-Satyam
Author

@jwokaty @dtenenba @vobencha anyone?

@grimbough
Owner

The biomaRt package is just an interface to the data hosted by the Ensembl BioMart service. I have no control over the content of that service. Ensembl provides some information on the assembly and genome build it uses for this organism (https://protists.ensembl.org/Plasmodium_berghei/Info/Annotation/). My guess would be that this is outdated compared to the version provided by PlasmoDB.

If you think this is an issue, the best place to contact is the Ensembl Helpdesk (https://protists.ensembl.org/Help/Contact). They should be able to provide more information on how genome builds are chosen and whether there is an update path for this specific organism.

@grimbough self-assigned this on Aug 26, 2024
@Rohit-Satyam
Author

Thanks, @grimbough. I was under the impression that you guys were friends with Ensembl. I will write to them now.

@grimbough
Owner

I work for the same organisation as the Ensembl team, but we're in different departments and different countries, and mostly interact with the folks who maintain the Ensembl BioMart instance. I don't hold any influence over the choice of data or genomes that get included in Ensembl.

@Rohit-Satyam
Author

@grimbough I emailed Ensembl but didn't get any reply. What should I do now?
