detailed description on genome scale metabolic model database #2

Open
Zhaoju-Deng opened this issue Apr 15, 2024 · 7 comments

Comments

@Zhaoju-Deng

Hi Diener,
Would it be possible to provide a detailed description of the procedures that generated the GEM reference databases? I find them quite useful, but I am hesitant to use them in my manuscript without any description of how they were produced.

Many thanks,
Zhaoju

@cdiener
Contributor

cdiener commented Apr 15, 2024

Agreed, that would be good. I will work on adding some documentation.

Those are all built from Nextflow pipelines that are provided in the recipes folder and contain everything needed to build them from scratch. For now you could go through those to see exactly what is happening, for instance the one for AGORA2. After that they get uploaded to Zenodo with the release script.

@Zhaoju-Deng
Author

Thanks very much for your quick response! Would it be possible to provide a short description of the CarveMe reference database? I am trying to use it because my microbiome data are from cows, so the AGORA2 database is not suitable for my analysis.

@cdiener
Contributor

cdiener commented Apr 16, 2024

Those are just the models from the original CarveMe publication. Those were all bacteria in RefSeq at that point. The only thing I did was go through the taxonomy IDs and update them to more recent versions with taxonkit (corresponding to the RefSeq release, because the NCBI taxonomy itself does not really have releases). They are pretty old by this point (~5 years), so there might be more genomes these days. Alternatively, you could build your own database using either CarveMe or gapseq. There are good genome catalogues for the rumen microbiome. I guess the medium would be another issue though.
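
As an illustration of the taxonomy-ID update mentioned above, here is a minimal sketch. It is not the taxonkit-based procedure actually used: it swaps in a direct lookup in the `merged.dmp` file of the NCBI taxonomy dump, which lists retired IDs and the IDs they were merged into. The function names are hypothetical, and IDs that were deleted rather than merged are not handled.

```python
def load_merged(lines):
    """Parse merged.dmp lines (old_id | new_id |) into {old_taxid: new_taxid}."""
    merged = {}
    for line in lines:
        # Fields are separated by "|" with surrounding tabs; strip the padding.
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 2 and parts[0] and parts[1]:
            merged[parts[0]] = parts[1]
    return merged


def update_taxid(taxid, merged):
    """Follow merge records until reaching an ID that is not merged further."""
    seen = set()  # guards against a cycle in malformed input
    while taxid in merged and taxid not in seen:
        seen.add(taxid)
        taxid = merged[taxid]
    return taxid
```

Taxonomy IDs are kept as strings throughout, so an ID that is absent from `merged.dmp` is returned unchanged.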

@Zhaoju-Deng
Author

Many thanks for your instant reply! I have always wondered why the database has only one GEM per bacterial species, while in the NCBI RefSeq genome database there are multiple reference (or representative) genomes for each bacterial species (https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt). Would it be possible to share how the single genome per species used to construct the GEMs with CarveMe was selected (or does a non-redundant reference genome database exist)? I can use CarveMe to reconstruct the GEMs for each single genome per species. The samples are all fecal samples, and the culture medium is indeed another issue. I am thinking of simulating in minimal medium (and also in other media, to test whether results differ significantly between media for the bacterial species of interest). Any other suggestions are also welcome!

@cdiener
Contributor

cdiener commented Apr 16, 2024

The list of genomes can be found in the associated GitHub repo. There should be only one representative genome for each species in RefSeq.

@Zhaoju-Deng
Author

True. I previously contacted Dr. Machado about how they mapped 16S amplicon sequences against the reference genome database in their paper "Polarization of microbial communities between competitive and cooperative metabolism" to retrieve genomes for each bacterial species in each 16S microbiota sample. That database was not non-redundant: one bacterial species had multiple reference/representative genomes. They used DIAMOND to search the 16S amplicon sequences against the reference genome database and applied a cutoff (97% identity and 95% coverage) to keep the "best" match for each amplicon sequence. Dr. Machado told me this part of the analysis was done by Dr. Yongkyu Kim, but I contacted Dr. Yongkyu Kim and also Prof. Kiran R. Patil and got no response.

I followed their method, but the results showed very low identity scores and coverage (I had hardly any samples with >97% identity and >95% coverage). That is why I keep asking whether there is a reference database that contains only a single reference/representative genome per bacterial species. The GitHub repo you mentioned contains only 5587 models, while there are ~50k bacterial species in the NCBI reference database. So I am trying to reconstruct the GEMs myself using CarveMe, but I am stuck at choosing the genome that best represents each bacterial species in 16S amplicon microbiota data. It would be great if you have any suggestions. Many thanks!
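
The 97% identity and 95% coverage cutoff mentioned above can be sketched as a filter over alignment hits. This is a minimal sketch under stated assumptions, not the procedure from the cited paper, which is not described in this thread: hits are dictionaries with the hypothetical keys `query`, `subject`, `pident`, `qstart`, `qend`, `qlen`, and `bitscore` (as could be parsed from BLAST-style tabular output), coverage is the aligned query span divided by the query length, and the "best" hit is assumed to be the one with the highest bit score.

```python
def best_hits(hits, min_identity=97.0, min_coverage=95.0):
    """Return {query: hit}, keeping the top-scoring hit that passes both cutoffs."""
    best = {}
    for hit in hits:
        # Query coverage in percent: aligned span over full query length.
        span = abs(hit["qend"] - hit["qstart"]) + 1
        coverage = 100.0 * span / hit["qlen"]
        if hit["pident"] < min_identity or coverage < min_coverage:
            continue
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return best
```

Queries with no hit above both thresholds are simply absent from the result, which makes it easy to count how many amplicon sequences remain unassigned at a given cutoff.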

@cdiener
Contributor

cdiener commented Apr 24, 2024

Oh yeah, what I meant is that there is usually (with few exceptions) only one reference genome per species in RefSeq (the refseq_category column in the assembly summary). So if you filter by this, you get something very close to a single-reference database.
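
The filter described above could be sketched as follows. This is a minimal sketch, not the pipeline used to build the database. It assumes the tab-separated layout of `assembly_summary_refseq.txt`, with a `#`-prefixed header line naming the `refseq_category`, `species_taxid`, and `assembly_level` columns. The function name and the tie-break rule for the few species with more than one reference are illustrative assumptions.

```python
import csv

# Lower rank wins. Both labels are accepted because RefSeq has used
# "reference genome" and "representative genome" for the chosen assemblies.
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}
# Assumed tie-break for species with several references:
# prefer the most complete assembly level.
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def one_reference_per_species(lines):
    """Return {species_taxid: row} with one reference assembly per species.

    `lines` is any iterable of text lines, e.g. an open file handle for
    assembly_summary_refseq.txt.
    """
    header = None
    best = {}
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue
        if fields[0].startswith("#"):
            # The column header is a comment line; other comment lines are skipped.
            names = [f.lstrip("#").strip() for f in fields]
            if "refseq_category" in names:
                header = names
            continue
        if header is None:
            raise ValueError("no header line with a refseq_category column found")
        row = dict(zip(header, fields))
        if row["refseq_category"] not in CATEGORY_RANK:
            continue  # "na" and anything else is not a reference assembly
        key = (
            CATEGORY_RANK[row["refseq_category"]],
            LEVEL_RANK.get(row.get("assembly_level", ""), len(LEVEL_RANK)),
        )
        current = best.get(row["species_taxid"])
        if current is None or key < current[0]:
            best[row["species_taxid"]] = (key, row)
    return {taxid: row for taxid, (key, row) in best.items()}
```

Passing an open handle for the downloaded file returns a dictionary keyed by `species_taxid`; the `assembly_accession` values of its rows are the genomes that would then go into CarveMe or gapseq.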
