detailed description on genome scale metabolic model database #2

Zhaoju-Deng · 2024-04-15T14:39:33Z

Hi Diener,
Would it be possible to provide a detailed description of the procedures that generated the GEM reference databases? I find them quite useful, but I am hesitant to use them in my manuscript without any description of how they were produced.

many thanks,
Zhaoju

cdiener · 2024-04-15T14:47:42Z

Agreed, that would be good. I will work on adding some documentation.

Those are all built with Nextflow pipelines that are provided in the recipes folder and contain everything needed to build them from scratch. For now you could go through those to see exactly what is happening, for instance the one for AGORA2. After that they get uploaded to Zenodo with the release script.

Zhaoju-Deng · 2024-04-15T14:51:15Z

Thanks very much for your quick response! Would it be possible to provide a short description of the CarveMe reference database? I am trying to use it because my microbiome data come from cows, so the AGORA2 database is not suitable for my analysis.

cdiener · 2024-04-16T14:09:19Z

Those are just the models from the original CarveMe publication, which covered all bacteria in RefSeq at that point. The only thing I did was go through the taxonomy IDs and update them to more recent versions with taxonkit (corresponding to the RefSeq release, because the NCBI taxonomy itself does not really have releases). They are pretty old by now (~5 years), so there might be more genomes these days. Alternatively, you could build your own database using either CarveMe or gapseq. There are good genome catalogues for the rumen microbiome. I guess the medium would be another issue though.

Zhaoju-Deng · 2024-04-16T14:24:25Z

Many thanks for your instant reply! I have been wondering about one thing: the database has only one GEM per bacterial species, while the NCBI RefSeq genome database has multiple reference (or representative) genomes for each bacterial species (https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt). Could you share how the single genome per species used to construct the GEMs with CarveMe was selected (or does a non-redundant reference genome database exist)? I can then use CarveMe to reconstruct the GEMs for a single genome per species.

The samples are all fecal samples, and the culture medium is indeed another issue. I am thinking of simulating in minimal medium (and also in other media, to test whether the results differ significantly between media for the bacterial species of interest). Any other suggestions are also welcome!

cdiener · 2024-04-16T15:56:48Z

The list of genomes can be found in the associated GitHub repo. There should be only one representative genome for each species in RefSeq.

Zhaoju-Deng · 2024-04-17T04:30:49Z

True. I previously contacted Dr. Machado about how they mapped 16S amplicon sequences against the reference genome database in their paper "Polarization of microbial communities between competitive and cooperative metabolism" to retrieve genomes for each bacterial species in each 16S microbiota sample. That database was not non-redundant: one bacterial species had multiple reference/representative genomes. They used DIAMOND to align the 16S amplicon sequences against the reference genome database and applied cutoffs to keep the "best" match for each amplicon sequence. Dr. Machado told me this part of the analysis was done by Dr. Yongkyu Kim. I contacted Dr. Yongkyu Kim and also Prof. Kiran R. Patil, but got no response.

I followed their method, but the results showed very low identity scores and coverage (hardly any of my samples had hits with >97% identity and >95% coverage; they used 97% identity and 95% coverage to filter the best hits). That is why I keep asking whether a reference database exists that contains only a single reference/representative genome per bacterial species.

The GitHub repo you mentioned contains only 5587 models, while there are almost 50k bacterial species in the NCBI reference database, so I am trying to reconstruct GEMs myself using CarveMe. However, I am stuck on choosing the genome that best represents each bacterial species in the 16S amplicon data. It would be great if you had any suggestions. Many thanks!
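For reference, the cutoff step described above (97% identity, 95% coverage, then the best remaining hit per amplicon) can be sketched as follows. This is only an illustration, not the procedure from the paper: the field names follow BLAST-style tabular output (`qseqid`, `sseqid`, `pident`, `qstart`, `qend`, `qlen`, `bitscore`), coverage is assumed to mean query coverage, and "best" is assumed to mean highest bitscore. The function name `best_hits` is hypothetical.

```python
def best_hits(hits, min_identity=97.0, min_coverage=95.0):
    """Keep the top-scoring hit per query that passes both cutoffs.

    Each hit is a dict with the (assumed) keys qseqid, sseqid, pident,
    qstart, qend, qlen and bitscore, as in BLAST-style tabular output.
    """
    best = {}
    for hit in hits:
        # Query coverage: aligned part of the query as a percentage of its length.
        aligned = abs(hit["qend"] - hit["qstart"]) + 1
        coverage = 100.0 * aligned / hit["qlen"]
        if hit["pident"] < min_identity or coverage < min_coverage:
            continue
        current = best.get(hit["qseqid"])
        # Assumption: the "best" match is the one with the highest bitscore.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Printing how many queries survive each cutoff separately can help tell whether identity or coverage is the limiting factor.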

cdiener · 2024-04-24T07:27:58Z

Oh yeah, what I meant is that there is usually (with few exceptions) only one reference genome per species in RefSeq (see the refseq_category column in the assembly summary). So if you filter by this, you would get something very close to a single-reference database.
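A minimal sketch of that filter, assuming a locally downloaded copy of `assembly_summary_refseq.txt` whose header line starts with `#` and names the `assembly_accession`, `refseq_category` and `species_taxid` columns. The category labels `reference genome` and `representative genome` are assumptions taken from the NCBI README and may differ between releases; the function name is hypothetical.

```python
# Lower rank = preferred. The labels are assumed, check them against your file.
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}


def pick_one_genome_per_species(lines):
    """Return {species_taxid: row} with at most one assembly per species.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    header = None
    best = {}  # species_taxid -> (rank, row)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The column header is a comment line beginning with assembly_accession.
            fields = line.lstrip("#").strip().split("\t")
            if fields[0] == "assembly_accession":
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        rank = CATEGORY_RANK.get(row.get("refseq_category"))
        if rank is None:
            continue  # skip assemblies that are neither reference nor representative
        species = row["species_taxid"]
        if species not in best or rank < best[species][0]:
            best[species] = (rank, row)
    return {species: row for species, (rank, row) in best.items()}


# Usage with a locally downloaded copy (the path is hypothetical):
# with open("assembly_summary_refseq.txt", encoding="utf-8") as handle:
#     picked = pick_one_genome_per_species(handle)
```

Grouping by `species_taxid` rather than by organism name keeps strain-level entries of the same species together. The resulting accessions could then be used to download the genomes that go into CarveMe or gapseq.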
