If you use results from this tool, please cite:

Coelho, L.P., Alves, R., del Río, Á.R. et al. Towards the biogeography of prokaryotic genes. Nature 601, 252–256 (2022). [DOI: 10.1038/s41586-021-04233-4](https://doi.org/10.1038/s41586-021-04233-4)
Command line tool to query the Global Microbial Gene Catalog (GMGC).
GMGC-mapper runs on Python 3.6-3.10 and requires prodigal to be available for genome mode.
The easiest way to install GMGC-mapper is through bioconda, which will ensure all dependencies (including prodigal) are installed automatically:

```bash
conda install -c bioconda gmgc-mapper
```
Alternatively, GMGC-mapper is available from PyPI, so it can be installed through pip:

```bash
pip install GMGC-mapper
```
Note that this does not install prodigal (which is necessary for the genome-based workflow).
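Because pip does not install prodigal, you may want to confirm that it is on your `PATH` before running the genome-based workflow. The following is a minimal sketch using only the Python standard library. The helper name `prodigal_available` is hypothetical and is not part of GMGC-mapper.

```python
import shutil


def prodigal_available(executable: str = "prodigal") -> bool:
    """Return True if `executable` can be found on the PATH (hypothetical helper)."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if prodigal_available():
        print("prodigal found: genome mode (-i) should be usable")
    else:
        print("prodigal not found: install it (e.g., via bioconda) or pass gene files instead")
```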
Finally, especially if you are retrieving the cutting-edge version from GitHub, you can install it with the standard command:

```bash
python setup.py install
```
- Input is a genome sequence:

  ```bash
  gmgc-mapper -i input.fasta -o output
  ```
- Input is DNA/protein gene sequences:

  ```bash
  gmgc-mapper --nt-genes genes.fna --aa-genes genes.faa -o output
  ```

  The nucleotide input is optional, but it should be provided when available so that the quality of the hits can be refined:

  ```bash
  gmgc-mapper --aa-genes genes.faa -o output
  ```
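If you drive GMGC-mapper from a Python pipeline, the two input modes above differ only in which flags are passed. The following is a minimal sketch that assumes `gmgc-mapper` is on your `PATH`. It uses only the flags documented in this README (`-i`, `--nt-genes`, `--aa-genes`, `-o`). The helpers `build_command` and `run_gmgc_mapper` are hypothetical. Treating genome mode and gene mode as mutually exclusive is also an assumption of this sketch.

```python
import subprocess
from typing import List, Optional


def build_command(output: str,
                  genome: Optional[str] = None,
                  aa_genes: Optional[str] = None,
                  nt_genes: Optional[str] = None) -> List[str]:
    """Assemble a gmgc-mapper argument list for genome mode or gene mode (hypothetical)."""
    if genome is not None:
        # Genome mode: gene prediction (prodigal) happens inside gmgc-mapper.
        # Rejecting extra gene files here is an assumption of this sketch.
        if aa_genes is not None or nt_genes is not None:
            raise ValueError("pass either a genome or gene files, not both")
        return ["gmgc-mapper", "-i", genome, "-o", output]
    if aa_genes is None:
        # Gene mode: the protein file is required, the nucleotide file is optional
        raise ValueError("gene mode needs aa_genes (nt_genes is optional)")
    cmd = ["gmgc-mapper"]
    if nt_genes is not None:
        # Nucleotide genes, when available, let gmgc-mapper refine the hits
        cmd += ["--nt-genes", nt_genes]
    cmd += ["--aa-genes", aa_genes, "-o", output]
    return cmd


def run_gmgc_mapper(output: str, **inputs: Optional[str]) -> None:
    """Run gmgc-mapper; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(output, **inputs), check=True)
```

For example, `run_gmgc_mapper("output", aa_genes="genes.faa", nt_genes="genes.fna")` would run the same command as the gene-mode example above. Building the argument list separately from running it keeps the flag logic easy to check without invoking the tool.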
If your input is a metagenome, you can use NGLess for assembly and gene prediction. For more details, read the docs.
The output folder will contain:
- Outputs of gene prediction (prodigal).
- Complete data table, listing all the hits in GMGC, per gene.
- Complete table, listing all the genome bins (MAGs) that are found in the results.
- Human readable summary.
For more details, read the docs. A description of the outputs is also written to the output folder for convenience.
- `-i/--input`: path to the input genome file (FASTA, possibly .gz/.bz2/.xz compressed).
- `-o/--output`: output directory (will be created if non-existent).
- `--nt-genes`: path to the input DNA gene file (FASTA, possibly .gz/.bz2/.xz compressed).
- `--aa-genes`: path to the input protein gene file (FASTA, possibly .gz/.bz2/.xz compressed).
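All three input options accept FASTA files that may be compressed with gzip, bzip2, or xz. If you want to sanity-check an input file before a long run, the Python standard library can read all three formats. The sketch below chooses the decompressor from the file extension, which is an assumption: GMGC-mapper may detect compression differently. It then counts the FASTA records. `open_fasta` and `count_sequences` are hypothetical helpers, not part of GMGC-mapper.

```python
import bz2
import gzip
import lzma
from typing import IO


def open_fasta(path: str) -> IO[str]:
    """Open a FASTA file as text, choosing a decompressor by extension (assumption)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    # No recognised compression suffix: treat it as plain text
    return open(path, "rt")


def count_sequences(path: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    with open_fasta(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `count_sequences("genes.faa.gz")` should match the number of predicted proteins you expect to pass via `--aa-genes`.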