GMGC-mapper

CITATION

If you use results from this tool, please cite:

Coelho, L.P., Alves, R., del Río, Á.R. et al. Towards the biogeography of prokaryotic genes. Nature 601, 252–256 (2022). https://doi.org/10.1038/s41586-021-04233-4

Command line tool to query the Global Microbial Gene Catalog (GMGC).

Install

GMGC-mapper runs on Python 3.6-3.10 and requires prodigal to be available for genome mode.
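
A quick pre-flight check can confirm both requirements before installing. The sketch below is illustrative only and not part of GMGC-mapper: `missing_requirements` is a hypothetical helper name, and it assumes the gene predictor is exposed on `PATH` as an executable named `prodigal`.

```python
import shutil
import sys


def missing_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # The supported range stated above is Python 3.6 through 3.10.
    if not ((3, 6) <= tuple(version_info[:2]) <= (3, 10)):
        problems.append("Python version outside 3.6-3.10")
    # Assumption: prodigal is installed as an executable called 'prodigal'.
    if which("prodigal") is None:
        problems.append("prodigal not found on PATH (needed for genome mode)")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the running interpreter and the current `PATH`; the two parameters exist only so the check can be exercised without changing the environment.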

Conda install

The easiest way to install GMGC-mapper is through bioconda, which will ensure all dependencies (including prodigal) are installed automatically:

conda install -c bioconda gmgc-mapper

pip install

Alternatively, GMGC-mapper is available from PyPI, so it can be installed with pip:

pip install GMGC-mapper

Note that this does not install prodigal (which is necessary for the genome-based workflow).

Install from source

Finally, especially if you are retrieving the cutting-edge version from GitHub, you can install with the standard:

python setup.py install

Examples

Input is a genome sequence:

gmgc-mapper -i input.fasta -o output

Input is DNA/protein gene sequences:

gmgc-mapper --nt-genes genes.fna --aa-genes genes.faa -o output

The nucleotide input is optional (but should be used if available so that the quality of the hits can be refined):

gmgc-mapper --aa-genes genes.faa -o output
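
When scripting many runs, the choice between the invocations above can be made programmatically. This is a minimal sketch, not part of GMGC-mapper: `build_command` and `run` are hypothetical helpers that use only the flags shown in this README and assume the `gmgc-mapper` executable is on `PATH`. The sketch also assumes that gene mode always needs the protein file (as the note on the optional nucleotide input suggests) and that the two modes are not combined in one call.

```python
import subprocess


def build_command(output, genome=None, aa_genes=None, nt_genes=None):
    """Assemble a gmgc-mapper argument list for genome mode or gene mode."""
    if genome is not None:
        # Assumption: genome mode and gene mode are mutually exclusive.
        if aa_genes is not None or nt_genes is not None:
            raise ValueError("give either a genome or gene files, not both")
        return ["gmgc-mapper", "-i", str(genome), "-o", str(output)]
    if aa_genes is None:
        # Assumption: protein sequences are mandatory in gene mode.
        raise ValueError("gene mode needs --aa-genes")
    cmd = ["gmgc-mapper"]
    if nt_genes is not None:
        # Optional, but lets the quality of the hits be refined.
        cmd += ["--nt-genes", str(nt_genes)]
    cmd += ["--aa-genes", str(aa_genes), "-o", str(output)]
    return cmd


def run(cmd):
    """Run the assembled command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(build_command("output", genome="input.fasta"))` would launch the genome-mode invocation shown first in this section.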

If your input is a metagenome, you can use NGLess for assembly and gene prediction. For more details, read the docs.

Output

The output folder will contain:

Outputs of gene prediction (prodigal).
Complete data table, listing all the hits in GMGC, per gene.
Complete table, listing all the genome bins (MAGs) that are found in the results.
Human readable summary.

For more details, read the docs. A description of the outputs is also written to the output folder for convenience.
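
Because the exact file names are covered in the docs rather than here, a simple way to see what a run produced is to list the output folder. The sketch below is generic Python, not part of GMGC-mapper; `list_outputs` is a hypothetical helper.

```python
from pathlib import Path


def list_outputs(output_dir):
    """Return (relative path, size in bytes) for every file under output_dir."""
    root = Path(output_dir)
    return sorted(
        (str(path.relative_to(root)), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )
```

With the examples above, `list_outputs("output")` would return one entry per file written by the run, sorted by path.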

Parameters

-i/--input: Path to the input genome file (FASTA, optionally .gz/.bz2/.xz compressed).
-o/--output: Output directory (created if it does not exist).
--nt-genes: Path to the input DNA gene file (FASTA, optionally .gz/.bz2/.xz compressed).
--aa-genes: Path to the input protein gene file (FASTA, optionally .gz/.bz2/.xz compressed).
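
Since all three inputs may be plain or compressed FASTA, it can be useful to sanity-check a file before submitting it. The following sketch is an illustration rather than GMGC-mapper's own reader: `open_fasta` and `count_records` are hypothetical helpers, and compression is guessed from the file extension only.

```python
import bz2
import gzip
import lzma

# Map the extensions named above to the matching standard-library opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_fasta(path):
    """Open a FASTA file in text mode, decompressing by extension if needed."""
    path = str(path)
    for extension, opener in OPENERS.items():
        if path.endswith(extension):
            return opener(path, "rt")
    return open(path, "rt")


def count_records(path):
    """Count FASTA records, i.e. lines that start with '>'."""
    with open_fasta(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A count of zero for a file passed to `--aa-genes` or `--nt-genes` is a cheap hint that the path or format is wrong before any query is made.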
