ggmap is coded in Python 3.x
ggmap is meant to convert MetaPhlAn profiles into GreenGenes OTU-based profiles.
- install miniconda3: https://docs.conda.io/en/latest/miniconda.html
- create a dedicated conda environment:
conda create --name ggmap
- activate new conda environment:
conda activate ggmap
- clone github repo via:
git clone https://github.com/sjanssen2/ggmap.git
- cd into the new directory
cd ggmap
- install modules from sources
python setup.py develop --user
- should the above command fail, you can alternatively try to install the dependencies via conda like
conda install -c conda-forge `cat ci/conda_requirements.txt | grep -v '#' | cut -d ">" -f 1 | xargs`
and thereafter repeat the previous install command (python setup.py develop --user).
- I recommend to pip install the two statannot packages directly from their github repos, as there are no conda packages available. See the comments in ci/conda_requirements.txt
- only for BCF System: you probably have to set the proxy to enable conda to speak to the internet:
export ftp_proxy="http://proxy.computational.bio.uni-giessen.de:3128" && export http_proxy="http://proxy.computational.bio.uni-giessen.de:3128" && export https_proxy="http://proxy.computational.bio.uni-giessen.de:3128"
- install necessary additional conda packages
conda install ipykernel ipython_genutils
- make new kernel known to the hub:
python -m ipykernel install --user --name ggmap --display-name "ggmap"
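In case the backtick pipeline in the fallback conda install command looks cryptic, the following is a rough Python sketch of the same filtering: drop every line that contains a '#', keep only the part before the first '>', and join what remains. The function name package_names and the sample text are made up for illustration; the sample is NOT the real content of ci/conda_requirements.txt.

```python
def package_names(requirements_text):
    """Roughly mimic: grep -v '#' | cut -d '>' -f 1 | xargs"""
    names = []
    for line in requirements_text.splitlines():
        if "#" in line:
            # grep -v '#': skip every line that contains a '#' anywhere
            continue
        # cut -d ">" -f 1: keep the text before the first '>'
        name = line.split(">", 1)[0]
        # xargs: collect whitespace-separated words
        names.extend(name.split())
    return names


# Hypothetical sample text, not the real requirements file
sample = "# statannot: install via pip\npandas>=1.0\nscipy\n"
print(" ".join(package_names(sample)))
```

Note that a line with a trailing comment is dropped completely, because the filter looks for '#' anywhere in the line.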
After the first use, ggmap will create a file called .ggmaprc in your home directory (look at the content via cat $HOME/.ggmaprc). Through this file, you can set some defaults, like conda environment names, to save typing in the python function calls.
- I assume you already installed qiime2 through miniconda (https://docs.qiime2.org/2021.8/install/). A ~/.ggmaprc file will be generated the first time you load ggmap code in your python program / jupyter notebook. To do so, execute Challenge 1 below. Then come back and edit your ~/.ggmaprc to replace a potentially outdated qiime2 environment name with the one you installed (in our example 2021.8). There is a row starting with condaenv_qiime2: where you replace the given name with your actual one.
- If you are going to use a cluster to execute jobs (default), you need to create a directory:
mkdir $HOME/TMP
- ggmap needs to know the location of your miniconda3 prefix. This is typically located in $HOME/miniconda3. However, in the BCF system, we encouraged people to install it in the prefix $HOME/no_backup/miniconda3 (to avoid flooding our backup with millions of unimportant files). You need to adapt the dir_conda: entry in your ~/.ggmaprc file accordingly.
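To quickly see the two rows mentioned above without opening an editor, something like the following sketch might help. It only assumes what is stated above, namely that the file has rows starting with condaenv_qiime2: and dir_conda:. It makes no further assumption about the file format, and the helper name rows_to_check is made up for illustration.

```python
from pathlib import Path

# Row prefixes named in the text above
KEYS = ("condaenv_qiime2:", "dir_conda:")


def rows_to_check(text, keys=KEYS):
    """Return the rows of a settings file that start with one of the keys."""
    return [line for line in text.splitlines()
            if line.lstrip().startswith(keys)]


rc_file = Path.home() / ".ggmaprc"
if rc_file.exists():
    for row in rows_to_check(rc_file.read_text()):
        print(row)
else:
    print(f"{rc_file} does not exist yet; load ggmap once to create it")
```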
Create a new jupyter notebook with the ggmap kernel and type the following two lines in a cell:
from ggmap.snippets import *
from ggmap.analyses import *
If the cell produces output in a red box like the following, you managed to successfully load my code. Congratulations!
/homes/sjanssen/miniconda3/envs/notebookServer/lib/python3.7/site-packages/skbio/util/_testing.py:15: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing as pdt
ggmap is custome code from Stefan Janssen, download at https://github.com/sjanssen2/ggmap
Reading settings file '/homes/sjanssen/.ggmaprc'
Create a dummy feature table like
counts = pd.DataFrame([{'sample': "sample.A", 'bact1': 10, 'bact2': 7, 'bact3': 0},
{'sample': "sample.B", 'bact1': 5, 'bact2': 3, 'bact3': 8},
{'sample': "sample.C", 'bact1': 10, 'bact2': 0, 'bact3': 1}]).set_index('sample').T
Use this feature table to compute beta diversity distances through one of the wrapper functions of ggmap that internally call qiime2 methods:
res = beta_diversity(counts, metrics=['jaccard'], dry=False, use_grid=False, ppn=1)
Should it run through, you should "see" the computed distances when executing res['results']['jaccard'] in a new cell.
(Note that ppn=1 will cause the system to only use one CPU-core for the computation. For real data with more than three samples, you might want to increase this number.)
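To get a feeling for what a Jaccard distance means for the dummy feature table, here is a minimal pure-Python sketch of the textbook presence/absence definition (1 minus shared features divided by all observed features). This is an assumption about the metric only: it does not call ggmap or qiime2, and the values reported by beta_diversity may differ if the wrapper pre-processes the table (e.g. by rarefaction).

```python
from itertools import combinations

# Same dummy counts as in the feature table above, one dict per sample
counts = {
    "sample.A": {"bact1": 10, "bact2": 7, "bact3": 0},
    "sample.B": {"bact1": 5, "bact2": 3, "bact3": 8},
    "sample.C": {"bact1": 10, "bact2": 0, "bact3": 1},
}


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance between two count dicts."""
    present_a = {feature for feature, n in a.items() if n > 0}
    present_b = {feature for feature, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples: treat as identical
    return 1 - len(present_a & present_b) / len(union)


for s1, s2 in combinations(counts, 2):
    print(s1, s2, round(jaccard_distance(counts[s1], counts[s2]), 3))
```

By hand: sample.A and sample.B share 2 of 3 observed features (distance 1/3), sample.A and sample.C share 1 of 3 (distance 2/3), and sample.B and sample.C share 2 of 3 (distance 1/3).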
As above, but now we want to distribute the computation as a cluster job via
res = beta_diversity(counts, metrics=['jaccard'], dry=False, use_grid=True, nocache=True, ppn=1)
The result should be the same as above, but the system should submit the job to the SGE grid engine and poll every 10 seconds for the result. You might want to use another terminal and observe the job status via qstat and/or look into the sub-directory $HOME/TMP/. Don't forget to display the results by repeating the second command from Challenge 2, i.e. res['results']['jaccard'].
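The "poll every 10 seconds" behaviour can be pictured with a generic polling loop like the sketch below. This is NOT ggmap's actual implementation; the function wait_for and its parameters are hypothetical and only illustrate the idea of checking repeatedly until a job has finished.

```python
import time


def wait_for(is_done, interval=10, max_polls=None, sleep=time.sleep):
    """Call is_done() until it returns True, pausing `interval` seconds
    between checks. Gives up and returns False after max_polls checks."""
    polls = 0
    while not is_done():
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(interval)
    return True


# Toy demonstration: the "job" reports done on the third check
answers = iter([False, False, True])
print(wait_for(lambda: next(answers), interval=0))
```

For a real job, is_done could for example test whether an expected result file exists; which files ggmap writes into $HOME/TMP/ is not specified here, so that part is left open.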
- You might encounter issues with conda environment activation if the job runs through the SGE cluster. This is likely due to the fact that an SGE job does not load information from your ~/.bashrc. You can try to copy all conda relevant lines in your ~/.bashrc file (I will show mine below) and paste those into a new file named ~/.bash_profile:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/homes/sjanssen/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/homes/sjanssen/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/homes/sjanssen/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/homes/sjanssen/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
(careful: you are not sjanssen and your conda installation might be located in /homes/YOURNAME/no_backup/miniconda3 or somewhere else)
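If you are unsure where your conda installation lives, a small check like the following sketch can tell you which of the two prefixes mentioned in this document exists in your home directory. The helper name first_existing is made up for illustration, and your installation may of course sit in a location that is not in the candidate list.

```python
import os


def first_existing(candidates, exists=os.path.isdir):
    """Return the first candidate path for which exists() is true, else None."""
    for path in candidates:
        if exists(path):
            return path
    return None


home = os.path.expanduser("~")
# The two prefixes mentioned in this document
candidates = [
    os.path.join(home, "miniconda3"),
    os.path.join(home, "no_backup", "miniconda3"),
]
print(first_existing(candidates))
```

If this prints None, neither location exists and you need to look up the prefix yourself before adapting the paths in the block above.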
Good luck!