Carbohydrate Active eNzyme Domain analYsis tool (CANDy) - automated analysis of domain architectures in carbohydrate-active enzymes
CANDy supports fast, FAIR and seamless protein domain analysis of any CAZy family. An online version is available on Google Colab, but for larger families we recommend downloading the Jupyter Notebook and running it locally.
Make sure the following tools are installed in the same directory as the Jupyter Notebook:
Download the source code for CD-HIT from the GitHub repository at https://github.com/weizhongli/cdhit/releases and follow the installation instructions.
A precompiled MAFFT binary can be downloaded from https://mafft.cbrc.jp/alignment/software/. Change the installation directory to the path where the Notebook is stored, or manually move the executable from the default directory. You can find the location of MAFFT by typing the following command in your terminal (this is the Windows form; on macOS or Linux, use "which mafft" instead):
where mafft
Follow the installation instructions for your system at http://www.microbesonline.org/fasttree/#Install
Or
Open a terminal window and install the Xcode Command Line Tools by typing the following command:
xcode-select --install
Install Homebrew by typing the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install FastTree by typing the following command:
brew install fasttree
Follow the installation instructions for your system at http://www.microbesonline.org/fasttree/#Install
Or
Download the FastTree source code from the FastTree website at http://www.microbesonline.org/fasttree/.
Open a terminal window and navigate to the directory where you downloaded the source code.
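FastTree is distributed as a single C source file rather than an archive, so there is nothing to unpack. Type the following command to compile it (this is the compile command given on the FastTree website; replace FastTree.c with the name of the file you downloaded):
gcc -O3 -finline-functions -funroll-loops -Wall -o FastTree FastTree.c -lm
After compilation completes, you will find the FastTree executable in the current directory.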
Download the FastTree Windows binary from the FastTree website at http://www.microbesonline.org/fasttree/#Download.
Extract the FastTree executable from the downloaded ZIP file.
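Once the tools are installed, a quick check from Python can confirm that they are reachable. The following is a minimal sketch, not part of CANDy itself: the executable names in REQUIRED_TOOLS and the helper functions are assumptions for illustration, so adjust them to match your installation.

```python
import shutil
from pathlib import Path

# Executable names are assumptions; adjust them to match your installation
# (for example "cd-hit.exe" or "FastTree.exe" on Windows).
REQUIRED_TOOLS = ["cd-hit", "mafft", "FastTree"]


def find_tool(name, notebook_dir="."):
    """Return the path to an executable, or None if it cannot be found.

    The Notebook directory is checked first, then the system PATH.
    """
    local = shutil.which(name, path=str(Path(notebook_dir).resolve()))
    if local:
        return local
    return shutil.which(name)


def check_tools(tools=REQUIRED_TOOLS, notebook_dir="."):
    """Map each tool name to its resolved path (None when missing)."""
    return {name: find_tool(name, notebook_dir) for name in tools}


if __name__ == "__main__":
    for name, location in check_tools().items():
        print(f"{name}: {location or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND should be moved into the Notebook directory or added to your PATH before starting a run.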
Your directory should look like this:
This Notebook uses several Python packages. To avoid compatibility issues we recommend running this Notebook in a virtual environment.
- Install Anaconda by following its installation instructions.
- Go to 'Environments' in Anaconda and click 'create'. Give your environment a name, for example 'myenv'. The virtual environment will be launched automatically.
- Go to the package search bar and search for 'ipywidgets'. Download the package to be able to use the interactive widgets in this Notebook. Repeat for the 'h5py' package.
- Next, go back to the 'Home' page in Anaconda and install Jupyter Notebook. Once completed, press launch and go to the directory where you saved this Notebook.
- Verify that the name of the virtual environment appears at the top right of the Notebook, for example: Python (myenv). If it does not, go to Kernel and choose the environment.
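To confirm from inside the Notebook that the kernel is using the intended environment, a cell like the one below can help. This is a minimal sketch using only the Python standard library; the package list simply mirrors the two packages named in the steps above, and the helper name is hypothetical.

```python
import importlib.util
import sys

# Packages named in the setup steps; extend the list as needed.
REQUIRED_PACKAGES = ["ipywidgets", "h5py"]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that cannot be imported by the current kernel."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# The interpreter path should point inside your virtual environment.
print("Kernel interpreter:", sys.executable)
missing = missing_packages()
print("Missing packages:", missing if missing else "none")
```

If the interpreter path does not point into your environment (for example 'myenv'), switch the kernel before installing anything else.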
For large families, also prevent your computer from entering sleep or stand-by mode, since this will interrupt the run. Change the power settings on your computer or caffeinate your system.
The caffeinate command is included with macOS, so no separate installation should be needed. Start it by running the following in your Terminal:
caffeinate -d
Stop it by pressing:
Ctrl + C
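As an alternative to keeping a separate terminal open, caffeinate can be started from the Notebook and tied to the lifetime of the kernel process. The sketch below is an illustration under stated assumptions, not part of CANDy: it assumes the macOS caffeinate command, whose -w option waits for a given process ID to exit, and the function names are hypothetical.

```python
import os
import shutil
import subprocess


def caffeinate_command(pid):
    """Build the command that keeps the display awake until `pid` exits."""
    # -d prevents display sleep; -w waits for the given process ID to exit.
    return ["caffeinate", "-d", "-w", str(pid)]


def keep_awake_while_running():
    """Start caffeinate for the current process, if the command is available.

    Returns the subprocess.Popen handle, or None when caffeinate is not found.
    """
    if shutil.which("caffeinate") is None:
        return None
    return subprocess.Popen(caffeinate_command(os.getpid()))
```

Calling keep_awake_while_running() in the first cell would keep the display awake until the kernel stops, at which point caffeinate exits on its own.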
Install the caffeinate package by running the following in your terminal:
sudo apt-get install caffeinate
Start the package by running:
caffeinate -d
Stop it by pressing:
Ctrl + C
Note: The caffeinate package is not available for Windows. However, you can use the built-in "powercfg" command to prevent the system from going to sleep.
Open the Command Prompt application.
Type the following line to see the current power requests:
powercfg /requests
Type the following line, followed by the type of request you want to override (e.g., "system" or "display"):
powercfg /requestsoverride
To stop the power request override, type "powercfg /requestsoverride" followed by the type of request and the "/remove" argument.
When running the Google Colab version of CANDy, the results (FASTA files, SQLite database, MSA, phylogenetic tree in Newick format and iTOL annotation files) are automatically downloaded as a ZIP file. When running CANDy locally, these outputs are stored in the same directory as the Jupyter Notebook.
To open the results in the database, download DB Browser for SQLite from: https://sqlitebrowser.org/
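The database can also be inspected directly from Python with the standard-library sqlite3 module. The following is a minimal sketch; the file name in the usage note is hypothetical, and the table names depend on what CANDy actually writes, which is why the example only lists them.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file.

    Note: sqlite3.connect creates an empty file if db_path does not exist,
    so double-check the path when the result is an empty list.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables("candy_results.db") would return the table names of a database saved under that (assumed) file name; each table can then be loaded with pandas.read_sql_query if pandas is installed.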
Several free services are available for viewing the phylogenetic tree. The Notebook uses the ete3 package to visualize the annotated tree inline. For a more interactive experience we recommend iTOL. The script outputs iTOL annotation files for visualizing the protein domains and the activity of the included characterized sequences.
CANDy offers users a co-occurrence network that visually represents both the frequency of different domain types and the degree to which they are interconnected. A simple visualisation is offered in the Notebook, but for a more interactive experience we recommend using Cytoscape (yFiles Organic Layout).
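To illustrate what such a network encodes, the sketch below counts how often each domain occurs (node weight) and how often each pair of domains appears in the same protein (edge weight). It is a simplified illustration using only the standard library, not CANDy's own implementation; the function name and the toy domain architectures are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(architectures):
    """Count domain frequencies and pairwise co-occurrences.

    `architectures` is an iterable of domain lists, one list per protein.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for domains in architectures:
        unique = sorted(set(domains))  # count each domain once per protein
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))
    return node_counts, edge_counts


# Toy input; the domain names are illustrative only.
example = [["GH13", "CBM20"], ["GH13", "CBM20", "CBM48"], ["GH13"]]
nodes, edges = cooccurrence(example)
print(nodes["GH13"], edges[("CBM20", "GH13")])  # prints "3 2"
```

Because the domains are sorted before pairing, each edge key is an alphabetically ordered tuple, so a pair is counted under a single key regardless of the order in which the domains appear in a protein.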
CANDy communicates with and/or references the following separate libraries, packages and tools:
- Biopython
- pandas
- tqdm
- sqlitebrowser
- SQLAlchemy
- sdRDM
- requests
- ete3
- CD-HIT
- MAFFT
- FastTree
- NetworkX
- Matplotlib
This Jupyter Notebook is licensed under the MIT License.
This Notebook and other information provided is for theoretical utilisation only; caution should be exercised in its use. It is provided ‘as-is’ without any warranty of any kind, whether expressed or implied. Information is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.
Use of the third-party software, libraries or code referred to in the Acknowledgements section in the CANDy README may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
The following databases are used by CANDy, and are available with reference to the following:
- UniProt: (unmodified), by The UniProt Consortium, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
- NCBI: (unmodified), by the National Library of Medicine, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
- CAZy: (unmodified), by http://www.cazy.org/ and Elodie Drula, Marie-Line Garron, Suzan Dogan, Vincent Lombard, Bernard Henrissat, Nicolas Terrapon, The carbohydrate-active enzyme database: functions and literature, Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D571–D577, https://doi.org/10.1093/nar/gkab1045, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
- InterPro: (unmodified), by EMBL-EBI, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.