Carbohydrate Active eNzyme Domain analYsis tool (CANDy) - automated analysis of domain architectures in carbohydrate-active enzymes

CANDy provides fast, FAIR and seamless protein domain analysis of any CAZy family. An online version is available on Google Colab, but for larger families we recommend downloading the Jupyter Notebook and running it locally.

Requirements

Make sure the following tools are installed in the same directory as the Jupyter Notebook:

1. CD-HIT

Download the source code for CD-HIT from the GitHub repository at https://github.com/weizhongli/cdhit/releases and follow the installation instructions.

2. MAFFT

A precompiled binary can be downloaded from https://mafft.cbrc.jp/alignment/software/. Set the installation directory to the path where the Notebook is stored, or manually move the executable there from the default directory. To find where MAFFT is installed, type the following command in your terminal (on macOS and Linux, use which instead of where):

where mafft

3. FastTree

MacOS

Follow installation instructions for your system on http://www.microbesonline.org/fasttree/#Install

Or

Open a terminal window and install the Xcode Command Line Tools by typing the following command:

 xcode-select --install

Install Homebrew by typing the following command:

 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install FastTree by typing the following command:

 brew install fasttree

Linux

Follow installation instructions for your system on http://www.microbesonline.org/fasttree/#Install

Or

Download the FastTree source code from the FastTree website at http://www.microbesonline.org/fasttree/.

Open a terminal window and navigate to the directory where you downloaded the source code.

FastTree is distributed as a single C source file rather than an archive, so there is nothing to extract. Assuming the downloaded file is named FastTree.c, compile it with:

gcc -O3 -finline-functions -funroll-loops -Wall -o FastTree FastTree.c -lm

After compilation completes, the FastTree executable is in the current directory.

Windows

Download the FastTree Windows binary from the FastTree website at http://www.microbesonline.org/fasttree/#Download.

Extract the FastTree executable from the downloaded ZIP file.
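As a quick sanity check before running the Notebook, the following sketch looks for the three tools in a given directory and falls back to the system PATH. The executable names cd-hit, mafft and FastTree are assumptions based on the tools' usual binary names (on Windows they may carry an .exe suffix), and find_tools is a hypothetical helper that is not part of CANDy.

```python
import shutil
from pathlib import Path


def find_tools(names, directory="."):
    """Map each tool name to the path where it was found, or None."""
    directory = str(Path(directory).resolve())
    found = {}
    for name in names:
        # Look in the Notebook directory first, then on the system PATH.
        found[name] = shutil.which(name, path=directory) or shutil.which(name)
    return found


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    for tool, location in find_tools(["cd-hit", "mafft", "FastTree"]).items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```

Run it from the directory that contains the Notebook; any tool reported as NOT FOUND still needs to be installed or moved.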

Running CANDy

Your directory should look like this:

This Notebook uses several Python packages. To avoid compatibility issues, we recommend running it in a virtual environment.

To set one up, install Anaconda by following its installation instructions.
Go to 'Environments' in Anaconda and click 'Create'. Give your environment a name, for example 'myenv'. The virtual environment is activated automatically.
In the package search bar, search for 'ipywidgets' and install the package so that the interactive widgets in this Notebook work. Repeat for the 'h5py' package.
Next, go back to the 'Home' page in Anaconda and install Jupyter Notebook. Once the installation completes, press 'Launch' and navigate to the directory where you saved this Notebook.
Verify that the name of the virtual environment is shown at the top right of the Notebook, for example: Python (myenv). If it is not, open the Kernel menu and select the environment.
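The environment check can also be done from a Notebook cell. The sketch below is a minimal example, not part of CANDy: it prints the interpreter in use and reports which of the two packages named above cannot be imported. The helper name missing_packages is hypothetical.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter path should point inside your environment (e.g. 'myenv').
    print("Python executable:", sys.executable)
    missing = missing_packages(["ipywidgets", "h5py"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the printed interpreter path does not contain the environment name, the Notebook is probably running on a different kernel.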

Also, for large families, prevent your computer from entering sleep or standby mode, since this will interrupt the run. Either change the power settings on your computer or use a tool such as caffeinate.

MacOS

The caffeinate utility is included with macOS, so no installation is needed.

Start it by running the following in your Terminal:

caffeinate -d

Stop it by pressing:

Ctrl + C

Linux

Install the caffeinate package by running the following in your terminal:

sudo apt-get install caffeinate

Start it by running:

caffeinate -d

Stop it by pressing:

Ctrl + C

Windows

Note: caffeinate is not available for Windows. However, you can use the built-in "powercfg" command to prevent the system from going to sleep.

Open the Command Prompt application.

Type the following line to see the current power requests:

powercfg /requests

Type the following line, followed by the type of request you want to override (e.g., "system" or "display"):

powercfg /requestsoverride

To stop the power request override, type "powercfg /requestsoverride" followed by the type of request and the "/remove" argument.

Output

When running the Google Colab version of CANDy, the results (FASTA files, SQLite database, MSA, phylogenetic tree in Newick format, and iTOL annotation files) are automatically downloaded as a ZIP file. When running CANDy locally, these outputs are stored in the same directory as the Jupyter Notebook.
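To get an overview of a downloaded results archive without unpacking it, the following sketch groups the contained file names by extension. It uses only the Python standard library; the helper name summarize_archive and the archive name in the comment are placeholders, since the actual file names produced by CANDy are not specified here.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Group the file names inside a ZIP archive by file extension."""
    by_extension = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = PurePosixPath(name).suffix.lower() or "(no extension)"
            by_extension[suffix].append(name)
    return {ext: sorted(names) for ext, names in sorted(by_extension.items())}


# Example call (placeholder file name, replace with your download):
# for ext, names in summarize_archive("results.zip").items():
#     print(ext, names)
```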

Database

To open the results in the database, download DB Browser for SQLite from: https://sqlitebrowser.org/
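The database can also be inspected programmatically with Python's built-in sqlite3 module. Because the table layout of the CANDy database is not described here, the sketch below makes no assumptions about it: it simply lists every user table together with its row count. The helper name table_row_counts is hypothetical.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in an SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Quote the identifier, since table names cannot be bound as parameters.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = connection.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        connection.close()
```

Note that sqlite3.connect creates an empty file when the path does not exist, so double-check the file name if the result is an empty dictionary.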

(Annotated) Phylogenetic tree

Several free services are available for viewing the phylogenetic tree. The Notebook uses the ete3 package to visualize the annotated tree inline. For a more interactive experience we recommend iTOL. The script outputs iTOL annotation files for visualizing the protein domains and the activity of the included characterized sequences.

Protein domain co-occurrence network

CANDy offers users a co-occurrence network that visually represents both the frequency of different domain types and the degree to which they are interconnected. A simple visualisation is offered in the Notebook, but for a more interactive experience we recommend using Cytoscape (yFiles Organic Layout).
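To illustrate what such a network encodes, here is a minimal sketch of how node and edge weights could be derived from per-protein domain lists. It is not CANDy's implementation: the function name cooccurrence and the toy input are illustrative, and each domain is counted once per protein.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(architectures):
    """Count domain frequencies (nodes) and pairwise co-occurrences (edges)."""
    node_counts = Counter()
    edge_counts = Counter()
    for domains in architectures:
        unique = sorted(set(domains))  # count each domain once per protein
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))
    return node_counts, edge_counts


# Toy input: one list of domain labels per protein (illustrative only).
toy = [["GH13", "CBM20"], ["GH13", "CBM20", "CBM48"], ["GH13"]]
nodes, edges = cooccurrence(toy)
print(nodes.most_common())
print(edges.most_common())
```

Node counts correspond to how often a domain type occurs, and edge counts to how often two domains appear in the same protein. Sorting the labels before pairing ensures that each pair is stored under a single key regardless of domain order.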

Acknowledgements

CANDy communicates with and/or references the following separate libraries, packages and tools:

Legal terms

License and Disclaimer

This Jupyter Notebook is licensed under MIT.

This Notebook and other information provided are for theoretical utilisation only; caution should be exercised in their use. They are provided ‘as-is’ without any warranty of any kind, whether expressed or implied. Information is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section in the CANDy README may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Databases

The following databases are used by CANDy and are credited as follows:

UniProt: (unmodified), by The UniProt Consortium, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
NCBI: (unmodified), by the National Library of Medicine, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
CAZy: (unmodified), by http://www.cazy.org/ and Elodie Drula, Marie-Line Garron, Suzan Dogan, Vincent Lombard, Bernard Henrissat, Nicolas Terrapon, The carbohydrate-active enzyme database: functions and literature, Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D571–D577, https://doi.org/10.1093/nar/gkab1045, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
InterPro: (unmodified), by EMBL-EBI, available under a Creative Commons Attribution-NoDerivatives 4.0 International License.
