Peptidase Annotation Summary Tool

This Python script summarizes peptidase annotation results produced by the Hotpep_protease tool across multiple species.

Update 20241030

Added the -family parameter.

-family present: summary_statistics.csv merges subfamily rows into their corresponding family rows, which makes family-level comparisons easier.
-family absent: summary_statistics.csv contains a mix of subfamily rows and family rows (the latter only for families without subfamilies).
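
The family-level merge can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes MEROPS-style identifiers in which a subfamily is the family identifier plus one trailing letter (for example S1A belongs to S1), and the helper names family_of and merge_to_family are hypothetical.

```python
import re
from collections import defaultdict

# Assumed identifier shape: catalytic-type letter, family number,
# optional subfamily letter (e.g. "S1A" -> family "S1").
_ID_PATTERN = re.compile(r"^([A-Z]\d+)([A-Z])?$")


def family_of(merops_id):
    """Return the family part of a family or subfamily identifier."""
    cleaned = merops_id.strip()
    match = _ID_PATTERN.match(cleaned)
    # Identifiers that do not fit the assumed shape are left unchanged.
    return match.group(1) if match else cleaned


def merge_to_family(counts):
    """Sum subfamily counts into their parent family."""
    merged = defaultdict(int)
    for merops_id, n in counts.items():
        merged[family_of(merops_id)] += n
    return dict(merged)


example = {"S1A": 4, "S1B": 2, "C19": 3, "M10A": 1}
print(merge_to_family(example))  # {'S1': 6, 'C19': 3, 'M10': 1}
```

Families without subfamilies (C19 in the example) pass through unchanged, so every row ends up at the family level.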

Input Data Structure

The test input data comes from Hotpep_protease (https://www.sciencedirect.com/science/article/pii/S2666952820300431). Each species' protein sequence file (longest-sequence collections deduplicated with CD-HIT) was run through Hotpep with default parameters (except viral), producing one output folder per species.

The input for this script is one folder per species, each containing a peptidases directory with a summary.txt file. The summary.txt file must include the following columns:

Merops family: Classification of the peptidase.
proteins: Count of proteins associated with each Merops family.

The folder structure should be as follows:

input_path/
    species1/
        peptidases/
            summary.txt
    species2/
        peptidases/
            summary.txt
    ...
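
Walking this folder layout could look like the sketch below. It is an illustration under stated assumptions, not the script's own code: summary.txt is assumed to be tab-delimited with a header row, and the function names read_summary and collect_species are hypothetical. Only the column names Merops family and proteins come from the description above.

```python
import csv
from pathlib import Path


def read_summary(summary_file):
    """Read one summary.txt into {Merops family: protein count}."""
    counts = {}
    with open(summary_file, newline="") as handle:
        # Assumption: tab-delimited text with a header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[row["Merops family"]] = int(row["proteins"])
    return counts


def collect_species(input_path):
    """Map each species folder name to the counts from its summary.txt."""
    results = {}
    for species_dir in sorted(Path(input_path).iterdir()):
        summary_file = species_dir / "peptidases" / "summary.txt"
        if summary_file.is_file():  # skip entries without results
            results[species_dir.name] = read_summary(summary_file)
    return results
```

Entries under input_path that lack a peptidases/summary.txt file are skipped rather than treated as errors; a stricter tool might prefer to report them.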

Usage

To run the script, use the following command:

python hotpep_statistics_multi_en.py -in <input_path>

Replace <input_path> with the path to the folder containing all species results.
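
For readers who want to see how the -in and -family options could be parsed, here is a minimal argparse sketch. The option names come from this README; the dest name input_path, the help strings, and the function name build_parser are assumptions, and the real script may be organized differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Summarize Hotpep_protease results across species."
    )
    # "in" is a Python keyword, so the value is stored under another name.
    parser.add_argument("-in", dest="input_path", required=True,
                        help="folder containing one sub-folder per species")
    # A bare switch: present -> True, absent -> False.
    parser.add_argument("-family", action="store_true",
                        help="merge subfamily rows into their family rows")
    return parser


args = build_parser().parse_args(["-in", "results", "-family"])
print(args.input_path, args.family)  # results True
```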

Output

The script generates two output files:

combined_summary.csv: Gene counts for each peptidase family/subfamily across all species.
summary_statistics.csv: Total counts categorized by Merops family and subfamily, with every category represented. Statistics are reported for three categories:
- Total
- 9 catalytic types: Aspartic (A) Peptidases, Cysteine (C) Peptidases, Glutamic (G) Peptidases, Metallo (M) Peptidases, Asparagine (N) Peptide Lyases, Mixed (P) Peptidases, Serine (S) Peptidases, Threonine (T) Peptidases, Peptidases of Unknown (U) Catalytic Type
- Family
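
The per-catalytic-type totals can be sketched as below. This is an illustration only: it assumes the first letter of each Merops identifier is the catalytic-type letter listed above, and the names CATALYTIC_TYPES and totals_by_catalytic_type are hypothetical. All nine types start at zero so that every category appears in the result even when a species has no members of that type.

```python
from collections import Counter

# Letters and names as listed in the Output section.
CATALYTIC_TYPES = {
    "A": "Aspartic", "C": "Cysteine", "G": "Glutamic",
    "M": "Metallo", "N": "Asparagine", "P": "Mixed",
    "S": "Serine", "T": "Threonine", "U": "Unknown",
}


def totals_by_catalytic_type(counts):
    """Sum protein counts per catalytic-type letter."""
    totals = Counter({letter: 0 for letter in CATALYTIC_TYPES})
    for merops_id, n in counts.items():
        # Assumption: the leading letter encodes the catalytic type.
        letter = merops_id[:1].upper()
        if letter in CATALYTIC_TYPES:
            totals[letter] += n
    return dict(totals)


species_counts = {"S1A": 4, "S8": 1, "C19": 3}
totals = totals_by_catalytic_type(species_counts)
print(totals["S"], totals["C"], totals["A"])  # 5 3 0
```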

Requirements

Python 3.x
Pandas library

Install the required library using:

pip install pandas

Example

python hotpep_statistics_multi_en.py -in /path/to/input_folder -family

This will process the data and output the results in the same directory.

Rundon-svg/hotpep_protease_Sum

Folders and files

Latest commit

History

Repository files navigation

Peptidase Annotation Summary Tool

Update 20241030

Input Data Structure

Usage

Output

Requirements

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages