pathway-centrality

This program implements a module that calculates pathway-centrality scores for pre-defined pathway gene sets. Pathway centrality measures the amount of disease-specific communication passing through each pathway gene set by counting the shortest paths between disease genes and differentially expressed genes that pass through it. The significance of each observed pathway-centrality score is assessed via permutation tests using 10,000 pathway genes randomly selected from the 2-core of the input network.
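The scoring and significance test described above can be sketched in plain Python. This is an illustrative sketch, not the code in PCmodules: the function names are hypothetical, the score is assumed to be the raw count of shortest paths with at least one pathway gene among their interior nodes, random gene sets are assumed to match the pathway's size, and the `+1` correction in the p-value is also an assumption.

```python
import random
from collections import deque

def all_shortest_paths(adj, source, target):
    """Return every shortest path from source to target (adj: node -> set of neighbours)."""
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:                    # first visit: record BFS distance
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:      # another equally short route
                preds[nbr].append(node)
    if target not in dist:
        return []
    paths, stack = [], [[target]]
    while stack:                                   # walk predecessor links backwards
        partial = stack.pop()
        if partial[-1] == source:
            paths.append(partial[::-1])
        else:
            stack.extend(partial + [p] for p in preds[partial[-1]])
    return paths

def pathway_centrality(adj, disease, diff_exp, pathway):
    """Count shortest disease-to-DE paths whose interior touches the pathway (assumed scoring)."""
    pathway = set(pathway)
    count = 0
    for d in disease:
        for e in diff_exp:
            for path in all_shortest_paths(adj, d, e):
                if pathway.intersection(path[1:-1]):
                    count += 1
    return count

def k_core(adj, k=2):
    """Return the nodes left after repeatedly removing nodes with degree below k."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}
    while True:
        low = [n for n, nbrs in adj.items() if len(nbrs) < k]
        if not low:
            return set(adj)
        for n in low:
            for nbr in adj.pop(n):
                if nbr in adj:
                    adj[nbr].discard(n)

def permutation_p_value(observed, score_fn, pool, set_size, n_perm=10000, seed=0):
    """Empirical p-value: share of random gene sets scoring at least the observed value."""
    rng = random.Random(seed)
    pool = sorted(pool)
    hits = sum(score_fn(rng.sample(pool, set_size)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)               # +1 correction is an assumption
```

A typical call would pass `k_core(adj)` as `pool` and a function wrapping `pathway_centrality` as `score_fn`. Re-running the breadth-first search for every gene pair keeps the sketch short but is slow on a full interaction network.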

The program requires 5 arguments:

1. a file containing genes with known mutations associated with the disease of interest (-d), e.g., sample_data/bpd.disease.genes.txt
2. a file containing genes differentially expressed in the disease of interest (-e), e.g., sample_data/bpd.diff.exp.genes.txt
3. a file containing protein-protein interaction pairs (-p), e.g., sample_data/hippie_high_ppi.txt
4. a file containing pathway gene sets in .gmt file format (-g), e.g., sample_data/c2.cp.kegg.v6.0.entrez.gmt
5. an output directory where all output files will be placed (-o), e.g., sample_data/output/
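As a rough illustration of the input formats, the sketch below parses .gmt lines (tab-separated: set name, description, then gene IDs) and interaction pairs. The function names are hypothetical, and the pair parser assumes that the first two whitespace-separated fields on each line are the two gene IDs, which may not match the layout the program itself expects.

```python
def parse_gmt(lines):
    """Map each gene-set name to its set of gene IDs from .gmt-formatted lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:          # need a name, a description, and at least one gene
            continue
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets

def parse_pairs(lines):
    """Collect undirected interaction pairs, dropping self-loops and duplicates."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == fields[1]:
            continue
        pairs.add(tuple(sorted(fields[:2])))   # (a, b) and (b, a) collapse to one pair
    return pairs
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `parse_gmt(open(path))`.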

Genes must use exactly the same identifiers across all input files. In our sample_data, genes are identified by Entrez Gene IDs.
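One quick way to catch mismatched identifiers before a run is to check what fraction of an input gene list also appears in the network. The helper below is a hypothetical sanity check, not part of the program. A coverage near zero usually means the files use different identifier systems.

```python
def id_coverage(genes, network_genes):
    """Return the fraction of `genes` that also occur in `network_genes`."""
    genes = set(genes)
    if not genes:
        return 0.0
    return len(genes & set(network_genes)) / len(genes)
```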

An example command line to run the program is: python PCmain.py -d sample_data/bpd.disease.genes.txt -e sample_data/bpd.diff.exp.genes.txt -p sample_data/hippie_high_ppi.txt -g sample_data/c2.cp.kegg.v6.0.entrez.gmt -o sample_data/output/
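For readers who want to script around the program, the five flags could be declared with `argparse` roughly as follows. This is a sketch only: the short flags come from the list above, but the long option names and the `build_parser` function are hypothetical and may differ from what PCmain.py actually does.

```python
import argparse

def build_parser():
    """Declare the five required arguments; long option names are illustrative."""
    parser = argparse.ArgumentParser(description="Pathway-centrality scoring (sketch)")
    parser.add_argument("-d", "--disease", required=True,
                        help="file of disease genes")
    parser.add_argument("-e", "--diff-exp", required=True,
                        help="file of differentially expressed genes")
    parser.add_argument("-p", "--ppi", required=True,
                        help="file of protein-protein interaction pairs")
    parser.add_argument("-g", "--gmt", required=True,
                        help="pathway gene sets in .gmt format")
    parser.add_argument("-o", "--output", required=True,
                        help="directory for output files")
    return parser
```

Calling `build_parser().parse_args()` in a script would then read the flags from the command line and exit with a usage message if any of the five is missing.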

The program will create 11 files:

1. pc_disease_genes.txt: input disease genes, except those that also appear in the differentially expressed gene set
2. pc_diff_exp_genes.txt: copy of the input differentially expressed genes
3. pc_overlapping_genes.log: genes that appear in both the disease gene set and the differentially expressed gene set; these genes are removed from the disease gene set
4. pc_network_lcc.txt: protein-protein interaction pairs in the largest connected component of the given PPI network
5. pc_disease_genes_not_in_lcc.log: disease genes that are not in the largest connected component (file 4) and are excluded from the experiment
6. pc_diff_exp_genes_not_in_lcc.log: differentially expressed genes that are not in the largest connected component (file 4) and are excluded from the experiment
7. pc_shortest_paths.txt: all shortest paths from input disease genes to differentially expressed genes in the largest connected component
8. pc_pathway_genes.txt: input pathway gene sets, excluding disease genes and differentially expressed genes
9. pc_scores.txt: pathway-centrality score calculated for each pathway gene set
10. pc_p_cent.txt: p-value of the observed pathway-centrality score for each pathway gene set, calculated using permutation tests
11. pc_p_cent.log: log file for the permutation tests, containing the genes in the pool for random sampling and time records of progress
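The preprocessing behind several of these files (overlap removal, largest connected component, and filtering of gene lists and pathways) can be sketched as follows. The function names and the returned dictionary keys are hypothetical, and the sketch assumes that pathway gene sets are filtered only by the disease and differentially expressed genes, not restricted to the largest connected component.

```python
from collections import deque

def largest_connected_component(pairs):
    """Return the node set of the largest connected component of an undirected edge list."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                               # breadth-first flood fill
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def preprocess(disease, diff_exp, pairs, gene_sets):
    """Derive the filtered gene lists, network, and pathways described in the file list."""
    disease, diff_exp = set(disease), set(diff_exp)
    overlapping = disease & diff_exp               # cf. pc_overlapping_genes.log
    disease -= overlapping                         # cf. pc_disease_genes.txt
    lcc = largest_connected_component(pairs)
    return {
        "overlapping": overlapping,
        "lcc_pairs": {(u, v) for u, v in pairs if u in lcc and v in lcc},
        "disease": disease & lcc,
        "disease_not_in_lcc": disease - lcc,       # cf. pc_disease_genes_not_in_lcc.log
        "diff_exp": diff_exp & lcc,
        "diff_exp_not_in_lcc": diff_exp - lcc,     # cf. pc_diff_exp_genes_not_in_lcc.log
        "pathways": {name: set(genes) - disease - diff_exp
                     for name, genes in gene_sets.items()},
    }
```

Ties between equally large components are broken by iteration order here, which is fine for a sketch but worth making explicit in real use.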
