Enrichment with g:GOSt
This page is an explanation, reference, and self-reminder of which parameters the enrichment app uses and how they affect which pathways are displayed. Some of this information about the g:Profiler web service is undocumented, difficult to discern from the help docs, or cobbled together from various sources (e.g. the contact desk).
Under the hood, the enrichment app wraps g:GOSt to find pathways from a gene query list.
Parameters sf_GO:BP and sf_REAC are boolean flags that select Gene Ontology Biological Process and Reactome pathways, respectively, for inclusion in enrichment analysis. Parameter no_iea is a boolean flag for inclusion of GO assignments 'Inferred from Electronic Annotation'.
If you are interested in including other collections, these are the available flags (undocumented, via help desk):
sf_GO - includes BP, CC, MF subcategories of GO
sf_GO:BP - includes GO biological process terms (if used together with sf_GO then the intersection is applied i.e. sf_GO:BP and sf_GO will give only sf_GO:BP terms)
sf_GO:CC - includes GO cellular component terms
sf_GO:MF - includes GO molecular function terms
sf_KEGG - includes KEGG pathways
sf_REAC - includes Reactome pathways
sf_TF - includes transcription factor predictions from TRANSFAC
sf_MI - includes miRBase predictions
sf_HPA - includes Human Protein Atlas data
sf_CORUM - includes CORUM protein complexes
sf_HP - includes Human Phenotype Ontology terms
sf_BIOGRID - includes BioGRID protein-protein interaction data
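As a rough illustration of how these flags might be assembled into a request, here is a minimal Python sketch. The helper name build_gost_params, the gene-list parameter name query, the space-separated gene format, and the default no_iea=1 are all assumptions made for this example and are not confirmed by the g:Profiler docs; only the sf_* flag names and no_iea come from the list above. No endpoint URL is assumed, so the sketch stops at URL-encoding the parameters.

```python
from urllib.parse import urlencode

# Source flags taken from the list above; each is sent as a boolean (1 = include).
KNOWN_SOURCE_FLAGS = {
    "sf_GO", "sf_GO:BP", "sf_GO:CC", "sf_GO:MF", "sf_KEGG", "sf_REAC",
    "sf_TF", "sf_MI", "sf_HPA", "sf_CORUM", "sf_HP", "sf_BIOGRID",
}


def build_gost_params(genes, source_flags=("sf_GO:BP", "sf_REAC"), no_iea=1):
    """Hypothetical helper: collect g:GOSt query parameters into a dict.

    The "query" key, the space-separated gene list, and the default value
    of no_iea are assumptions for illustration only.
    """
    unknown = set(source_flags) - KNOWN_SOURCE_FLAGS
    if unknown:
        raise ValueError(f"Unknown source flags: {sorted(unknown)}")

    params = {"query": " ".join(genes), "no_iea": int(no_iea)}
    for flag in source_flags:
        params[flag] = 1  # boolean flag switched on
    return params


# Default selection mirrors the app: GO Biological Process and Reactome only.
params = build_gost_params(["TP53", "BRCA1"])
encoded = urlencode(params)  # the ":" in sf_GO:BP is percent-encoded
print(encoded)
```

Adding another collection would then be a matter of passing, say, source_flags=("sf_GO:BP", "sf_REAC", "sf_KEGG").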
Combining the parameters:
- threshold_algo, an enum set to fdr, says to use the Benjamini-Hochberg procedure as the basis to derive adjusted p-values for each pathway, as in the R stats function p.adjust
- significant, a boolean set to 0, says to ignore the default significance threshold filter in threshold_algo
- user_thr, a real set to 0.05, says to use this as the adjusted p-value threshold, so that only pathways with adjusted p-values below it are returned
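To make the effect of these three settings concrete, the following Python sketch reproduces the Benjamini-Hochberg adjustment (the same arithmetic as R's p.adjust with method "BH") and then applies the 0.05 cutoff. It is an illustration only, not g:GOSt's actual implementation: the function names bh_adjust and filter_significant and the pathway names are hypothetical, and the real service adjusts over every term it tests rather than a hand-picked dict.

```python
# Threshold settings as described above.
THRESHOLD_PARAMS = {"threshold_algo": "fdr", "significant": 0, "user_thr": 0.05}


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_significant(pathways, user_thr=THRESHOLD_PARAMS["user_thr"]):
    """Keep only pathways whose adjusted p-value is below user_thr.

    `pathways` maps a (hypothetical) pathway name to its raw p-value.
    """
    names = list(pathways)
    adjusted = bh_adjust([pathways[name] for name in names])
    return {name: q for name, q in zip(names, adjusted) if q < user_thr}


raw = {"pathway_A": 0.001, "pathway_B": 0.02, "pathway_C": 0.5}
kept = filter_significant(raw)
print(sorted(kept))
```

In this toy input, pathway_C is dropped because its adjusted p-value stays at 0.5, while the other two remain below the 0.05 threshold.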
This is a very opinionated app. We set a hard threshold for adjusted p-values (0.05) and fix the data sources (GO:BP and Reactome). There is some room in the analysis to declare the gene set sizes, and in the visualization to set the edge similarity threshold, but that's it.