Meta issue functional analyses #367

Open
Tracked by #385
grst opened this issue Dec 3, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@grst
Member

grst commented Dec 3, 2024

Description of feature

Functional analyses are often easier to interpret than the raw gene lists produced by differential gene expression analyses.
Let's use this issue to get an overview of use-cases and methods we want to implement.

It's a non-goal to add all possible methods for Gene Set Enrichment Analysis. Instead, let's focus on 1-2 methods per use case that implement the current best practice and are computationally efficient.

A general consideration here is whether the methods should be executed on the DE results (e.g. GSEA, decoupleR on statistics, ORA, ...) or
on the gene expression values (e.g. ssGSEA, decoupleR on TPM, ...). The former means the statistics are computed on genes, while the latter means they are computed on samples.
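To make the first option concrete, here is a minimal sketch of an over-representation analysis (ORA) run on DE results, using a one-sided hypergeometric test. It is only an illustration under stated assumptions, not how decoupleR or any other tool mentioned here implements ORA: the function name `ora_pvalue` and the gene identifiers are hypothetical, and the list of DE genes is assumed to have been produced upstream by thresholding the DE statistics.

```python
from math import comb


def ora_pvalue(de_genes, gene_set, background):
    """One-sided hypergeometric p-value for over-representation.

    de_genes   -- genes called differentially expressed (hypothetical input)
    gene_set   -- genes belonging to the pathway / signature
    background -- all genes that were tested in the DE analysis
    """
    background = set(background)
    de = set(de_genes) & background   # restrict both lists to the background
    gs = set(gene_set) & background

    n_background = len(background)
    n_set = len(gs)
    n_de = len(de)
    n_overlap = len(de & gs)

    # P(X >= n_overlap) when drawing n_de genes without replacement
    total = comb(n_background, n_de)
    upper = min(n_set, n_de)
    favourable = sum(
        comb(n_set, i) * comb(n_background - n_set, n_de - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / total


# Toy example: 20 background genes, a 5-gene set, 4 DE genes, 3 of them in the set
background = [f"g{i}" for i in range(20)]
gene_set = ["g0", "g1", "g2", "g3", "g4"]
de_genes = ["g0", "g1", "g2", "g10"]
p = ora_pvalue(de_genes, gene_set, background)
```

Note that the input here is one list of genes per contrast, so the result is a single p-value per gene set and contrast, not a value per sample.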

Gene set enrichment analysis of custom gene sets and/or predefined gene sets (e.g. GO, HALLMARK, ...)

Transcription factors

Cancer pathways

Cell type deconvolution

Anyone feel free to suggest other databases and/or methods.

CC @tschwarzl @apeltzer @atrigila @nschcolnicov @alanmmobbs93

@grst grst added the enhancement New feature or request label Dec 3, 2024
@grst
Member Author

grst commented Dec 6, 2024

Notably, decoupleR implements the following methods from the list:

  • GSVA
  • ORA
  • GSEA

@grst
Member Author

grst commented Dec 10, 2024

A general consideration here is whether the methods should be executed on the DE results (e.g. GSEA, decoupleR on statistics, ORA, ...) or
on the gene expression values (e.g. ssGSEA, decoupleR on TPM, ...). The former means the statistics are computed on genes, while the latter means they are computed on samples.

For us, it would certainly be useful to compute signature scores per sample, as we may need to report them to a clinical database. The per-sample scores should be independent of other samples, as we have no control over what subsets of the data may be retrieved by others. SingScore/decoupleR on TPM certainly fulfil these criteria.
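As a rough illustration of what "independent of other samples" means in practice, here is a minimal sketch of a rank-based per-sample score. It is only in the spirit of rank-based single-sample methods and is not the actual SingScore or decoupleR algorithm; the function name `rank_score` and the toy gene names are hypothetical.

```python
def rank_score(sample_expr, gene_set):
    """Centred mean percentile rank of the gene-set genes within ONE sample.

    sample_expr -- dict mapping gene -> expression (e.g. TPM) for a single sample
    gene_set    -- iterable of gene names forming the signature
    """
    # Rank genes by expression within this sample only (ascending).
    # Ties are broken by dict insertion order because sorted() is stable.
    ordered = sorted(sample_expr, key=lambda g: sample_expr[g])
    n_genes = len(ordered)
    percentile = {g: (i + 1) / n_genes for i, g in enumerate(ordered)}

    hits = [percentile[g] for g in gene_set if g in percentile]
    if not hits:
        raise ValueError("no gene of the gene set was measured in this sample")

    # Subtract 0.5 so that signatures ranked around the middle score near 0
    return sum(hits) / len(hits) - 0.5


# Toy sample with four genes; the signature contains the two highest-ranked ones
sample = {"A": 1.0, "B": 5.0, "C": 3.0, "D": 10.0}
score = rank_score(sample, ["B", "D"])

# Rescaling the sample (e.g. different normalisation factor) leaves the score unchanged
rescaled = {gene: value * 2 for gene, value in sample.items()}
score_rescaled = rank_score(rescaled, ["B", "D"])
```

Because only within-sample ranks enter the calculation, the score of a sample does not change when other samples are added to or removed from the cohort.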

On the other hand, statistical power may be reduced when comparing scores between groups. Because the information from multiple genes is aggregated into a single value, we lose the information that changes may be subtle but coordinated in the same direction.

@grst
Member Author

grst commented Dec 10, 2024

Status: wait until @suzannejin and team have added a subworkflow for enrichment analyses (#384), then work on extending it as necessary.
