Question about frequency and correlation in TF regulatory networks #340

Sara-Tavallaei · 2024-12-02T11:24:57Z

Hi,

Thanks for advancing new types of analysis in the wonderful hdWGCNA package!
I have a question about the TF regulatory network construction: how is it possible for a gene-TF axis to have a correlation but a frequency of 0? How could this be interpreted biologically?

smorabit · 2024-12-02T15:33:27Z

Hi, thanks for your interest in hdWGCNA, especially in newer features like the TF network analysis 😊

how is it possible for a gene-TF axis to have correlation but with frequency 0 ?

The TF regulatory network analysis in hdWGCNA uses XGBoost to model the expression of a given gene based on its potential regulators (TFs that have a binding motif within the gene's promoter region). One advantage of XGBoost for this kind of analysis is that it reports importance metrics showing which features contribute most to model performance. One of these metrics is Frequency, which tells us how often a feature is used in tree splits across the model. Gain, which is the average error reduction from splits on a feature, is more informative than Frequency because it lets us rank features by their predictive power. To directly answer your question: a gene can be correlated with a TF, but if that TF has poor predictive power relative to the other candidate TFs, the model may never split on it, which gives a Frequency of 0.
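A toy sketch may make this concrete. It is not hdWGCNA or XGBoost code: it is a minimal gradient-boosting loop with depth-1 trees (stumps) on squared error, written in plain Python as a stand-in. The names `tf_a`, `tf_b` and `gene` and all values are made up for illustration. `tf_b` is a coarser copy of `tf_a`, so it is strongly correlated with the gene, but every split it offers is also available through `tf_a`. Ties are resolved in favor of the first feature here, which is an assumption of this toy and not a statement about how XGBoost breaks ties.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def best_split(x, resid):
    """Best single threshold on feature x: (squared-error reduction, threshold)."""
    n, total = len(resid), sum(resid)
    values = sorted(set(x))
    best_gain, best_thr = 0.0, None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for v, r in zip(x, resid) if v <= thr]
        right = [r for v, r in zip(x, resid) if v > thr]
        gain = (sum(left) ** 2 / len(left)
                + sum(right) ** 2 / len(right)
                - total ** 2 / n)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


def boost(features, target, n_rounds=20, lr=1.0):
    """Boost stumps on residuals; count splits and summed gain per feature."""
    pred = [mean(target)] * len(target)
    splits = {name: 0 for name in features}
    gains = {name: 0.0 for name in features}
    for _ in range(n_rounds):
        resid = [t - p for t, p in zip(target, pred)]
        best_gain, best_name, best_thr = 1e-9, None, None
        for name, x in features.items():
            gain, thr = best_split(x, resid)
            if gain > best_gain:  # strict: a tie keeps the earlier feature
                best_gain, best_name, best_thr = gain, name, thr
        if best_name is None:  # no split reduces the error any further
            break
        x = features[best_name]
        left_mean = mean(r for v, r in zip(x, resid) if v <= best_thr)
        right_mean = mean(r for v, r in zip(x, resid) if v > best_thr)
        pred = [p + lr * (left_mean if v <= best_thr else right_mean)
                for p, v in zip(pred, x)]
        splits[best_name] += 1
        gains[best_name] += best_gain
    return splits, gains


# Hypothetical expression values across 8 cells (illustration only).
tf_a = [0, 1, 2, 3, 4, 5, 6, 7]
tf_b = [v // 2 for v in tf_a]      # coarser copy of tf_a
gene = [2 * v + 1 for v in tf_a]   # target gene driven by tf_a

features = {"tf_a": tf_a, "tf_b": tf_b}
splits, gains = boost(features, gene)
total_splits, total_gain = sum(splits.values()), sum(gains.values())

for name, x in features.items():
    print(name,
          "cor =", round(pearson(x, gene), 3),
          "frequency =", round(splits[name] / total_splits, 3),
          "gain =", round(gains[name] / total_gain, 3))
```

In this toy, `tf_b` keeps a high correlation with the gene while its split count stays at zero, because it never offers a strictly better split than `tf_a`. With real, noisy expression data the picture is less clean, but the same logic applies: a correlated TF can be left unused when other candidate TFs explain the gene at least as well.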

how could it be interpreted biologically?

I would hesitate to interpret this biologically due to some key model assumptions (described below). I think it's easier to use this analysis to make a case that a TF is potentially regulating a gene than to make the case that a TF is NOT regulating a gene.

To me, this analysis is useful for hypothesis generation, but it relies on a lot of simplifications and assumptions. For example, TFs often require co-factors in order to regulate a gene. The promoter region must also be accessible at the chromatin level for binding to occur. There can also be non-linear relationships in how multiple TFs together determine the expression of a gene. From transcriptomic data alone, it is impossible to resolve these factors, so the model is essentially a simplified view of TF-gene regulation.

Let's say, for example, that you are very interested in a particular TF-gene pair based on this analysis. I would suggest following up with functional validation experiments, or validating computationally with additional -omics data such as ATAC-seq or ChIP-seq.

Sara-Tavallaei · 2024-12-03T12:35:08Z

Thanks!
I got the points.
I was wondering whether it is right to remove the TFs with frequency 0 for my target genes in order to find the best TFs. Based on your explanation, especially "it's easier to use this analysis to make a case that a TF is potentially regulating a gene than to make the case that a TF is NOT regulating a gene.", and given the limitations of transcriptomic data alone, I now think it's better to keep those TFs.

smorabit · 2024-12-05T12:40:59Z

I was wondering is it true to remove the TFs with frequency 0 for my target genes to find the best TFs?

Can you clarify, when you say the "best" TFs, do you mean those which are most likely to regulate a target gene?

Sara-Tavallaei · 2024-12-06T06:29:34Z

Yes, exactly.
By "best TFs", I mean the ones that are most likely to regulate a target gene compared with the other TFs, and that also have a significant, meaningful correlation with the target gene.

Sara-Tavallaei added the question Further information is requested label Dec 2, 2024
