Question about frequency and correlation in TF regulatory networks #340

Open
Sara-Tavallaei opened this issue Dec 2, 2024 · 4 comments
Labels
question Further information is requested

Comments

@Sara-Tavallaei

Hi,

Thanks for advancing new types of analysis in the wonderful hdWGCNA package!
I have a question about the TF regulatory network construction: how is it possible for a gene-TF axis to have correlation but with frequency 0? How could it be interpreted biologically?

Sara-Tavallaei added the question label Dec 2, 2024
@smorabit
Owner

smorabit commented Dec 2, 2024

Hi, thanks for your interest in hdWGCNA, especially in newer features like the TF network analysis 😊

> how is it possible for a gene-TF axis to have correlation but with frequency 0 ?

The TF regulatory network analysis in hdWGCNA uses XGBoost to model the expression of a given gene based on its potential regulators (TFs that have a binding motif within the gene's promoter region). One of the advantages of XGBoost for this kind of analysis is that it reports feature importance metrics, which tell us which features contribute most to model performance. One of these metrics is Frequency, which tells us how often a feature is used in tree splits. It is more informative to look at Gain, which reflects the error reduction contributed by splits on a feature and allows us to rank features by their predictive power. To directly answer your question, a gene could be correlated with a TF, but that particular TF could still have poor predictive power relative to other TFs, for example when another TF already carries the same information and the model never needs to split on it.
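To make the point concrete, here is a toy sketch in plain Python. It is not hdWGCNA or XGBoost code: it grows a single greedy regression tree and counts, per feature, how many splits use it (a stand-in for Frequency) and how much squared error those splits remove (a stand-in for Gain). The names `TF_A`, `TF_B`, `gene`, and all values are hypothetical, and XGBoost's actual gain is computed from regularized gradient statistics rather than raw squared error.

```python
from statistics import mean

# Hypothetical toy data for 8 cells. TF_A fully determines the gene's level;
# TF_B is a noisy copy of TF_A, so it is correlated with the gene but redundant.
X = {
    "TF_A": [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4],
    "TF_B": [0.2, 0.1, 0.5, 0.9, 0.8, 1.0, 1.3, 1.2],
}
gene = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a ** 0.5 * var_b ** 0.5)


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, rows):
    """Return (gain, feature, threshold) for the best single split, or None."""
    best = None
    parent = sse([y[i] for i in rows])
    for feat, col in X.items():
        levels = sorted({col[i] for i in rows})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [y[i] for i in rows if col[i] <= thr]
            right = [y[i] for i in rows if col[i] > thr]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, feat, thr)
    return best


def grow(X, y, rows, depth, stats):
    """Greedily grow a tree, recording split count and error reduction per feature."""
    split = best_split(X, y, rows)
    if depth == 0 or split is None or split[0] <= 1e-12:
        return
    gain, feat, thr = split
    stats[feat]["frequency"] += 1
    stats[feat]["gain"] += gain
    col = X[feat]
    grow(X, y, [i for i in rows if col[i] <= thr], depth - 1, stats)
    grow(X, y, [i for i in rows if col[i] > thr], depth - 1, stats)


stats = {feat: {"frequency": 0, "gain": 0.0} for feat in X}
grow(X, gene, list(range(len(gene))), depth=3, stats=stats)

for feat in X:
    print(feat, "cor =", round(pearson(X[feat], gene), 2), stats[feat])
```

In this toy data, `TF_B` correlates with the gene at roughly 0.78, yet it is never chosen for a split because a single split on `TF_A` already explains all of the variation. That is the kind of situation where a correlated TF ends up with a Frequency of 0.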

> how could it be interpreted biologically?

I would hesitate to interpret this biologically due to some key model assumptions (described below). I think it's easier to use this analysis to make a case that a TF is potentially regulating a gene than to make the case that a TF is NOT regulating a gene.

To me, this analysis is useful for hypothesis generation, but it relies on a lot of simplifications and assumptions. For example, TFs often require co-factors in order to regulate their targets. Also, the promoter region must be accessible in terms of chromatin in order for binding to occur. There can also be non-linear relationships in which the expression of a gene depends on multiple TFs. From transcriptomic data alone, it is impossible to resolve these factors, so the model is essentially a simplified view of TF-gene regulation.

Let's say, for example, that you are super interested in a particular TF-gene pair based on this analysis. I would suggest following up with some functional validation experiments, or validating computationally with additional -omics data like ATAC-seq or ChIP-seq.

@Sara-Tavallaei
Author

Thanks!
I got the point.
I was wondering is it true to remove the TFs with frequency 0 for my target genes to find the best TFs? Now, based on your explanation, especially "it's easier to use this analysis to make a case that a TF is potentially regulating a gene than to make the case that a TF is NOT regulating a gene", and also the limitations of transcriptomic data alone, I think it's better to keep those TFs.
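As a rough sketch of what keeping those TFs could look like in practice, the links can be ranked by Gain and annotated instead of filtered. This is plain Python with made-up values for illustration; the column names (`tf`, `gene`, `Gain`, `Frequency`, `Cor`), the helper `rank_links`, and the `min_abs_cor` cutoff of 0.3 are all assumptions, not the confirmed layout of the hdWGCNA output or a recommended threshold.

```python
# Hypothetical TF-gene links; every value below is made up for illustration.
links = [
    {"tf": "TF_A", "gene": "GENE_X", "Gain": 0.62, "Frequency": 0.40, "Cor": 0.81},
    {"tf": "TF_B", "gene": "GENE_X", "Gain": 0.00, "Frequency": 0.00, "Cor": 0.74},
    {"tf": "TF_C", "gene": "GENE_X", "Gain": 0.30, "Frequency": 0.45, "Cor": 0.10},
    {"tf": "TF_D", "gene": "GENE_X", "Gain": 0.08, "Frequency": 0.15, "Cor": -0.35},
]


def rank_links(links, min_abs_cor=0.3):
    """Sort links by Gain (descending) and label each one; nothing is dropped."""
    ranked = []
    for row in sorted(links, key=lambda r: r["Gain"], reverse=True):
        used_by_model = row["Frequency"] > 0
        correlated = abs(row["Cor"]) >= min_abs_cor
        if used_by_model and correlated:
            evidence = "model + correlation"
        elif used_by_model:
            evidence = "model only"
        elif correlated:
            evidence = "correlation only"
        else:
            evidence = "weak"
        # Copy the row so the input list is left untouched.
        ranked.append({**row, "evidence": evidence})
    return ranked


ranked = rank_links(links)
for row in ranked:
    print(row["tf"], row["Gain"], row["Cor"], row["evidence"])
```

With this layout, a TF with Frequency 0 but a strong correlation stays in the table as "correlation only", so it remains available as a candidate for follow-up rather than being treated as evidence of no regulation.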

@smorabit
Owner

smorabit commented Dec 5, 2024

> I was wondering is it true to remove the TFs with frequency 0 for my target genes to find the best TFs?

Can you clarify, when you say the "best" TFs, do you mean those which are most likely to regulate a target gene?

@Sara-Tavallaei
Author

> Can you clarify, when you say the "best" TFs, do you mean those which are most likely to regulate a target gene?

Yes, exactly.
By "best TFs", I mean the ones that are most likely to regulate a target gene compared with the other TFs, and that also have a significant, meaningful correlation with the target gene.

2 participants