Question about frequency and correlation in TF regulatory networks #340
Hi, thanks for your interest in hdWGCNA, especially newer features like the TF network analysis 😊
The TF regulatory network analysis in hdWGCNA uses XGBoost to model the expression of a given gene based on its potential regulators (TFs that have a binding motif within the gene's promoter region). One of the advantages of XGBoost for this kind of analysis is that it reports importance metrics showing which features contribute most to model performance. One of these metrics is Frequency, which tells us how often a feature is used in tree splits. It is more informative to look at Gain, which is the average error reduction from splits on that feature and allows us to rank features by their predictive power. To directly answer your question, a gene could be correlated with a TF, but that particular TF could have poor predictive power relative to other TFs. In that case the model never selects it for a split, so its Frequency is 0.
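To make the "correlated but Frequency 0" case concrete, here is a minimal sketch. It is a hand-rolled toy booster with depth-1 trees in plain Python, not XGBoost and not hdWGCNA code, and the names `tf_a`, `tf_b`, and `gene` are hypothetical. In this made-up data `tf_b` is a coarser copy of `tf_a`, so it correlates strongly with the gene (Pearson r of about 0.97), yet any split on `tf_b` can be matched by a split on `tf_a`, and the greedy search never needs it.

```python
# Toy illustration only: a hand-rolled gradient booster with depth-1 trees.
# It is NOT XGBoost or hdWGCNA code; all names and data are hypothetical.

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def best_split(values, residuals):
    """Return (gain, threshold) for the best single split on one feature.

    Gain is the reduction in squared error from fitting a separate mean
    on each side of the threshold instead of one overall mean.
    """
    n, total = len(residuals), sum(residuals)
    best_gain, best_thr = 0.0, None
    for thr in sorted(set(values))[:-1]:
        left = [r for v, r in zip(values, residuals) if v <= thr]
        sl, nl = sum(left), len(left)
        sr, nr = total - sl, n - nl
        gain = sl * sl / nl + sr * sr / nr - total * total / n
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr

def fit(features, target, rounds=50, learning_rate=0.3):
    """Boost depth-1 trees; count splits (frequency) and summed gain per feature."""
    mean = sum(target) / len(target)
    residuals = [t - mean for t in target]
    frequency = {name: 0 for name in features}
    total_gain = {name: 0.0 for name in features}
    for _ in range(rounds):
        winner, win_gain, win_thr = None, 1e-9, None
        for name, values in features.items():
            gain, thr = best_split(values, residuals)
            # Strict '>' means a tie keeps the feature that was seen first.
            if thr is not None and gain > win_gain:
                winner, win_gain, win_thr = name, gain, thr
        if winner is None:  # no split improves the fit any further
            break
        frequency[winner] += 1
        total_gain[winner] += win_gain
        values = features[winner]
        left = [r for v, r in zip(values, residuals) if v <= win_thr]
        right = [r for v, r in zip(values, residuals) if v > win_thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        residuals = [
            r - learning_rate * (lmean if v <= win_thr else rmean)
            for v, r in zip(values, residuals)
        ]
    return frequency, total_gain

# Hypothetical data: 16 cells, gene expression driven entirely by tf_a.
tf_a = list(range(16))
tf_b = [v // 4 for v in tf_a]      # coarse, redundant copy of tf_a
gene = [2.0 * v for v in tf_a]

corr_b = pearson(tf_b, gene)
frequency, total_gain = fit({"tf_a": tf_a, "tf_b": tf_b}, gene)
print("correlation of tf_b with gene:", round(corr_b, 3))
print("split frequency:", frequency)
print("summed gain:", total_gain)
```

Real XGBoost adds regularization, deeper trees, and subsampling, so the exact numbers will differ, but the same logic applies: a feature only earns Frequency and Gain when it improves the fit beyond what the other features already provide.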
I would hesitate to interpret this biologically due to some key model assumptions (described below). I think it's easier to use this analysis to make a case that a TF is potentially regulating a gene than to make the case that a TF is NOT regulating a gene. To me, this analysis is useful for hypothesis generation, but there are a lot of simplifications and assumptions that we rely on. For example, TFs often require co-factors in order to regulate their targets. Also, the chromatin at the promoter region must be accessible in order for binding to occur. There can also be non-linear relationships in how multiple TFs together determine the expression of a gene. From transcriptomic data alone, it is impossible to resolve these factors, so the model is essentially a simplified view of TF-gene regulation. Let's say, for example, you are super interested in a particular TF-gene pair based on this analysis. I would suggest following up with some functional validation experiments, or validating computationally with additional -omics data like ATAC-seq or ChIP-seq.
Thanks!
Can you clarify, when you say the "best" TFs, do you mean those which are most likely to regulate a target gene?
Yes, exactly.
Hi,
Thanks for advancing new types of analysis in the wonderful hdWGCNA package!
I have a question about the TF regulatory network construction: how is it possible for a gene-TF axis to show correlation but have a Frequency of 0? How could this be interpreted biologically?