Problem
When a mutation $i$ does not appear in the training data, and we don't have any regularization on the $\beta_i$, its estimate will remain at its initialized value during optimization. This seems undesirable, since it will give random-noise predictions for that mutation if it appears in the test set. Commonly this sort of thing is dealt with using a ridge regression term $$R(\beta) = \sum_i \beta_i^2,$$
which is equivalent to putting a normal prior centered at zero on the $\beta_i$. The drawback of this is that we know typical mutation effects are deleterious, not WT-like near zero, so shrinking towards zero seems like a bad idea.
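For concreteness, the correspondence is the standard one: with penalty weight $\lambda$, minimizing the data loss plus $\lambda R(\beta)$ is MAP estimation under independent priors $\beta_i \sim \mathcal{N}(0, \sigma^2)$ with $\lambda = 1/(2\sigma^2)$, since $$-\log p(\beta) = \sum_i \frac{\beta_i^2}{2\sigma^2} + \text{const}.$$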
Proposed resolution
Suppose we magically know that the typical latent mutation effect is $\bar\beta$. In that case we'd want the prior on $\beta_i$ to regularize toward that value instead of zero: $$R(\beta) = \sum_i (\beta_i-\bar\beta)^2.$$
This is equivalent to a normal prior centered at $\bar\beta$. Instead of encouraging WT-like predictions for unobserved mutations, this will encourage typical/deleterious predictions for them. So, my proposal is to use this offset ridge penalty, and include $\bar\beta$ as a learnable scalar parameter representing the typical mutation effect (interpretable as a centering operation in the latent space).
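A minimal sketch of what this could look like in PyTorch (hypothetical names: `beta`, `bar_beta`, and `penalty_weight` are illustrative, not the package's actual parameters):

```python
import torch

# Minimal sketch, not the package's actual API: an offset ridge penalty with a
# learnable center bar_beta.
class OffsetRidge(torch.nn.Module):
    def __init__(self, n_mutations: int, penalty_weight: float = 1.0):
        super().__init__()
        # Per-mutation latent effects beta_i.
        self.beta = torch.nn.Parameter(torch.zeros(n_mutations))
        # Learnable scalar bar_beta: the "typical" mutation effect toward which
        # unobserved mutations are shrunk.
        self.bar_beta = torch.nn.Parameter(torch.zeros(()))
        self.penalty_weight = penalty_weight

    def penalty(self) -> torch.Tensor:
        # R(beta) = sum_i (beta_i - bar_beta)^2, scaled by the penalty weight.
        return self.penalty_weight * torch.sum((self.beta - self.bar_beta) ** 2)
```

The penalty would simply be added to the data loss at each optimization step; mutations absent from the training data then get pulled toward $\bar\beta$, which is itself fit jointly with the observed mutation effects.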