-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(external): implement METHLYANVI for scBS-seq data #3066
base: main
Are you sure you want to change the base?
feat(external): implement METHLYANVI for scBS-seq data #3066
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3066 +/- ##
========================================
Coverage 82.95% 82.96%
========================================
Files 181 183 +2
Lines 15433 15692 +259
========================================
+ Hits 12803 13019 +216
- Misses 2630 2673 +43
|
I'm removing the draft label here for now to get your feedback before proceeding further on this PR. In short, the goal of this PR is to add an implementation of the MethylANVI (MethylVI + scANVI) model from the MethylVI manuscript. Beyond just the code necessary for MethylANVI, I've also created some new mixin's to capture shared functions between models and avoid too much code duplication; if it's easier for you I'm happy to move these to separate PRs. I provide more details below:
|
) -> (np.ndarray | pd.DataFrame) | dict[str, np.ndarray | pd.DataFrame]: | ||
r"""Convenience function to obtain normalized methylation values for a single context. | ||
|
||
Only applicable to MuData models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this limitation? It's anyhow only accessible with MuData?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I've removed this comment in the latest version.
src/scvi/external/methylvi/_model.py
Outdated
r"""Convenience function to obtain normalized methylation values for a single context. | ||
|
||
Only applicable to MuData models. | ||
use_posterior_mean: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addition to scANVI? Makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean here? The use_posterior_mean
parameter is already present in scANVI.
src/scvi/external/methylvi/_model.py
Outdated
batch_index=batch, | ||
use_posterior_mean=use_posterior_mean, | ||
) | ||
if self.module.classifier.logits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need it here? This was for backward compatibility in scANVI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be always legit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/scvi/external/methylvi/_model.py
Outdated
y_pred = torch.cat(y_pred).numpy() | ||
if not soft: | ||
predictions = [] | ||
for p in y_pred: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do list comprehension?
@@ -289,3 +281,348 @@ def sample( | |||
exprs[context] = dist.sample().cpu() | |||
|
|||
return exprs | |||
|
|||
|
|||
class METHYLANVAE(METHYLVAE, BSSeqModuleMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put it in two files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! The two modules can now be found in _methylvi_module.py
and _methylanvi_module.py
. For consistency I also split the two models into separate files (_methylvi_model.py
and _methylanvi_model.py
).
|
||
|
||
class METHYLANVAE(METHYLVAE, BSSeqModuleMixin): | ||
"""Single-cell annotation using variational inference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methyl should be in here. Currently it's the acronym for scANVI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch! Fixed.
w_y[:, group_index] *= w_g[:, [i]] | ||
else: | ||
w_y = self.classifier(z) | ||
return w_y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we inherit the classifier from scANVI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would reuse the function from scANVI. Per @ori-kron-wis's comments I believe for now we should leave this as-is and clean it up in a subsequent PR with the SemisupervisedMixin.
I like the idea of a SemisupervisedMixin class especially as we'll add more semisupervised models soon'ish. Predict should best case be part of it. However, the input to the classifier is slightly different for both scANVI and methylANVI. I'm also thinking about whether we should also abstract the module code for semisupervised models. |
Im in favor of reducing code duplications and having BSSeq & SemisupervisedTraining mixins. I would suggest, as you both already pointed out, that as we expect more models, that are currently under development with "current" scanvi code, to use the SemisupervisedTraining mixin, I would create a new PR just for the scanvi changes here and concentrate only on Methylanvi in this PR, so our future integration will be easier. It will have some code duplications for now until all other models will move also to SemisupervisedTrainingmixin (and probably expand it beyond methylvi). I also validated the scnavi changes here, and it looks the same as before. |
for more information, see https://pre-commit.ci
Hi @canergen @ori-kron-wis. Happy new year! I just finished modifying this PR to address your comments (including reverting the previous changes to scANVI). Tests are currently failing, but the failures appear unrelated to this PR (the tests are related to general data loading functions). @canergen per your suggestion I added a |
y_pred = [] | ||
for _, tensors in enumerate(scdl): | ||
inference_inputs = self.module._get_inference_input(tensors) # (n_obs, n_vars) | ||
data_inputs = {key: inference_inputs[key] for key in self.module.data_input_keys} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@canergen This line is the main change to allow the predict
function to be reused across models with different numbers of "data inputs" (e.g. mc + cov for BS-seq vs just RNA counts for RNA-seq).
It comes at the cost of requiring that a new field (data_input_keys
) be specified in the module class, but this would enable more code re-use for semisupervised models.
@ethanweinberger thanks! |
@canergen per our email exchange, this PR adds my MethylANVI model implementation within
scvi.external.methylvi
.