feat(external): implement METHLYANVI for scBS-seq data #3066

ethanweinberger · 2024-12-03T21:38:06Z

@canergen per our email exchange, this PR adds my MethylANVI model implementation within scvi.external.methylvi.

codecov · 2024-12-03T22:24:28Z

Codecov Report

Attention: Patch coverage is 83.25472% with 71 lines in your changes missing coverage. Please review.

Project coverage is 82.96%. Comparing base (a435561) to head (074370a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scvi/external/methylvi/_methylanvi_module.py | 76.63% | 25 Missing ⚠️ |
| src/scvi/external/methylvi/_methylvi_model.py | 76.47% | 16 Missing ⚠️ |
| src/scvi/external/methylvi/_base_components.py | 86.66% | 14 Missing ⚠️ |
| src/scvi/model/base/_training_mixin.py | 86.56% | 9 Missing ⚠️ |
| src/scvi/external/methylvi/_methylanvi_model.py | 89.06% | 7 Missing ⚠️ |

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##             main    #3066    +/-   ##
========================================
  Coverage   82.95%   82.96%            
========================================
  Files         181      183     +2     
  Lines       15433    15692   +259     
========================================
+ Hits        12803    13019   +216     
- Misses       2630     2673    +43
```

| Files with missing lines | Coverage Δ |
|---|---|
| src/scvi/data/fields/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/data/fields/_scanvi.py | `100.00% <100.00%> (ø)` |
| src/scvi/dataloaders/_data_splitting.py | `95.47% <ø> (ø)` |
| src/scvi/dataloaders/_semi_dataloader.py | `92.30% <ø> (ø)` |
| src/scvi/external/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/methylvi/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/methylvi/_methylvi_module.py | `80.00% <100.00%> (ø)` |
| src/scvi/external/methylvi/_utils.py | `85.18% <100.00%> (ø)` |
| src/scvi/model/base/__init__.py | `100.00% <100.00%> (ø)` |
| src/scvi/external/methylvi/_methylanvi_model.py | `89.06% <89.06%> (ø)` |

... and 4 more

ethanweinberger · 2024-12-09T19:55:15Z

Hi @canergen @ori-kron-wis

I'm removing the draft label for now to get your feedback before going further with this PR. In short, this PR adds an implementation of the MethylANVI (MethylVI + scANVI) model from the MethylVI manuscript. Beyond the code needed for MethylANVI itself, I've also created some new mixins that capture functions shared between models and avoid code duplication. If it's easier for you, I'm happy to move these to separate PRs. More details below:

- The MethylANVI model and module classes are in scvi/external/methylvi. To avoid duplicating functions shared between MethylVI and MethylANVI (e.g. get_normalized_methylation), I added a BSSeqMixin class in external/methylvi/_base_components.py. BSSeqMixin plays the same role for methylation data that RNASeqMixin plays for RNA-seq data.
- To avoid duplicating functions that are identical in scANVI and MethylANVI, I also created a SemisupervisedTrainingMixin class in scvi/model/base/_training_mixin.py. This mixin currently provides scANVI's _set_indices_and_labels and train functions. We could potentially abstract away more functions (e.g. predict?), but I wanted to get your thoughts before proceeding, since this code touches things outside of external.
- Minor: I added a MuData wrapper for LabelsWithUnlabeledObsField to handle cell type labels.
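The following is a minimal, self-contained sketch of the mixin pattern described above. All class and method names (`MethylationMixinSketch`, `SemisupervisedMixinSketch`, `ToyMethylVI`, `ToyMethylANVI`, `normalized_methylation`, `predict_labels`, `_decode`) are hypothetical stand-ins, not the actual scvi-tools API. The point is only that each shared helper is defined once in a mixin, and every model that inherits that mixin picks it up.

```python
import numpy as np


class MethylationMixinSketch:
    """Hypothetical analogue of BSSeqMixin: methods shared by methylation models."""

    def normalized_methylation(self, context: str) -> np.ndarray:
        # Shared logic lives here once; subclasses only supply `_decode`.
        mc, cov = self._decode(context)
        return mc / np.maximum(cov, 1)  # methylated fraction, guarding cov == 0


class SemisupervisedMixinSketch:
    """Hypothetical analogue of SemisupervisedTrainingMixin."""

    labels: list

    def predict_labels(self, probs: np.ndarray) -> list:
        # Map the most probable class index of each cell to its label name.
        return [self.labels[i] for i in probs.argmax(axis=1)]


class ToyMethylVI(MethylationMixinSketch):
    def _decode(self, context):
        # Toy counts for 3 regions: methylated calls (mc) and total coverage (cov).
        return np.array([1.0, 2.0, 0.0]), np.array([4.0, 4.0, 0.0])


class ToyMethylANVI(ToyMethylVI, SemisupervisedMixinSketch):
    # Inherits the methylation helpers and adds the semisupervised ones.
    labels = ["neuron", "glia"]
```

`ToyMethylANVI` inherits from `ToyMethylVI`, so it gets the methylation helpers for free and only adds the semisupervised ones. This mirrors how a MethylANVI model could reuse MethylVI code.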

canergen · 2024-12-20T07:52:45Z

src/scvi/external/methylvi/_base_components.py

```diff
+    ) -> (np.ndarray | pd.DataFrame) | dict[str, np.ndarray | pd.DataFrame]:
+        r"""Convenience function to obtain normalized methylation values for a single context.
+
+        Only applicable to MuData models.
```


Why this limitation? Isn't it only accessible with MuData anyway?

Fair enough. I've removed this comment in the latest version.
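The return annotation in the diff allows either a single array or DataFrame, or a dict keyed by context. One plausible reading is that requesting a single context yields one object, while requesting no context yields one entry per context. The sketch below illustrates that reading with a hypothetical helper. `methylation_by_context` and its `context`/`return_numpy` arguments are illustrative assumptions, not the method's real signature.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def methylation_by_context(
    levels: dict[str, np.ndarray],
    context: str | None = None,
    return_numpy: bool = False,
) -> (np.ndarray | pd.DataFrame) | dict[str, np.ndarray | pd.DataFrame]:
    """Return methylation levels for one context, or for all contexts (sketch)."""

    def _wrap(values: np.ndarray) -> np.ndarray | pd.DataFrame:
        # Keep raw arrays if requested; otherwise wrap them in a DataFrame.
        return values if return_numpy else pd.DataFrame(values)

    if context is not None:
        # Single context requested: return one array or DataFrame.
        return _wrap(levels[context])
    # No context given: return a dict with one entry per context.
    return {ctx: _wrap(values) for ctx, values in levels.items()}
```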


canergen · 2024-12-20T07:57:31Z

src/scvi/external/methylvi/_model.py

```diff
-        r"""Convenience function to obtain normalized methylation values for a single context.
-
-        Only applicable to MuData models.
+        use_posterior_mean: bool = True,
```


Is this an addition relative to scANVI? Makes sense.

What do you mean here? The use_posterior_mean parameter is already present in scANVI.
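For reference, in scANVI `use_posterior_mean=True` predicts labels from the mean of the approximate posterior q(z | x). With `False`, it predicts from a posterior sample, so predictions become stochastic. The NumPy sketch below shows that choice in isolation. `latent_for_prediction` and its arguments are hypothetical.

```python
import numpy as np


def latent_for_prediction(qz_m, qz_v, use_posterior_mean=True, rng=None):
    """Pick the latent representation fed to the classifier (sketch).

    qz_m, qz_v: posterior mean and variance of q(z | x), shape (n_cells, n_latent).
    """
    if use_posterior_mean:
        # Deterministic: the mean of the approximate posterior.
        return qz_m
    # Stochastic: one reparameterized draw z = mu + sigma * eps.
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(qz_m.shape)
    return qz_m + np.sqrt(qz_v) * eps
```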

canergen · 2024-12-20T07:59:18Z

src/scvi/external/methylvi/_model.py

```diff
+                batch_index=batch,
+                use_posterior_mean=use_posterior_mean,
+            )
+            if self.module.classifier.logits:
```


Do we need it here? This was for backward compatibility in scANVI.

It should always be logits.
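The guard exists because older scANVI classifiers could be built to return probabilities rather than logits. As the reply suggests, the new MethylANVI classifier is assumed to always return logits, so the softmax can be applied unconditionally. Below is a minimal NumPy sketch with a hypothetical `class_probabilities` helper:

```python
import numpy as np


def class_probabilities(logits: np.ndarray) -> np.ndarray:
    """Convert classifier logits to per-cell class probabilities (sketch)."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)
```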

canergen · 2024-12-20T08:00:08Z

src/scvi/external/methylvi/_model.py

```diff
+        y_pred = torch.cat(y_pred).numpy()
+        if not soft:
+            predictions = []
+            for p in y_pred:
```


Could this be a list comprehension?
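Here is a sketch of the suggested rewrite. It assumes `y_pred` already holds integer class codes (as in the non-soft branch) and uses a hypothetical `code_to_label` dict in place of the model's own mapping:

```python
import numpy as np

# Hypothetical mapping from integer class codes to label names.
code_to_label = {0: "excitatory", 1: "inhibitory", 2: "glia"}

# Predicted class codes for three cells (already argmax-ed, i.e. soft=False).
y_pred = np.array([0, 2, 1])

# Loop version, as in the diff:
predictions = []
for p in y_pred:
    predictions.append(code_to_label[p])

# Equivalent list comprehension:
predictions_lc = [code_to_label[p] for p in y_pred]
```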


canergen · 2024-12-20T08:02:48Z

src/scvi/external/methylvi/_module.py

```diff
@@ -289,3 +281,348 @@ def sample(
                exprs[context] = dist.sample().cpu()

        return exprs
+
+
+class METHYLANVAE(METHYLVAE, BSSeqModuleMixin):
```


Put it in two files?

Done! The two modules can now be found in _methylvi_module.py and _methylanvi_module.py. For consistency I also split the two models into separate files (_methylvi_model.py and _methylanvi_model.py).

canergen · 2024-12-20T08:03:26Z

src/scvi/external/methylvi/_module.py

```diff
+
+
+class METHYLANVAE(METHYLVAE, BSSeqModuleMixin):
+    """Single-cell annotation using variational inference.
```


"Methyl" should be in here. Currently it's just the expansion of the scANVI acronym.

Thanks for the catch! Fixed.

canergen · 2024-12-20T08:05:27Z

src/scvi/external/methylvi/_module.py

```diff
+                w_y[:, group_index] *= w_g[:, [i]]
+        else:
+            w_y = self.classifier(z)
+        return w_y
```


Could we inherit the classifier from scANVI?

Ideally we would reuse the function from scANVI. Per @ori-kron-wis's comments I believe for now we should leave this as-is and clean it up in a subsequent PR with the SemisupervisedMixin.
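The branch shown in the diff appears to follow scANVI's optional label-groups path. In that path, label scores are normalized within each group and then weighted by a group-level probability. Below is a NumPy sketch of that computation, with hypothetical argument names (`unw_y`, `w_g`, `groups_index`):

```python
import numpy as np


def classify_with_groups(unw_y, w_g, groups_index, eps=1e-8):
    """Combine within-group label scores with group probabilities (sketch).

    unw_y: (n_cells, n_labels) unnormalized label scores.
    w_g: (n_cells, n_groups) group probabilities from a group classifier.
    groups_index: list of index arrays, one per group, partitioning the labels.
    """
    w_y = np.zeros_like(unw_y)
    for i, group_index in enumerate(groups_index):
        unw_y_g = unw_y[:, group_index]
        # Normalize label scores within the group ...
        w_y[:, group_index] = unw_y_g / (unw_y_g.sum(axis=-1, keepdims=True) + eps)
        # ... then weight them by the probability of the group itself.
        w_y[:, group_index] *= w_g[:, [i]]
    return w_y
```

If each row of `w_g` sums to 1 and the groups partition the labels, each output row also sums to approximately 1.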

canergen · 2024-12-20T08:18:32Z

I like the idea of a SemisupervisedMixin class, especially as we'll add more semisupervised models soon-ish. Ideally, predict should be part of it. However, the input to the classifier differs slightly between scANVI and MethylANVI. I'm also wondering whether we should abstract the module code for semisupervised models as well.
@ori-kron-wis, can you also review this code and share your ideas on it?

ori-kron-wis · 2024-12-24T15:46:13Z

I'm in favor of reducing code duplication and of having BSSeq and SemisupervisedTraining mixins.

As you both pointed out, we expect more models, currently under development against the current scANVI code, to use the SemisupervisedTraining mixin. I would therefore suggest opening a new PR just for the scANVI changes here and concentrating only on MethylANVI in this PR, so that future integration will be easier. The scANVI PR can be branched off this one.

There will be some code duplication for now, until all other models also move to SemisupervisedTrainingMixin (which will probably expand it beyond MethylVI).

I also validated the scANVI changes here, and the behavior looks the same as before.


ethanweinberger · 2025-01-06T00:52:18Z

Hi @canergen @ori-kron-wis. Happy new year! I just finished modifying this PR to address your comments (including reverting the previous changes to scANVI). Tests are currently failing, but the failures appear unrelated to this PR (the tests are related to general data loading functions).

@canergen, per your suggestion I added a predict function to the SemisupervisedMixin class. I tried to make the function flexible enough to handle different numbers of inputs without requiring many changes in other classes (see my comments). Let me know what you think! I'd also be happy to add a corresponding semisupervised module mixin to this PR, or to open a follow-up PR to avoid putting too much into one PR.

ethanweinberger · 2025-01-06T00:58:17Z

src/scvi/model/base/_training_mixin.py

```diff
+        y_pred = []
+        for _, tensors in enumerate(scdl):
+            inference_inputs = self.module._get_inference_input(tensors)  # (n_obs, n_vars)
+            data_inputs = {key: inference_inputs[key] for key in self.module.data_input_keys}
```


@canergen This line is the main change to allow the predict function to be reused across models with different numbers of "data inputs" (e.g. mc + cov for BS-seq vs just RNA counts for RNA-seq).

It comes at the cost of requiring that a new field (data_input_keys) be specified in the module class, but this would enable more code re-use for semisupervised models.
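Here is a self-contained sketch of the idea, using toy NumPy modules in place of real scvi-tools modules. The module classes, the tensor keys (`X`, `mc`, `cov`, `batch`) and `predict_codes` are hypothetical; only the `data_input_keys` filtering mirrors the diff. The loop iterates over the batches directly, since the `enumerate` index in the diff is unused.

```python
import numpy as np


class ToyRNAModule:
    # RNA-seq-style module: a single data input.
    data_input_keys = ["x"]

    def _get_inference_input(self, tensors):
        return {"x": tensors["X"], "batch_index": tensors["batch"]}

    def classify(self, x, batch_index=None):
        # Toy classifier: treat each feature as a class score.
        return x


class ToyBSSeqModule:
    # BS-seq-style module: methylated (mc) and total (cov) counts.
    data_input_keys = ["mc", "cov"]

    def _get_inference_input(self, tensors):
        return {"mc": tensors["mc"], "cov": tensors["cov"], "batch_index": tensors["batch"]}

    def classify(self, mc, cov, batch_index=None):
        # Toy classifier: methylated fraction per feature as the class score.
        return mc / np.maximum(cov, 1)


def predict_codes(module, batches):
    """Model-agnostic prediction loop (sketch)."""
    y_pred = []
    for tensors in batches:
        inference_inputs = module._get_inference_input(tensors)
        # Forward only the keys the module declares as data inputs, so this
        # loop works for ("x",) as well as ("mc", "cov").
        data_inputs = {key: inference_inputs[key] for key in module.data_input_keys}
        scores = module.classify(**data_inputs, batch_index=inference_inputs["batch_index"])
        y_pred.append(scores.argmax(axis=1))
    return np.concatenate(y_pred)
```

The same `predict_codes` loop serves an RNA-style module with one data input and a BS-seq-style module with two. That is the reuse the new field is meant to enable.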

ori-kron-wis · 2025-01-06T08:59:39Z

@ethanweinberger thanks!
The failing tests are caused by an integration issue between anndata and the recent scipy version update; I've noticed them too (see scverse/anndata#1811). Tests will fail until anndata releases its fix.

Ethan Weinberger added 5 commits November 21, 2024 11:31

Retrieve methylation levels for specified context with MethylVI

4887899

Add test

4a40526

Initial MethylANVI commit

6142b93

Merge branch 'main' into external/methylanvi

445ecab

Fix formatting

d038b67

Ethan Weinberger added 14 commits December 3, 2024 16:00

MethylANVI docs

24c8f99

Doc fixes

3ab3f41

Fix test, factor out reconstruction loss

5309fbc

Fix test (again)

d4a12e7

Fix methylanVI test

fa43d6b

Add MuData labels field

b19fa17

Fix test

ceb880a

Mixin for Semisupervised training

83a7e60

Fix import

19415c4

Fix tests

3f13ce4

Fix scANVI modality key handling

52da5f0

Refactor getting mod key

b7925cf

BSSeq Mixin

23af079

Update test

f5f49c1

ethanweinberger marked this pull request as ready for review December 9, 2024 19:34

ethanweinberger changed the title ~~MethylANVI Model~~ feat(external): implement METHLYANVI for scBS-seq data Dec 9, 2024

canergen reviewed Dec 20, 2024


src/scvi/external/methylvi/_base_components.py

canergen reviewed Dec 20, 2024


src/scvi/external/methylvi/_model.py (outdated)

canergen reviewed Dec 20, 2024


ori-kron-wis added 3 commits December 22, 2024 10:21

Merge branch 'main' into external/methylanvi

75bb701

Merge branch 'main' into external/methylanvi

6a97164

Merge branch 'main' into external/methylanvi

f7ceb5d

ori-kron-wis and others added 16 commits December 24, 2024 23:18

Merge branch 'main' into external/methylanvi

921888a

Merge branch 'main' into external/methylanvi

3e25916

Merge branch 'main' into external/methylanvi

c069a62

Merge branch 'main' into external/methylanvi

a1c8ee4

Merge branch 'main' into external/methylanvi

49a96a4

Split MethylVI/MethylANVI models/modules into separate files

729cf77

[pre-commit.ci] auto fixes from pre-commit.com hooks

6373970

for more information, see https://pre-commit.ci

Remove classifier logits check

8298ec5

Fix description of METHYLANVAE

c1f0f43

Revert SemisupervisedTrainingMixin

3d0cef3

Revert changes to _training_mixin.py file

8716574

Remove outdated import

734fe93

Revert scANVI changes

df61c91

Remove erroneous comment

1ac0a6c

Add back in SemisupervisedMixin for MethylANVI

1057b6d

Classify function for mixin

26a2d01

Classify function

3ef7032

ethanweinberger commented Jan 6, 2025


Merge branch 'main' into external/methylanvi

074370a
