The code to preprocess data #3

Open
SiaGuo opened this issue Oct 21, 2024 · 7 comments

Comments

@SiaGuo

SiaGuo commented Oct 21, 2024

Hi,

STAMP is quite an efficient algorithm for integrating multiple samples, but I wonder how to preprocess the data. For example, I have four samples generated by 10X Visium; how can I preprocess them into the input format STAMP expects? I would like to know how this part of the code is implemented. Thank you for your help!

Sincerely,
Sia G

@Chengwei94
Collaborator

@SiaGuo,

You can just use the usual scanpy preprocessing. However, the input to the algorithm is count data, so remember to save the counts in layer = "counts". Then, for multiple samples, you can set categorical_covariate_keys=[your_batch]. We will update our docs soon to make this clearer.

@SiaGuo
Author

SiaGuo commented Oct 24, 2024

Got it! Thanks.

@katimbach

Hi! I have a follow-up question regarding this. If I have a similar situation, with multiple samples that I would like to combine, but also want to account for the spatial neighbors, as in the mouse brain example (https://jinmiaochenlab.github.io/scTM/notebooks/stamp/example2/), is this possible?

I'm a bit confused, as it seems in the mouse brain tutorial the data is explicitly provided as a covariate, whereas in other tutorials, such as that with lung cancer, the neighbor graph is created but not provided as a model covariate (https://jinmiaochenlab.github.io/scTM/notebooks/stamp/example3/). If multiple samples are provided (and their distinction is included as a covariate), are the spatial graphs from each spatial sample still considered in the model?

@Chengwei94
Collaborator

@katimbach

The covariate term is used to correct for batch effects, so in the SMI data there is no batch, since there is only 1 slice. The model is agnostic to how the graph is built, so if you want to build a separate graph for each batch, sq.gr.spatial_neighbors(adata, library_key="data") does that. The library key builds disjoint graphs for each batch.

@katimbach

@Chengwei94 Thanks so much for your fast reply! Noted that I can merge samples and build the graphs after using the key.

So, I suppose in the mouse example the "data" is the obs with the slice info, as is the "library_id" in the multi-sample example (https://jinmiaochenlab.github.io/scTM/notebooks/stamp/example6/)? I was just getting confused by the "data" naming aspect (thinking it was another layer or something), but I think I understand now this is just an arbitrary name. I suppose then the neighbors for the multi-slice would've been previously built by sq.gr.spatial_neighbors(adata, library_key="library_id") ?

@Chengwei94
Collaborator

@katimbach

Yep, you are right on that

@SiaGuo
Author

SiaGuo commented Nov 21, 2024

> @SiaGuo,
>
> You can just use the usual scanpy preprocess. However, the input into the algorithm is counts data, so rmb to save the counts data in the layer = "counts". Then for multiple samples, you can set the categorical_covariate_keys= [your_batch]. We will update our docs soon to make it more clear.

Hi, I've come across a dataset that is only available in normalized form. I wonder whether STAMP works on normalized ST data (from a technique with fairly low resolution). Thanks!
