
Count matrix to cistopic link broken + scalability to large datasets question #447

Open
wgao688 opened this issue Aug 10, 2024 · 4 comments


@wgao688

wgao688 commented Aug 10, 2024

Hi, I am trying to take a peak-by-cell count matrix generated with another tool (SnapATAC2), which, like ArchR, produces a tile matrix, and make it compatible with SCENIC+. The link to the tutorial for this is broken: https://scenicplus.readthedocs.io/en/latest/faqs.html#i-have-an-analysis-with-another-tool-e-g-signac-archr-can-i-still-use-scenic

Additionally, I am wondering how scalable SCENIC+ is, as I have read that topic modeling can take quite a while. Have you tried running the analysis pipeline with >100K cells? Is subsampling recommended at some point? Thanks!

@SeppeDeWinter
Collaborator

Hi @wgao688

You can still find the tutorial here: https://github.com/aertslab/pycisTopic/blob/old/notebooks/Toy_melanoma-RTD.ipynb

We have run pycisTopic successfully with >100k cells. We are also improving the topic modeling step: https://github.com/aertslab/pycisTopic/tree/polars_1xx

Besides the number of cells, topic modeling also scales with the number of regions. How many regions are you using? This number can be very large, especially if you use the tile matrix from SnapATAC2. If possible, I would suggest using a count matrix based on consensus peaks, generated with either SnapATAC2 or pycisTopic.

All the best,

Seppe
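The suggestion above is to recount fragments in consensus peaks. If only the SnapATAC2 tile matrix is at hand, one rough approximation is to sum, for each consensus peak, the tiles that overlap it. The sketch below is a hypothetical helper (`aggregate_tiles_to_peaks` is not part of pycisTopic or SnapATAC2). It assumes a sparse tiles × cells count matrix and non-overlapping tiles given as `(chrom, start, end)` tuples. Because a tile can extend past a peak and one fragment can touch two adjacent tiles, the result includes flanking signal and some double counting, so recounting from fragments remains the better option.

```python
import numpy as np
import scipy.sparse as sp


def aggregate_tiles_to_peaks(tile_counts, tiles, peaks):
    """Sum tile-level counts into peak-level counts by interval overlap.

    tile_counts: sparse (n_tiles, n_cells) count matrix.
    tiles, peaks: lists of (chrom, start, end) tuples, half-open coordinates.
    Assumes tiles are non-overlapping (e.g. fixed-width genome bins), so that
    within a chromosome sorted starts also imply sorted ends.
    """
    # Per-chromosome arrays of tile starts, ends and original row indices.
    by_chrom = {}
    for idx, (chrom, start, end) in enumerate(tiles):
        by_chrom.setdefault(chrom, []).append((start, end, idx))
    index = {}
    for chrom, items in by_chrom.items():
        arr = np.array(sorted(items), dtype=np.int64)
        index[chrom] = (arr[:, 0], arr[:, 1], arr[:, 2])

    rows, cols = [], []
    for p_idx, (chrom, p_start, p_end) in enumerate(peaks):
        if chrom not in index:
            continue  # no tiles on this chromosome: the peak row stays zero
        starts, ends, tile_ids = index[chrom]
        # A tile overlaps the peak if tile_end > p_start and tile_start < p_end.
        lo = np.searchsorted(ends, p_start, side="right")
        hi = np.searchsorted(starts, p_end, side="left")
        for t_idx in tile_ids[lo:hi]:
            rows.append(p_idx)
            cols.append(int(t_idx))

    # Peak-by-tile 0/1 assignment matrix; the product sums the matching tiles.
    assign = sp.csr_matrix(
        (
            np.ones(len(rows), dtype=np.int64),
            (np.array(rows, dtype=np.int64), np.array(cols, dtype=np.int64)),
        ),
        shape=(len(peaks), len(tiles)),
    )
    return sp.csr_matrix(assign @ tile_counts)
```

Collapsing tiles onto a consensus peak set reduces the number of regions, which is the dimension that topic modeling also scales with.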

@ghuls
Member

ghuls commented Aug 22, 2024

With the Polars 1xx branch it is now possible to make a Mallet corpus file from a binary count matrix file in Matrix Market format:

    Expose creation of Mallet corpus file from pycistopic CLI interface:
    
        pycistopic topic_modeling create_mallet_corpus
    
    Usage:
    
      Create binary accessibility matrix in Matrix Market format:
    
        import pycisTopic.fragments
        import scipy.io  # import the submodule explicitly so scipy.io.mmwrite is available
    
        counts_fragments_matrix, cbs, region_ids = pycisTopic.fragments.create_fragment_matrix_from_fragments(
            "fragments.tsv.gz",
            "consensus_regions.bed",
            "cbs.tsv"
        )
    
        # Create binary matrix:
        binary_matrix = counts_fragments_matrix.copy()
        binary_matrix.data.fill(1)
    
        # Write binary matrix in Matrix Market format.
        scipy.io.mmwrite("binary_accessibility.mtx", binary_matrix)
    
      Create Mallet corpus file from binary accessibility matrix in Matrix Market format:
    
        $ pycistopic topic_modeling create_mallet_corpus -i "binary_accessibility.mtx" -o "corpus.mallet"

aertslab/pycisTopic@2d54473
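For a matrix produced outside pycisTopic (for example a SnapATAC2 peak matrix already loaded as a scipy sparse matrix), the binarize-and-write step from the commit message could be sketched with scipy alone. The helper name `write_binary_mtx` and the `.rows.txt`/`.cols.txt` label files are hypothetical conventions for keeping region and cell names next to the matrix. This thread does not say which orientation (regions × cells or cells × regions) `create_mallet_corpus` expects, so check that before writing.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp


def write_binary_mtx(matrix, row_names, col_names, prefix):
    """Binarize a count matrix and write it as Matrix Market plus two
    plain-text label files (one name per line). Returns the .mtx path."""
    binary = sp.csr_matrix(matrix, copy=True)  # leave the input untouched
    binary.eliminate_zeros()                   # drop explicitly stored zeros
    binary.data = np.ones_like(binary.data)    # every remaining count -> 1
    mtx_path = prefix + ".mtx"
    scipy.io.mmwrite(mtx_path, binary)
    with open(prefix + ".rows.txt", "w") as fh:
        fh.write("\n".join(row_names) + "\n")
    with open(prefix + ".cols.txt", "w") as fh:
        fh.write("\n".join(col_names) + "\n")
    return mtx_path
```

Calling `eliminate_zeros()` before setting the stored values to 1 keeps any explicitly stored zeros at 0 instead of turning them into accessible entries.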

@wgao688
Author

wgao688 commented Aug 27, 2024

Thanks, Seppe and Gert. I will try the code above. I am working with about 300K peaks and 300K cells. Should I subset to the most highly variable peaks (e.g., 50K or 100K)?

I also want to handle batch effects (due to individual donors and technology, i.e. multiome vs. ATAC only). I see from the SCENIC+ paper that you used harmonypy on the scaled cell–topic matrix for the mouse cerebellum dataset. I also saw in a previous GitHub issue (#134) that you typically do not perform batch correction. Is there anything specific you recommend for batch correction?
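The thread does not say whether the maintainers recommend subsetting peaks. If one does try it, a simple way to rank regions in a binary matrix is by their variance across cells: for a 0/1 feature that is accessible in a fraction p of cells, the variance is p(1 - p). The sketch assumes a sparse regions × cells binary matrix. `top_variable_regions` is a hypothetical helper, and variance-based selection is only one possible criterion, not the pycisTopic method.

```python
import numpy as np
import scipy.sparse as sp


def top_variable_regions(binary, n_top):
    """Return sorted row indices of the n_top regions with the highest
    variance across cells in a sparse (n_regions, n_cells) 0/1 matrix."""
    binary = sp.csr_matrix(binary)
    n_cells = binary.shape[1]
    # Fraction of cells in which each region is accessible.
    p = np.asarray(binary.sum(axis=1)).ravel() / n_cells
    variance = p * (1.0 - p)  # Bernoulli variance of a 0/1 feature
    order = np.argsort(-variance, kind="stable")
    return np.sort(order[:n_top])
```

Because p(1 - p) increases for p up to 0.5, and most scATAC-seq regions are accessible in far fewer than half of the cells, this ranking behaves much like ranking by detection rate.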

@wgao688
Author

wgao688 commented Aug 28, 2024

@ghuls I am trying to run pycistopic topic_modeling create_mallet_corpus -i "binary_accessibility.mtx" -o "corpus.mallet", but the create_mallet_corpus subcommand does not appear to be available. Is there something wrong with how I installed the polars_1xx branch?

    git clone --branch polars_1xx --single-branch https://github.com/aertslab/pycisTopic.git
    cd pycisTopic/
    pip install -e .

Thanks
