scCODA reproducibility

This repository contains the code used to produce the results and plots for "scCODA: A Bayesian model for compositional single-cell data analysis" (Büttner and Ostner et al., 2020). The scCODA package containing the model is distributed separately.

Simulated data benchmarks

The directory benchmarking contains the code for the three benchmarks (threshold determination, model comparison, and overall benchmark). The data generation setup for all benchmarks is in generate_data.py. Each benchmark can then be executed via the corresponding <benchmark_xy>_run.py script. The analysis and plots for each benchmark are produced in a Jupyter notebook named <benchmark_xy>_analysis.py.

Note: Because running the benchmarks is very resource-intensive, all generated data and benchmark results are provided at TODO!

To stay compatible across versions of the Python packages involved, the data and results are provided as .csv and .h5 files rather than as the .pkl files that the code produces. Conversion functions are in benchmarking.to_from_pickle.py.

The analysis notebooks can be run with the files benchmark_results and sccoda_effects for each benchmark. To run the benchmarks, first convert the files generated_data and generation_parameters with the function benchmarking.to_from_pickle.benchmark_datasets_to_pickle.
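The signature of benchmark_datasets_to_pickle is not documented in this README, so the following is only a generic sketch of what a .csv to .pkl round trip looks like, using the Python standard library. The function names csv_to_pickle and pickle_to_csv are hypothetical and are not part of this repository; the real conversion functions may store different objects and also handle .h5 files.

```python
import csv
import pickle


def csv_to_pickle(csv_path, pkl_path):
    """Read a CSV file into a list of row dicts and pickle that list.

    Hypothetical helper for illustration only. All values stay strings,
    because the csv module does not infer types.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(pkl_path, "wb") as f:
        pickle.dump(rows, f)
    return len(rows)


def pickle_to_csv(pkl_path, csv_path):
    """Write a pickled list of row dicts back to a CSV file."""
    with open(pkl_path, "rb") as f:
        rows = pickle.load(f)
    with open(csv_path, "w", newline="") as f:
        if rows:
            # Column order is taken from the first row.
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```

Pickle files are tied to the Python and package versions that wrote them, which is why plain-text and HDF5 files are the safer format for sharing.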

Example workflow for reproducing a benchmark

Navigate to the parent directory of scCODA_reproducibility

a) Download the benchmark data and results from Zenodo and unpack them here

or

b) If you want to re-run a benchmark, create the following directory structure:

<parent_directory>
|_  <scCODA_reproducibility>
|_  sccoda_benchmark_data
    |_  overall_benchmark
        |_  data_overall_benchmark
        |_  overall_benchmark_results
        |_  generated_datasets_overall_benchmark
        |_  overall_benchmark_plots
    |_  model_comparison_benchmark
        |_  data_model_comparison
        |_  model_comparison_results
        |_  generated_datasets_model_comparison
        |_  model_comparison_plots
    |_  threshold_determination_benchmark
        |_  data_threshold_determination
        |_  threshold_determination_results
        |_  generated_datasets_threshold_determination
        |_  threshold_determination_plots
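Creating these twelve directories by hand is error-prone, so here is a minimal sketch that builds the tree above with pathlib. The directory names are copied from the listing; the function names and the BENCHMARKS mapping are illustrative and not part of this repository.

```python
from __future__ import annotations

from pathlib import Path

# Top-level benchmark directory -> short name used inside its subdirectory
# names, as shown in the tree above.
BENCHMARKS = {
    "overall_benchmark": "overall_benchmark",
    "model_comparison_benchmark": "model_comparison",
    "threshold_determination_benchmark": "threshold_determination",
}


def benchmark_subdirs(name: str) -> list[str]:
    """Return the four subdirectory names for one benchmark."""
    return [
        f"data_{name}",
        f"{name}_results",
        f"generated_datasets_{name}",
        f"{name}_plots",
    ]


def create_benchmark_tree(parent: Path) -> list[Path]:
    """Create sccoda_benchmark_data and all subdirectories under `parent`.

    Existing directories are left untouched, so the call is safe to repeat.
    """
    root = Path(parent) / "sccoda_benchmark_data"
    created = []
    for top, name in BENCHMARKS.items():
        for sub in benchmark_subdirs(name):
            path = root / top / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling create_benchmark_tree(Path.cwd()) from the parent directory of scCODA_reproducibility should produce the layout shown above.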

Then run benchmarking.generate_data to generate all benchmark data. The generated_datasets_<benchmark_name> directories should now be filled with files.
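A quick way to confirm that data generation worked is to look for generated_datasets_<benchmark_name> directories that are still empty. The sketch below assumes the directory layout shown above; the function name empty_generated_dirs is hypothetical and not part of this repository.

```python
from pathlib import Path


def empty_generated_dirs(data_root):
    """Return all generated_datasets_* directories that contain no entries.

    `data_root` is expected to be the sccoda_benchmark_data directory, with
    one subdirectory per benchmark.
    """
    root = Path(data_root)
    return sorted(
        d
        for d in root.glob("*/generated_datasets_*")
        if d.is_dir() and not any(d.iterdir())
    )
```

An empty return value from empty_generated_dirs("sccoda_benchmark_data") means every generated_datasets directory holds at least one file.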

Then, execute the corresponding script benchmarking.<benchmark_name>_run. This produces results in the directory <benchmark_name>_results.

Once you have finished re-running all the benchmarks you want, run benchmarking.to_from_pickle to convert the data to standardized file types.

To redo the benchmark analysis, run the corresponding Jupyter notebook benchmarking.<benchmark_name>_analysis.

Applications

All applications of scCODA shown in the article can be found under applications. A Jupyter notebook is provided for each analysis.
