ExSeqProcessing

This is the software library for processing Expansion Sequencing (ExSeq) experiments, as seen in Alon et al. 2021. The pipeline takes multi-round in situ sequencing data (including morphological information for spatial context) and converts it into RNA reads in 3D. This software has successfully processed over 300 fields of view of ExSeq data, corresponding to dozens of terabytes, and has helped elucidate biological phenomena at the nanoscale in neuroscience and between cell types in metastatic breast cancer.

Getting Started

For the fastest path to exploring the steps of the ExSeqProcessing pipeline, we include a script that simulates ExSeq data and then processes it. You can find that script in analysis/simulator/runSimulatorRound.m. It should take about 10 minutes to run end-to-end on a modern laptop and is a good introduction to the file format structure.
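
As a minimal sketch (assuming MATLAB is started from your local clone of the repository; adjust the paths for your setup), the simulator can be launched like this:

    % Hedged sketch: run the ExSeq simulator from the repository root.
    % Assumes the current folder is your local ExSeqProcessing clone.
    addpath(genpath(pwd));                          % put the pipeline code on the MATLAB path
    run('analysis/simulator/runSimulatorRound.m');  % generates simulated ExSeq data and processes it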

Please refer to the Wiki for information and a tutorial, complete with a sample ExSeq dataset. [Update, March 08, 2021: That Wiki page is undergoing a refactoring with simpler instructions].

The easiest way to get started with real data is to explore the tutorial folder in the root of the ExSeqProcessing repository. tutorial/ExSeqProcessing_tutorial.m walks through the simplest way of running all the necessary steps, as sketched below. For larger-scale processing, we recommend the runPipeline.sh bash script, with the necessary disclaimer that the tradeoff for its power and flexibility is an initial learning curve. To help new users get working with runPipeline.sh, we also include wrapper_batchexperiment.py to illustrate how we have utilized it.
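
For example, a minimal sketch of launching the tutorial (again assuming MATLAB is started from the repository root) looks like:

    % Hedged sketch: run the step-by-step tutorial from the repository root.
    addpath(genpath(pwd));                       % make the pipeline functions visible
    run('tutorial/ExSeqProcessing_tutorial.m');  % walks through each processing step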

We are grateful for all requests and questions - there will surely be bugs and we want to fix them. It is important to us that this software is a useful contribution to the community!

Overview

In order to use this pipeline, your data must be formatted in a way that can be ingested. You must create grayscale 3D images (either TIFF or HDF5) and place them into a folder structure explained here.
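
As an illustration of the expected image format (the file names and dataset path below are placeholders, not the pipeline's required naming convention; see the folder structure documentation for that), a grayscale 3D volume can be written from MATLAB as a multi-page TIFF or an HDF5 dataset roughly like this:

    % Hedged sketch: write a grayscale 3D volume (y, x, z) as TIFF or HDF5.
    % 'example_round001_ch00.*' and '/image' are placeholder names only.
    vol = uint16(65535 * rand(256, 256, 50));   % stand-in for a real image stack

    % Multi-page TIFF: one z-slice per page.
    imwrite(vol(:, :, 1), 'example_round001_ch00.tif');
    for z = 2:size(vol, 3)
        imwrite(vol(:, :, z), 'example_round001_ch00.tif', 'WriteMode', 'append');
    end

    % HDF5 alternative: the whole volume in a single dataset.
    h5create('example_round001_ch00.h5', '/image', size(vol), 'Datatype', 'uint16');
    h5write('example_round001_ch00.h5', '/image', vol);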

Under development

For larger experiments in which many fields of view tile a complete biological sample, we have been developing tools to assist with automation and with handling the challenges that come with scale. We call this BigEXP, and it specifically aims to help with the registration step for large samples. In the ideal case, the experimentalist can physically align the sample so that each field of view can be processed independently. However, for large samples that must be physically handled often, the assumption that each field of view stays aligned to itself across the sequencing rounds does not hold. BigEXP can be used to register all the samples in the situation that fields of view cannot be processed independently. This is still at a rough stage, and teams that would like to use this feature can post an issue or email dgoodwin at mit.

Acknowledgements

This software pipeline has been a successful multi-year, multi-team collaboration. Specifically, the Boyden Lab would like to express gratitude to Dr. Yosuke Bando from Kioxia (formerly Toshiba Memory) for his leadership in building a high performance compute system and to Dr. Atsushi Kajita from Fixstars Solutions Inc. for his leadership in software optimization using GPUs and SSDs. Together, we have transformed a rough codebase that originally took days to run for a single field of view into a powerful and robust software system that can process an ExSeq field of view within an hour. This software has processed terabytes of data successfully and automatically. It has been a foundation for experimental productivity and biological exploration, and we hope it can be of value to labs around the world.

We thank all the people who have contributed to this codebase:

From Kioxia: Yosuke Bando, Shintaro Sano and Seiji Maeda.

From Fixstars: Atsushi Kajita, Karl Marrett, Ramdas Pillai and Robert Prior.

From MIT: Dan Goodwin, Shahar Alon, Andrew Xue, Adam Marblestone, Anu Sinha and Oz Wassie.