Bioimage and data analysis scripts

The scripts and notebooks in this repository were created by @fefossa to support the projects developed during her Ph.D. They were developed during an internship at the Cimini Lab and the Carpenter-Singh Lab.

Each folder contains a set of Python functions for one subproject. The functions can be applied to different use cases.

Before following the instructions, make sure that:

a) You have git installed;

b) You have created a main repository for your analysis, so that you can use scripts_notebooks_fossa as a submodule. Follow these instructions to create your first repo.

c) You have a way to manage your repositories locally. I recommend GitHub Desktop. See this video for a tutorial.

1. Instructions

Create your fork of the scripts_notebooks_fossa repository.

Result: The fork creates a copy of this repository in your account, with a URL like:
```
https://github.com/$YOUR_USERNAME/scripts_notebooks_fossa.git
```

macOS/Linux

Create a submodule inside your own analysis repository

Clone the main repo you created in (b):

```
YOUR_USERNAME="INSERT-USERNAME-HERE"
REPO="INSERT-NAME-HERE"
git clone git@github.com:$YOUR_USERNAME/$REPO.git
```

Open a terminal window, cd to your local repo, and add the submodule:

```
cd $REPO
git submodule add https://github.com/$YOUR_USERNAME/scripts_notebooks_fossa.git scripts_notebooks_fossa
```

In your main repository, scripts_notebooks_fossa now appears as a submodule folder.

Create a new environment called bioimage_scripts using conda
1. Download miniconda;
2. Open a terminal and cd to the submodule folder:
```
cd $REPO/scripts_notebooks_fossa
```
3. Paste the following into the command prompt:
```
conda env create --file environment.yml 
```
4. Activate the environment, then run the notebooks available here inside it:
```
conda activate bioimage_scripts
```

Windows

Create a submodule inside your own analysis repository

Clone the main repo you created in (b):

```
set YOUR_USERNAME=INSERT-USERNAME-HERE
set REPO=INSERT-NAME-HERE
git clone git@github.com:%YOUR_USERNAME%/%REPO%.git
```

Open the Command Prompt window, cd to your repo locally, and add the submodule:

```
cd %REPO%
git submodule add https://github.com/%YOUR_USERNAME%/scripts_notebooks_fossa.git scripts_notebooks_fossa
```

In your main repository, scripts_notebooks_fossa now appears as a submodule folder.

Create a new environment called bioimage_scripts using conda
1. Download miniconda;
2. Open the Anaconda prompt and cd to the repo;
```
cd %REPO%/scripts_notebooks_fossa
```
3. Paste the following into the command prompt:
```
conda env create --file environment.yml 
```
4. Activate the environment, then run the notebooks available here inside it:
```
conda activate bioimage_scripts
```

2. Use the Python functions inside a Jupyter Notebook

Inside each folder, there is an example notebook and an overall description.

To use any function inside a notebook, paste the following and change the path to your main repo:

```
import sys
sys.path.append(r"C:\Users\REPO")
```

To import a utility Python file from any folder, use:

```
from scripts_notebooks_fossa.pycombat_umap import combat_util
```
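
The path setup and the import above can be wrapped in a small helper so that re-running a notebook cell does not add the same path twice. This is only a sketch: `add_repo_to_path` is a hypothetical name, not a function from this repository, and the path is a placeholder that you need to change.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put the main analysis repo on sys.path so the submodule can be imported."""
    repo_root = str(Path(repo_root).expanduser())
    # Avoid adding the same entry every time the notebook cell is re-run.
    if repo_root not in sys.path:
        sys.path.append(repo_root)
    return repo_root


# Placeholder path: point it at the folder that contains scripts_notebooks_fossa.
add_repo_to_path(r"C:\Users\REPO")

# Once the path is set, the import from the README should resolve:
# from scripts_notebooks_fossa.pycombat_umap import combat_util
```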

3. Details for each folder

0. Metadata

metadata folder: Notebooks and executable programs to:

1. Generate a metadata file from the layout of a plate, containing all the information about the assay (**metadata_from_layout_program**);

2. Generate a load CSV file with the location of the images split by channel, plus the metadata for Plate, Well, and Site.
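
To illustrate the idea behind the first item, here is a minimal sketch that turns a plate layout grid into one metadata record per well. It is not the code of **metadata_from_layout_program**: the function name, the column names (`Metadata_Plate`, `Metadata_Well`, `Metadata_Treatment`), and the toy layout are all assumptions made for the example.

```python
import csv
import io
import string


def layout_to_metadata(layout, plate):
    """Turn a plate layout grid (rows x columns) into one record per well."""
    records = []
    for row_letter, row in zip(string.ascii_uppercase, layout):
        for col_number, treatment in enumerate(row, start=1):
            records.append({
                "Metadata_Plate": plate,
                "Metadata_Well": f"{row_letter}{col_number:02d}",
                "Metadata_Treatment": treatment,
            })
    return records


# Toy 2 x 3 layout; a real plate has many more wells.
layout = [
    ["DMSO", "drug_A", "drug_B"],
    ["DMSO", "drug_A", "drug_B"],
]
records = layout_to_metadata(layout, plate="Plate1")

# Write the records as CSV text. To save a file instead, replace the buffer
# with open("metadata.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```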

1. Profile generator for CellProfiler and DeepProfiler outputs

profiles folder: It has one folder for each software output, but the workflow is the same for both. There are two notebooks:

1_Samples_retrieval.ipynb: retrieves the single cells from the database file (.sqlite) of every plate in the batch and joins them into one CSV file;
2_AggAnnNormFeat.ipynb: aggregates, annotates, normalizes, and feature-selects the single-cell data using pycytominer. More details are inside the notebook.
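
The retrieval step of the first notebook can be pictured with the sketch below, which uses only the Python standard library. It is not the notebook's code: the function name and the default table name `Cells` are assumptions, and it assumes that every plate database has the same columns.

```python
import csv
import sqlite3
from pathlib import Path


def join_single_cells(sqlite_paths, output_csv, table="Cells"):
    """Append the rows of `table` from every .sqlite file into one CSV file."""
    header_written = False
    n_rows = 0
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        for path in sqlite_paths:
            connection = sqlite3.connect(str(path))
            try:
                # Table names cannot be bound as query parameters, so the
                # (assumed) name is quoted and placed in the statement.
                cursor = connection.execute(f'SELECT * FROM "{table}"')
                if not header_written:
                    writer.writerow([column[0] for column in cursor.description])
                    header_written = True
                for row in cursor:
                    writer.writerow(row)
                    n_rows += 1
            finally:
                connection.close()
    return n_rows


# Hypothetical usage, with one .sqlite file per plate under a batch folder:
# join_single_cells(sorted(Path("batch1").rglob("*.sqlite")), "single_cells.csv")
```

The function returns the number of single-cell rows written, which is handy as a quick sanity check against the expected cell count.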

2. Batch correct and visualize profiles

pycombat_umap folder: Processes well-aggregated profiles, applies batch correction using PyCombat, and then uses UMAP for visualization.

combat_util.py file: functions that accept pandas DataFrames. The requirements are pycytominer, pandas, plotly.express, and UMAP.
For more details on environment settings, see the readme inside the folder.
The functions can also run t-SNE and UMAP for a chosen number of iterations and plot the mean embedding and its standard deviation.

3. Visualize samples replicability (mean average precision (mAP) results)

plot_map folder: Takes the main folder as input and searches its subdirectories for the files with the mAP and q-values.

To calculate the mAP, follow the instructions in evalzoo.
Then, use plot_qvalue_map.ipynb to plot the mAPs. Choose the title of the plot and save it.
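
For intuition about the metric being plotted, here is the textbook definition of average precision and its mean over several ranked lists. This is a sketch for illustration only: the function names are hypothetical, and the exact computation in evalzoo may differ. It assumes each list ranks other samples by similarity to a query, with True marking a replicate of that query.

```python
def average_precision(ranked_hits):
    """Average precision of a ranked list of booleans (True = relevant item)."""
    hits = 0
    precisions = []
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            # Precision at this rank: relevant items seen so far / items seen so far.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the average precision over several ranked lists."""
    return sum(average_precision(hits) for hits in ranked_lists) / len(ranked_lists)


# Two toy queries: replicates ranked 1st and 3rd, then replicates ranked 1st and 2nd.
queries = [
    [True, False, True],
    [True, True, False],
]
print(mean_average_precision(queries))
```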

4. Correlation matrix

correlation_matrix folder: Here we have functions to calculate and generate a Pearson correlation matrix per plate or per dataset.
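
As a reminder of what is being computed, the sketch below builds a Pearson correlation matrix in plain Python. It is not the code from this folder: the function names and the toy profiles are assumptions. In practice, pandas `DataFrame.corr` computes the same coefficient by default.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Pairwise Pearson correlation between named profiles (dict of lists)."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Toy feature profiles for three wells (hypothetical values).
profiles = {
    "A01": [1.0, 2.0, 3.0, 4.0],
    "A02": [2.0, 4.0, 6.0, 8.0],
    "A03": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(profiles)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```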

5. Dose-response (IC50)

dose_response folder: Creates a dose-response curve from concentration and cell viability values. Using linear regression, we fit the linear function that represents the curve and derive the IC50 (the concentration that inhibits 50% of the population).
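
The calculation can be sketched as follows. This is not the code from the folder: the function names and the toy data are hypothetical, and fitting viability against log10(concentration) is an assumption, since the description above does not say which scale is used.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ic50_from_linear_fit(concentrations, viabilities):
    """Concentration at which the fitted line crosses 50% viability."""
    log_conc = [math.log10(c) for c in concentrations]
    slope, intercept = fit_line(log_conc, viabilities)
    # Solve 50 = slope * log10(IC50) + intercept for IC50.
    return 10 ** ((50.0 - intercept) / slope)


# Toy data: viability (%) drops by 20 points per tenfold increase in concentration.
concentrations = [1, 10, 100, 1000]
viabilities = [90, 70, 50, 30]
ic50 = ic50_from_linear_fit(concentrations, viabilities)
print(round(ic50, 3))
```

A straight line only approximates the roughly linear middle part of a dose-response curve, so the estimate is most reliable when the measured concentrations bracket the 50% viability point.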

6. Plot single features

individual_feature_and_statistics folder: Plots boxplots with each sample colored by batch, with the option to annotate them using statannotations.

7. Machine learning

machine_learning folder: Example of running a Random Forest model to find the feature importance between groups and the SHAP values.

APPENDIX: Submodules tips

To update a submodule that's inside your main repo

```
cd $REPO
git submodule update --init --recursive
```

To make changes inside a submodule, follow this link: https://gist.github.com/gitaarik/8735255#make-changes-inside-a-submodule

To clone an analysis repo with its submodules

```
git clone --recurse-submodules git@github.com:$YOUR_USERNAME/$REPO.git
```
