The scripts and notebooks in this repository were created by @fefossa to support the projects developed during her Ph.D. They were developed during an internship at the Cimini Lab and the Carpenter-Singh Lab.
Each folder contains a set of Python functions related to one subproject that can be applied to different use cases.
Before following the instructions, make sure that:
a) You have git installed;
b) You have created a main repository for your analysis, so that you can use scripts_notebooks_fossa as a submodule. Follow these instructions to create your first repo;
c) (Recommended) You use GitHub Desktop to manage your repositories locally. See this video for a tutorial.
- Create your fork of the repository scripts_notebooks_fossa.
Result: the fork creates a copy of this repository in your account, at a URL that looks like:
https://github.com/$YOUR_USERNAME/scripts_notebooks_fossa.git
macOS/Linux
- Create a submodule inside your own analysis repository
- Clone the main repo you created in (b):
YOUR_USERNAME="INSERT-USERNAME-HERE"
REPO="INSERT-NAME-HERE"
git clone git@github.com:$YOUR_USERNAME/$REPO.git
- Open a terminal, cd to your local repo, and add the submodule:
cd $REPO
git submodule add https://github.com/$YOUR_USERNAME/scripts_notebooks_fossa.git scripts_notebooks_fossa
- In your main repository, you'll now see the scripts_notebooks_fossa folder.
- Create a new environment called bioimage_scripts using conda:
- Download miniconda;
- Open the Anaconda prompt and cd to the repo:
cd $REPO/scripts_notebooks_fossa
- Paste the following into the command prompt:
conda env create --file environment.yml
- Run the notebooks available here inside this environment:
conda activate bioimage_scripts
Windows
- Create a submodule inside your own analysis repository
- Clone the main repo you created in (b):
set YOUR_USERNAME="INSERT-USERNAME-HERE"
set REPO="INSERT-NAME-HERE"
git clone git@github.com:%YOUR_USERNAME%/%REPO%.git
- Open the Command Prompt window, cd to your local repo, and add the submodule:
cd %REPO%
git submodule add https://github.com/%YOUR_USERNAME%/scripts_notebooks_fossa.git scripts_notebooks_fossa
- In your main repository, you'll now see the scripts_notebooks_fossa folder.
- Create a new environment called bioimage_scripts using conda:
- Download miniconda;
- Open the Anaconda prompt and cd to the repo:
cd %REPO%/scripts_notebooks_fossa
- Paste the following into the command prompt:
conda env create --file environment.yml
- Run the notebooks available here inside this environment:
conda activate bioimage_scripts
Inside each folder, there is an example notebook and an overall description.
To use any function inside a notebook, paste the following and change the path to your main repo:
import sys
sys.path.append(r"C:\Users\REPO")
To import a utility Python file from any folder, use:
from scripts_notebooks_fossa.pycombat_umap import combat_util
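If the same notebook is shared between machines, a hard-coded Windows path can get in the way. The sketch below shows one possible way to build the path with pathlib and avoid adding it twice. The helper name add_repo_to_path and the location under the home directory are hypothetical; replace them with whatever matches your setup.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Append the main repo folder to sys.path once and return it as a string."""
    root = str(Path(repo_root).expanduser().resolve())
    if root not in sys.path:
        sys.path.append(root)
    return root


# Hypothetical location of the main repo; replace with your own path.
repo = add_repo_to_path(Path.home() / "REPO")
```

Once the main repo is on sys.path, the import from scripts_notebooks_fossa shown above should resolve, provided the submodule folder sits directly inside that repo.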
metadata folder: Notebooks and executable programs to:
1. Generate a metadata file from the plate layout, containing all the information about the assay (**metadata_from_layout_program**);
2. Generate a load CSV file with the location of the images split by channel, plus the Plate, Well, and Site metadata.
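To give an idea of what a load CSV looks like, here is a minimal sketch that groups image files into one row per site. It is not the code from the notebook: the file-name pattern, the channel labels (ch1, ch2, ...), and the helper names are hypothetical, and the FileName_/PathName_/Metadata_ column prefixes follow the convention commonly used for CellProfiler's LoadData module, which is an assumption about what this folder produces.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical file-name pattern: <plate>_<well>_s<site>_w<channel>.tif
PATTERN = re.compile(
    r"(?P<plate>[^_]+)_(?P<well>[A-P]\d{2})_s(?P<site>\d+)_w(?P<channel>\d+)\.tiff?$"
)


def build_load_rows(image_paths):
    """Group image files by (plate, well, site) into one row per site."""
    rows = defaultdict(dict)
    for path in map(Path, image_paths):
        match = PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        key = (match["plate"], match["well"], match["site"])
        row = rows[key]
        row["Metadata_Plate"], row["Metadata_Well"], row["Metadata_Site"] = key
        channel = f"ch{match['channel']}"
        row[f"FileName_{channel}"] = path.name
        row[f"PathName_{channel}"] = str(path.parent)
    return [rows[key] for key in sorted(rows)]


def write_load_csv(rows, out_path):
    """Write the rows to a CSV file, using the union of all column names."""
    fields = sorted({name for row in rows for name in row})
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

A typical call would pass the result of Path(folder).rglob("*.tif") to build_load_rows and then hand the rows to write_load_csv.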
profiles folder: It has one folder for each software output, but the idea is the same for both. There are two notebooks:
- 1_Samples_retrieval.ipynb: get the single cells extracted from a database file (.sqlite) from all the plates in the batch, and join them into one CSV file;
- 2_AggAnnNormFeat.ipynb: from the single-cell data, aggregate, annotate, normalize, and feature select the dataset using pycytominer. More details inside the notebook.
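The two notebook steps can be sketched roughly as follows. This is a minimal illustration, not the notebooks' code: the table name Cells, the Metadata_Plate and Metadata_Well column names, and the helper names collect_single_cells and aggregate_per_well are assumptions, and the median aggregation uses plain pandas as a stand-in for pycytominer.

```python
import sqlite3
from pathlib import Path

import pandas as pd


def collect_single_cells(sqlite_paths, table="Cells"):
    """Read one table from each plate's SQLite file and stack the rows."""
    frames = []
    for path in sqlite_paths:
        connection = sqlite3.connect(path)
        try:
            frame = pd.read_sql_query(f'SELECT * FROM "{table}"', connection)
        finally:
            connection.close()
        # Keep track of which file each cell came from.
        frame["Metadata_Source"] = Path(path).stem
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def aggregate_per_well(cells, strata=("Metadata_Plate", "Metadata_Well")):
    """Median-aggregate numeric features per well (plain pandas stand-in)."""
    return cells.groupby(list(strata), as_index=False).median(numeric_only=True)
```

The combined single-cell table can then be saved with cells.to_csv("single_cells.csv", index=False), where the output file name is only an example.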
pycombat_umap folder: Processes well-aggregated profiles, applies batch correction using PyCombat, and then uses UMAP for visualization.
- combat_util.py file: functions that accept DataFrames (pandas library). The requirements are pycytominer, pandas, plotly.express, and UMAP.
- For more details on environment settings, see the readme inside the folder.
- Run t-SNE and UMAP for a set number of iterations and plot the mean embedding and its standard deviation.
plot_map folder: Takes the main folder as input and searches its subdirectories for the files containing the mAP and q values.
- To calculate the mAP, use the instructions contained in the evalzoo.
- Then, use plot_qvalue_map.ipynb to plot the mAPs. Choose the title of the plot and save it.
correlation_matrix folder: Here we have functions to calculate and generate a Pearson correlation matrix per plate or per dataset.
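As a rough illustration of the idea, the sketch below computes a sample-by-sample Pearson correlation matrix for each plate with pandas. It is not the folder's own code: the function names, the Metadata_ prefix for non-feature columns, and the choice to correlate wells rather than features are all assumptions.

```python
import pandas as pd


def sample_correlation(profiles, id_column="Metadata_Well"):
    """Pearson correlation between samples (rows), using feature columns only."""
    metadata_columns = [c for c in profiles.columns if c.startswith("Metadata_")]
    features = profiles.drop(columns=metadata_columns)
    # Label each row by its sample identifier so the matrix is readable.
    features.index = profiles[id_column].to_numpy()
    # DataFrame.corr works column-wise, so transpose to correlate samples.
    return features.T.corr(method="pearson")


def correlation_per_plate(profiles, plate_column="Metadata_Plate"):
    """Return one correlation matrix per plate, keyed by plate name."""
    return {
        plate: sample_correlation(group)
        for plate, group in profiles.groupby(plate_column)
    }
```

Calling sample_correlation on the whole table gives the per-dataset version, as long as the identifier column is unique enough to tell samples apart.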
dose_response folder: Create a dose-response curve based on concentration and cell viability values. A linear regression gives the linear function that represents the curve, from which we obtain the IC50 (the concentration that inhibits 50% of the population).
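One way to read this is shown in the sketch below: fit a straight line to viability against log10(concentration) and solve it for 50% viability. The use of a log10 concentration axis, viability expressed in percent, and the function name ic50_linear are assumptions for illustration; the notebook in the folder may set up the regression differently.

```python
import numpy as np


def ic50_linear(concentrations, viability_percent):
    """Estimate IC50 from a straight-line fit of viability vs log10(concentration)."""
    log_conc = np.log10(np.asarray(concentrations, dtype=float))
    viability = np.asarray(viability_percent, dtype=float)
    # polyfit returns coefficients from the highest power down: slope, intercept.
    slope, intercept = np.polyfit(log_conc, viability, deg=1)
    if slope == 0:
        raise ValueError("viability does not change with concentration")
    # Solve slope * log10(c) + intercept = 50 for c.
    return float(10 ** ((50.0 - intercept) / slope))


# Viability drops 20 points per decade here, so the estimate is close to 10.
estimate = ic50_linear([0.1, 1, 10, 100], [90, 70, 50, 30])
```

A straight line is only a reasonable description near the middle of the curve, so the points passed in would normally be restricted to that region.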
individual_feature_and_statistics folder: Plot boxplots with each sample colored by batch, optionally annotated using statannotations.
machine_learning folder: Example of running a Random Forest model to find the feature importance between groups and the SHAP values.
#cd $REPO
#git submodule update --init --recursive
To make changes inside a submodule, follow this link: https://gist.github.com/gitaarik/8735255#make-changes-inside-a-submodule
To clone the main repo together with its submodules:
git clone --recurse-submodules git@github.com:$YOUR_USERNAME/$REPO.git