parse README #6

Merged on Jul 28, 2020 (6 commits)

88 changes: 0 additions & 88 deletions README.md
@@ -3,94 +3,6 @@
This repository was derived from a [template repository](https://github.blog/2019-06-06-generate-new-repositories-with-repository-templates/) located at https://github.com/broadinstitute/pooled-cell-painting-profiling-template.
The purpose of the repository is to weld together a versioned data processing pipeline with versioned processed output data for a single Pooled Cell Painting experiment.

## Setup computational environment

First, install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/).
We use conda as an environment manager.

```bash
# Install computational environment
conda env create --force --file environment.yml

# Initialize the environment
conda activate pooled-cell-painting
```

## Perform the weld

The welding procedure is a three-step process.

1. Activate conda environment (see above)
2. Manually update the configuration yaml documents for your specific experiment
* Yaml documents with reasonable default values are available in the [config/](config/) folder.
* Do not change the location of these files.
* Additional documentation for each of the parameters is available in the [config/docs/](config/docs/) folder.
3. Execute `weld.sh` (see below)

```bash
# After performing steps 1 and 2 above, perform step 3:
./weld.sh
```

## **AFTER GENERATING A NEW REPO, CHANGE OR DELETE ALL NONSPECIFIC DETAILS**

<p align="center">
<img src="https://raw.githubusercontent.com/broadinstitute/pooled-cp-profiling-template/a57cb7f9e36b89ff56acf094f18ca06b1a53b719/media/pipeline_weld.png" width="500">
</p>
Member commented:

We should add a figure legend. Do you want to give it a first crack?


## Setup

To correctly initialize the repository, we need to perform several manual steps.

### Step 0: Create a New Repository **using this Repository as a Template**

By spinning up a new repo using this repo as a template, you will retain all code, configuration files, computational environments, and directory structure that a standard Pooled Cell Painting workflow expects and produces.

### Step 1: Fork The Pooled Cell Painting Recipe

We first want to [fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the official pooled cell profiling recipe located at https://github.com/broadinstitute/pooled-cp-profiling-recipe.

* **Result:** The fork creates a copy of a recipe repository.
* **Goals:** 1) Remove the connection to official recipe updates to avoid unintended weld versioning reversal; 2) Enable independent updates to fork code that do not impact the official recipe.
* **Execution:** See [forking instructions](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) and the image below.

![Step 1: Fork](media/step1_forkrecipe.png)

### Step 2: Create a Submodule inside this Repository of the Forked Recipe

Next, we will create a [submodule](https://gist.github.com/gitaarik/8735255) in this repo.

* **Result:** Adding a submodule initiates the weld.
* **Goals:** 1) Link the processing code (recipe) with the data (current repo); 2) Require a manual step to update the recipe to enable asynchronous development.
* **Execution:** See below

```bash
# In your terminal, clone the repository you just created (THIS REPO)
USER="INSERT-USERNAME-HERE"
REPO="INSERT-NAME-HERE"
git clone git@github.com:$USER/$REPO.git

# Navigate to this directory
cd $REPO

# Add the Recipe Submodule
git submodule add https://github.com/$USER/pooled-cp-profiling-recipe.git pooled-cp-profiling-recipe
```

Refer to ["Adding a submodule"](https://gist.github.com/gitaarik/8735255#adding-a-submodule) for more details.

### Step 3: Commit the Submodule

Lastly, we will [commit](https://help.github.com/en/desktop/contributing-to-projects/committing-and-reviewing-changes-to-your-project#about-commits) the submodule to GitHub.

* **Result:** Committing this change finalizes the weld.
* **Goals:** 1) Track the submodule (recipe) version with the current repository.
* **Execution:** See below

```bash
# Add, commit, and push the submodule contents
git add pooled-cp-profiling-recipe
git add .gitmodules
git commit -m 'finalizing the recipe weld'
git push
```
Binary file added media/new_repo_from_template.png
Binary file added media/use_this_template.png
31 changes: 31 additions & 0 deletions setup_README.md
@@ -0,0 +1,31 @@
The following are the two setup steps that need to be performed once at the start of a project.
Member commented:

Suggested change:
- The following are the two setup steps that need to be performed once at the start of a project.
+ The following are the two setup steps that need to be performed **only once** at the start of an experiment.

Member Author commented:

I think of each batch as possibly being a separate experiment, and these steps don't need to happen with each batch. So if you don't like "project" and I don't like "experiment", can we come up with another word?

Member Author commented:

And/or we should put into the documentation how we define project/experiment/batch?

Member commented:

Want to say "start of each batch of data collection" or "start of each project batch"? Let's keep only once though.

I agree the data pipeline welding "unit" will be experimental batch. An experiment may contain multiple batches, and a project may contain multiple experiments (in my view).

Although we may eventually want to make the recipe focused at the "experiment" level (as defined above) since we are likely to want to develop batch effect correction methods. These tools should be part of the recipe IMO - this is a bit down the road though

Member Author commented:

I just added a commit that defines our terms and so it should be consistent now.

I agree we want to eventually bring it to the experiment level (related issue).


For a general overview of the pipeline welding process, see the [repo README](README.md).
For the welding process steps to perform with each dataset, see the [weld process README](weld_process_README.md).

## Set Up the Computational Environment

Install [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/).
We use conda as an environment manager.

```bash
# Install computational environment
conda env create --force --file environment.yml

# Initialize the environment
conda activate pooled-cell-painting
```
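
Before moving on, it can help to confirm that the environment is actually active. The sketch below is not part of this repository: `check_conda_env` is a hypothetical helper that relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the name of the active environment.

```bash
# Hypothetical helper (not part of this repository): report whether the
# expected conda environment is the active one.
check_conda_env() {
  expected="$1"
  # conda exports CONDA_DEFAULT_ENV when an environment is activated
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "ok: $expected is active"
  else
    echo "not active: run 'conda activate $expected'" >&2
    return 1
  fi
}

# Usage (assumes the environment name shown above):
# check_conda_env pooled-cell-painting
```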

## Fork the Pooled Cell Painting Recipe

We first want to [fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the official pooled cell profiling recipe located at https://github.com/broadinstitute/pooled-cp-profiling-recipe.

* **Result:**
The fork creates a copy of a recipe repository.
* **Goals:**
1) Remove the connection to official recipe updates to avoid unintended weld versioning reversal.
2) Enable independent updates to fork code that do not impact the official recipe.
* **Execution:**
See [forking instructions](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) and the image below.

![Step 1: Fork](media/step1_forkrecipe.png)
104 changes: 104 additions & 0 deletions weld_process_README.md
@@ -0,0 +1,104 @@
The following are the weld process steps to perform with each dataset you analyze.

For a general overview of the pipeline welding process, see the [repo README](README.md).
Member commented:

Love these pointers

For the setup steps that need to be performed once at the start of a project, see the [setup README](setup_README.md).

### Step 0: Update Your Forked Recipe (Optional)
* **Result:**
Updates (or reverts) your recipe to include any desired changes.
* **Goal:**
1) Allow you to make changes to your recipe from dataset to dataset (or batch to batch).
* **Execution:**
If you would like your recipe to include any updates to the official recipe:
```
git fetch upstream
git checkout master
git merge upstream/master
git push
```
If you would like your recipe to include any updates that you have made:
```
git checkout UPDATED-BRANCH
```
or
```
git checkout <commit_hash>
```
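
The `git fetch upstream` command above assumes that your clone of the fork has a remote named `upstream` pointing at the official recipe. A fresh clone of a fork usually has only `origin`, so the following sketch shows one way to add it. The URL is the official recipe location given in the setup README; the helper name `ensure_upstream` is hypothetical.

```bash
# Official recipe location, as given in the setup README
UPSTREAM_URL="https://github.com/broadinstitute/pooled-cp-profiling-recipe.git"

# Hypothetical helper: make sure a remote named "upstream" exists and
# points at the given URL. Run it from inside your clone of the fork.
ensure_upstream() {
  if git remote get-url upstream >/dev/null 2>&1; then
    # Remote already exists: update its URL
    git remote set-url upstream "$1"
  else
    # Remote is missing: add it
    git remote add upstream "$1"
  fi
}

# Usage:
# ensure_upstream "$UPSTREAM_URL"
# git remote -v   # lists origin and upstream
```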

### Step 1: Create a New Repository **Using This Repository as a Template**

* **Result:**
A repository for each dataset/batch.
* **Goal:**
1) Retain all code, configuration files, computational environments, and directory structure that a standard Pooled Cell Painting workflow expects and produces.
* **Execution:**
Click "Use this template".
![Use_this_template](media/use_this_template.png)
Enter a name for your new repository that includes your batch name and click "Create repository from template".
![New_Repo](media/new_repo_from_template.png)

### Step 2: Create a Submodule of the Forked Recipe Inside the New Repository

Next, we create a [submodule](https://gist.github.com/gitaarik/8735255) in the repository we just created.

* **Result:**
Adding a submodule initiates the weld.
* **Goals:**
1) Link the processing code (recipe) with the data (current repo).
2) Require a manual step to update the recipe to enable asynchronous development.
* **Execution:** See below

```bash
# In your terminal, clone the repository you just created (THIS REPO)
USER="INSERT-USERNAME-HERE"
REPO="INSERT-NAME-HERE"
git clone git@github.com:$USER/$REPO.git

# Navigate to this directory
cd $REPO

# Add the Recipe Submodule
git submodule add https://github.com/$USER/pooled-cp-profiling-recipe.git pooled-cp-profiling-recipe
```

Refer to ["Adding a submodule"](https://gist.github.com/gitaarik/8735255#adding-a-submodule) for more details.
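
To confirm that the submodule was registered under the expected path, you can read it back from `.gitmodules`. This is a minimal sketch rather than part of the repository: `submodule_path` is a hypothetical helper, and it assumes the submodule name matches the path passed to `git submodule add`, which is git's default.

```bash
# Hypothetical helper: print the path recorded in .gitmodules for a
# submodule name. Run it from the top level of the new repository.
submodule_path() {
  git config --file .gitmodules --get "submodule.$1.path"
}

# Usage:
# submodule_path pooled-cp-profiling-recipe
# git submodule status   # also shows the commit each submodule points at
```

Anyone cloning the welded repository later will likely need `git clone --recurse-submodules`, or `git submodule update --init` after a plain clone, to populate the recipe directory.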

### Step 3: Commit the Submodule

Lastly, we [commit](https://help.github.com/en/desktop/contributing-to-projects/committing-and-reviewing-changes-to-your-project#about-commits) the submodule to GitHub.

* **Result:**
Committing this change finalizes the weld.
* **Goal:**
1) Track the submodule (recipe) version with the current repository.
* **Execution:**
See below

```bash
# Add, commit, and push the submodule contents
git add pooled-cp-profiling-recipe
git add .gitmodules
git commit -m 'finalizing the recipe weld'
# Review comment (Member): whatever word we decide above should be substituted in this commit message. It was probably me that used this terminology before 😅
git push
```
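
Once the weld is committed, the parent repository records the exact recipe commit as a gitlink entry in its tree. As a sketch (the helper name `recorded_recipe_commit` is hypothetical), you can print that recorded commit to check which recipe version a data repository is welded to:

```bash
# Hypothetical helper: print the recipe commit recorded in the latest
# commit of the parent repository. The default path is the submodule
# path used in Step 2.
recorded_recipe_commit() {
  path="${1:-pooled-cp-profiling-recipe}"
  # For a submodule path, HEAD:<path> resolves to the pinned commit hash
  git rev-parse "HEAD:$path"
}

# Usage (from the top level of the welded repository):
# recorded_recipe_commit
```

Comparing this hash with `git rev-parse HEAD` run inside the submodule directory shows whether the checked-out recipe still matches the committed weld.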

### Step 4: Perform the Weld
* **Result:**
Data is processed, and figures and processed data are written as output.
* **Goal:**
1) Track the submodule (recipe) version with the current repository.
* **Execution:**
1) Activate conda environment.
```
conda activate pooled-cell-painting
```
2) Manually update the configuration yaml documents for your specific experiment.
Yaml documents with reasonable default values are available in the [config/](config/) folder.
Do NOT change the location of the .yaml files.
Additional documentation for each of the parameters is available in the [config/docs/](config/docs/) folder.
3) Execute `weld.sh` (see below)

```bash
./weld.sh
```
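
Because `weld.sh` expects the configuration files to stay in `config/`, a quick pre-flight check can catch a misplaced file before the pipeline starts. The sketch below is an illustration only: `preflight_config` is a hypothetical helper, and it assumes the configuration files use the `.yaml` extension mentioned above.

```bash
# Hypothetical pre-flight check: count the .yaml files in the config
# directory and fail if none are found.
preflight_config() {
  config_dir="${1:-config}"
  count=0
  for f in "$config_dir"/*.yaml; do
    # Skip the unexpanded glob pattern when no files match
    [ -e "$f" ] || continue
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no .yaml files found in $config_dir" >&2
    return 1
  fi
  echo "found $count config file(s) in $config_dir"
}

# Usage:
# preflight_config config && ./weld.sh
```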