PIV-based analysis for chromatin from A- or B-compartments.
Date started: 2023-02-02
TODO: Update workflow diagram to be data-centric instead of rule-centric
```mermaid
graph TD
    subgraph initialize
        A99((all_init))
    end
    subgraph interactive rules
        A98((all_<br>interactive))
    end
    subgraph segmentation
        A2
        B1
        B2
    end
    subgraph piv analysis
        A6
        A8
        A9
    end
    subgraph image processing
        A1
        A4
        A5
    end

    A1((all_roi))
    C1((all_<br>register))
    A2((all_<br>segmentation_<br>nucleus))
    A4((all_<br>measure))
    A5((all_<br>normalize))
    A6((all_piv))
    A8((all_msnd))
    A9((all_msnd_post))
    B1((all_<br>segmentation_<br>nucleoli))
    B2((all_<br>segmentation_<br>hc))

    A01[config_template]
    A99 --> A01
    A02[download_<br>ilastik_<br>models]
    A98 --> A02 & A12
    A11[parse_metadata] --> A01
    A12[draw_roi] --> A11 & A01
    A13{crop_roi} --> A12 & A01
    A14[split_channels] --> C11 & A01
    A1 --> A12 & C11
    C11[register_nucleus] --> A13 & A01
    C1 --> C11
    A31[segment_nuclei_<br>in_time] --> A14 & A01
    A2 --> A31
    A41[measure] --> C11 & A31 & A01
    A42[combine_<br>measurements] --> A41 & A01
    A4 --> A42
    A51[normalize_<br>singlechannel] --> A14 & A31 & A01
    A52[normalize_<br>multichannel] --> A51 & A01
    A5 --> A51 & A52
    A61[gen_piv_<br>config_json] --> A01
    A62[piv] --> A61 & A14 & A01
    A6 --> A62
    A81[msnd] --> A62 & A52 & A31 & B15 & B21 & A01
    A8 --> A81
    A91[fit_msnd_line] --> A81
    A92[instantaneous_alphas] --> A81
    A9 --> A91 & A92
    B11{sn_crop_roi} --> A12
    B12[sn_mask_tiff] --> B11 & A31
    B13[sn_predict_<br>nucleoli] --> B12 & A02
    B14[sn_convert_<br>to_ometif] --> B13
    B15[sn_segment_<br>nucleoplasm] --> B14 & A31
    B1 --> B14 & B15
    B21[segment_hc] --> A52 & A01
    B2 --> B21
```
This repo uses git submodules to manage some dependencies. To clone this repo, use the following command:

```
git clone --recurse-submodules https://github.com/yichechang/alu-mobility.git
```
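If the repo was already cloned without `--recurse-submodules`, the submodules can still be fetched afterwards with a standard git command:

```
git submodule update --init --recursive
```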
Dependencies are listed in the `workflow/envs/abcdcs.yaml` file and can be installed as a conda environment using either conda or mamba.

Currently, with mamba (which is faster at solving), we need to first create an empty environment before we can install the dependencies specified in a YAML file. See the issue and solution.

```
mamba create -n abcdcs
mamba env update -n abcdcs -f workflow/envs/abcdcs.yaml
```
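With conda instead, a single step should work (slower at solving, but without the mamba workaround described above):

```
conda env create -n abcdcs -f workflow/envs/abcdcs.yaml
```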
Install ilastik manually before the first run:

```
mamba create -n ilastik
mamba activate ilastik
mamba install -c ilastik-forge ilastik
```

TODO: add a conda environment file so that snakemake can create this environment on the first run.
You'll need to have `matlab` on your PATH. This can be done either by manually creating a symbolic link to the matlab executable, or by using the environment modules on a cluster (e.g., `module load matlab/R2019b`).

Note: Do not rely on an alias. It is fragile and likely won't work when Snakemake executes a shell directive.
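For the symbolic-link route, a minimal sketch (the MATLAB install path varies per machine; the paths below are only examples):

```
# Link the MATLAB launcher into a directory that is already on PATH.
# Adjust the source path to match the local MATLAB installation.
sudo ln -s /usr/local/MATLAB/R2019b/bin/matlab /usr/local/bin/matlab
```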
1. Activate the `abcdcs` conda environment by issuing `conda activate abcdcs`.
2. `cd` to the analysis folder.
3. Issue the following command to initialize the workflow:

    ```
    snakemake \
        -s {path/to/this/repo}/workflow/Snakefile \
        -c1 \
        init
    ```

    This will copy the `config/config.yaml` to the analysis folder.
4. Edit the config file according to the experiment.
5. Run snakemake locally, optionally specifying a target rule. (See `Snakefile` for possible all-type rules.)

    ```
    snakemake \
        -s {path/to/this/repo}/workflow/Snakefile \
        --configfile config.yaml \
        --use-conda \
        -c{n}
    ```

Note: If testing the repo with the test data, treat the repo folder as the analysis folder mentioned in step 2. There is also no need to specify the main snakemake file and the config file via `-s` and `--configfile`, respectively.
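Before a full run, a standard snakemake dry run (`-n`) with the same arguments can be used to preview the jobs that would be executed:

```
snakemake \
    -s {path/to/this/repo}/workflow/Snakefile \
    --configfile config.yaml \
    --use-conda \
    -n
```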
1. `cd` to the analysis folder.
2. Activate the `abcd` conda environment by issuing `conda activate abcd`.
3. Issue the following command to initialize the workflow:

    ```
    snakemake \
        -s {path/to/this/repo}/workflow/Snakefile \
        -c1 \
        init
    ```

    This will copy the `config/config.yaml` to the analysis folder.
4. Edit the config file according to the experiment.
5. Launch a `della-vis1` desktop via VNC on mydella.
6. Launch a terminal and issue `module load anaconda3/2022.10 && conda activate abcd`.
7. `cd` to the analysis folder, and run

    ```
    snakemake \
        -s {path/to/this/repo}/workflow/Snakefile \
        --configfile config.yaml \
        --use-conda \
        --use-envmodules \
        -c1 \
        all_interactive
    ```
1. `cd` to the analysis folder.
2. Issue `module load anaconda3/2022.10 && conda activate abcd`.
3. Request an interactive allocation (a filled-in example follows this list):

    ```
    salloc --nodes=1 --ntasks=<n> --mem-per-cpu=<m>G --time=<t>
    ```

    where `<n>` is the number of cores to use, `<m>` is the amount of memory per core (in GB), and `<t>` is the time limit.
4. Issue the following command:

    ```
    snakemake \
        -s {path/to/this/repo}/workflow/Snakefile \
        --configfile config.yaml \
        --use-conda \
        --use-envmodules \
        -c{n} \
        all && \
    scancel $SLURM_JOB_ID
    ```

    where `{n}` is the number of cores requested in the `salloc` command.
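For example, an allocation with 8 cores, 4 GB per core, and a 2-hour limit (values here are placeholders, not recommendations) would be requested as:

```
salloc --nodes=1 --ntasks=8 --mem-per-cpu=4G --time=02:00:00
```

The matching snakemake invocation would then use `-c8`.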
This repository currently contains both
- a snakemake workflow with its config files, scripts, etc.; and
- a python package `abcdcs` that is required for the workflow, but also includes modules that can be used on their own for upstream preprocessing as well as downstream analyses.

In the future, it might make sense to track them separately, but currently their development is closely related. Thus, we now use a single tagging system for version tracking.

The format is `yyyy.MM.dd.[a-z]`, where `[a-z]` is used to differentiate versions tagged on the same date.
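Tagging a release under this scheme is plain git tagging (the tag name below is hypothetical):

```
git tag 2023.04.06.a
git push origin 2023.04.06.a
```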
- `2023.04.05.a`: Consider this `v0.0.9`!
  - include ilastik for nucleoli segmentation
  - workflow is more modular with a clearer main `Snakefile`
  - directly determine raw input files to be used for rules
  - no pepfile is used anymore; just a single config file
- `2023.03.31.a`: remove unused function in msnd
- `2023.03.30.a`: normalize intensity for y459 and y491 on della
- `2023.03.28.a`: improve raw data compatibility with tiff files without metadata
- `2023.03.26.c`: matpiv_v2 on della (used for y459)
- `2023.03.26.b`: snakemake on local and della up to PIV
  - Workflow runs on both local (everything up to piv) and della (from cropping to piv).
  - No job grouping should be used.
  - On della, to avoid submitting many small jobs (currently some of the corresponding rules have their time set to 61 minutes, even though they take only a few minutes, to avoid piling up in the short-job queue), running via `salloc` without a cluster profile is useful.