Demo scripts to use iohub with SLURM job arrays #44
Signed-off-by: Ziwen Liu <[email protected]>
Thanks for the code @ziw-liu.
This is a good start; let's iterate on it. I know this is just a demo script, but I'd like feedback and some improvements to it along the following lines:
The current script expects data.zarr
as input, whereas I think ideally we would want to call these scripts with wildcards, i.e. data.zarr/*/*/*
. From an iteration with @talonchandler, something like this would probably be useful:
#!/bin/bash
POSPATHS=(/hpc/instruments/cm.mantis/2023_05_10_PCNA_RAC1/timelapse_2_3/timelapse_2_lightsheet_1.zarr/*/*/*)
echo "pospaths ${POSPATHS[0]}"
# Create an array to store the last three directories (row/column/FOV) for each path
OUT_FOV_DIR=()
# Extract the last three directories for each path
# (split into a separate variable so the accumulator array is not clobbered)
for path in "${POSPATHS[@]}"; do
    IFS='/' read -ra PARTS <<< "$path"
    OUT_FOV_DIR+=("${PARTS[-3]}/${PARTS[-2]}/${PARTS[-1]}")
done
# Print the array elements
printf '%s\n' "${OUT_FOV_DIR[@]}"
#OUT_FOV_DIR[0]=0/0/0
#OUT_FOV_DIR[1]=0/1/0
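A minimal, self-contained sketch of how such a FOV list could drive a SLURM array job, where each task indexes into the glob with `SLURM_ARRAY_TASK_ID`. The toy plate layout and all paths here are placeholders (not the mantis dataset), and the script defaults to task 0 so it also runs outside SLURM:

```shell
#!/bin/bash
set -euo pipefail

# Build a toy HCS-like layout (plate/row/column/fov) so the sketch runs anywhere
TMP=$(mktemp -d)
mkdir -p "$TMP/data.zarr/A/1/0" "$TMP/data.zarr/A/2/0" "$TMP/data.zarr/B/1/0"

# Glob all FOVs, as in the snippet above
POSPATHS=("$TMP"/data.zarr/*/*/*)

# Each array task processes exactly one FOV; outside SLURM, default to task 0
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
FOV="${POSPATHS[$TASK_ID]}"

# Recover the row/column/fov triple relative to the plate root
REL=${FOV#"$TMP/data.zarr/"}
echo "task $TASK_ID -> $REL"

rm -rf "$TMP"
```

With `sbatch --array=0-$(( ${#POSPATHS[@]} - 1 ))`, each task would receive a distinct FOV without any coordination between jobs.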
The other thing to keep in mind, as @ziw-liu mentions, is that the approach of not creating the Zarr store beforehand only works for smaller datasets.
@edyoshikun I intentionally kept these scripts minimal to best serve as a POC of the iohub PR and allow us to quickly decide the way forward on the core issue (the metadata race condition). The extra niceties would be great for a real mantis processing template in a future PR, but are not necessarily relevant to the purpose here. Also, this PR is not meant to be merged, similar to mehta-lab/recOrder#374.
Can you expand on this? I thought the limitation was the chunk size, not the dataset size?
@ziw-liu I just ran this example and I think it's a step in the right direction, assuming that we don't have the dataset size limitation that @edyoshikun is mentioning. Can I request the following extensions to this example:
I think that we should shoot for a set of SLURM example scripts that merges into the …
This is just a simplification in the toy Python processor here. It would be the same to do …
This is because if you don't specify …
Yes, this is the final goal. But I'd like to decide between this and #46 before we invest more effort into quality-of-life features.
Closing in favor of #47 |
POC of czbiohub-sf/iohub#143 with a cropped-down mantis dataset.
The input HCS dataset is scattered into individual FOVs, shipped to nodes in an array job, and written to temporary storage where the arrays are filled with the task IDs. The temporary FOVs are then gathered into an HCS store. Since the relative paths are kept unchanged in this process, the output HCS store has the same plate metadata as the input.
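The scatter/process/gather flow described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual scripts: the paths are placeholders, the loop stands in for independent SLURM array tasks, and "processing" is reduced to writing the task ID into each FOV:

```shell
#!/bin/bash
set -euo pipefail

TMP=$(mktemp -d)
IN="$TMP/input.zarr"; SCRATCH="$TMP/scratch"; OUT="$TMP/output.zarr"
mkdir -p "$IN/A/1/0" "$IN/A/2/0" "$SCRATCH" "$OUT"

# Scatter/process: each (array-job) task writes its FOV to temporary storage,
# keeping the row/column/fov relative path unchanged
TASK_ID=0
for fov in "$IN"/*/*/*; do
    rel=${fov#"$IN/"}
    mkdir -p "$SCRATCH/$rel"
    # stand-in for real processing: fill the FOV with its task ID
    echo "$TASK_ID" > "$SCRATCH/$rel/array.txt"
    TASK_ID=$((TASK_ID + 1))
done

# Gather: copy the temporary FOVs back into a single HCS-style store;
# because the relative paths are unchanged, the plate layout (and hence
# the plate metadata) carries over from the input
cp -r "$SCRATCH/." "$OUT/"

find "$OUT" -name array.txt | sort
```

Because each task touches only its own FOV directory, no two jobs ever write the same file, which is what sidesteps the metadata race condition in the single-store approach.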