Multiple jobs with retries #274

Open
hpoisner opened this issue Oct 16, 2023 · 3 comments

@hpoisner

Hi, I typically use a loop to launch jobs that can all run concurrently and do not depend on each other. I would like to use the --retries flag to rerun any of these independent jobs that fail. This does not seem to work. Is there a solution to my problem?
Example Code:
%%bash --out LINE_COUNT_JOB_ID

# Get a shorter username to leave more characters for the job name.
DSUB_USER_NAME="$(echo "${OWNER_EMAIL}" | cut -d@ -f1)"

# For AoU RWB projects network name is "network".
AOU_NETWORK=network
AOU_SUBNETWORK=subnetwork

MACHINE_TYPE="n2-standard-4"
BASH_SCRIPT="gs://fc-secure-cb192ac6-30ba-46b9-92ee-896a6e36c63e/dsub/hpoisner/snplist_step1/SNPlist_step1_mac75k.sh"
LOWER=1
UPPER=23
for ((chromo=$LOWER;chromo<$UPPER;chromo+=1))
do
  dsub \
    --provider google-cls-v2 \
    --user-project "${GOOGLE_PROJECT}" \
    --project "${GOOGLE_PROJECT}" \
    --image "marketplace.gcr.io/google/ubuntu1804:latest" \
    --network "${AOU_NETWORK}" \
    --subnetwork "${AOU_SUBNETWORK}" \
    --service-account "$(gcloud config get-value account)" \
    --user "${DSUB_USER_NAME}" \
    --regions us-central1 \
    --logging "${WORKSPACE_BUCKET}/dsub/v7/logs/{job-name}/{user-id}/$(date +'%Y%m%d/%H%M%S')/{job-id}-{task-id}-{task-attempt}.log" \
    "$@" \
    --preemptible \
    --retries 2 \
    --wait \
    --boot-disk-size 1000 \
    --machine-type ${MACHINE_TYPE} \
    --name "${JOB_NAME}" \
    --script "${BASH_SCRIPT}" \
    --env GOOGLE_PROJECT=${GOOGLE_PROJECT} \
    --input plink="" \
    --input bgen_file="" \
    --input sample_file="" \
    --env chrom=${chromo} \
    --output-recursive OUTPUT_PATH="${OUTPUT_FILES}/${chromo}"
done

@wnojopra
Contributor

Hi @hpoisner, you mention that this does not seem to work, but can you please describe what you observe happening? Are there any error messages? Any relevant logging? Any output that would indicate that a retry is not happening?

@hpoisner
Author

The issue is that it turns jobs that should run in parallel into sequential jobs. There aren't any specific error messages. We just want to run multiple jobs at once with the capacity to retry.

@wnojopra
Contributor

I see you're doing a loop over the chromosomes, and each call to dsub has a --wait flag. This means that each dsub call blocks until that chromosome's job has completed before the loop goes on to the next one.
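
The blocking behavior described above is ordinary shell behavior rather than anything specific to dsub: a loop runs one foreground command at a time. The sketch below only illustrates that point. `submit_and_wait` is a hypothetical stand-in for a blocking `dsub ... --wait` call (it just sleeps), and the `chrom_N.status` file names are made up for the example. It shows what "launch everything, then block once" looks like in plain bash; it is not the approach the maintainer recommends for dsub itself.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a blocking "dsub ... --wait" call.
# It is NOT a real dsub invocation; it only simulates a slow job.
submit_and_wait() {
  local chromo="$1"
  sleep 1                        # pretend the job takes a while
  echo "chromosome ${chromo} done"
}

SECONDS=0
pids=()
for ((chromo = 1; chromo < 4; chromo += 1)); do
  # The trailing "&" sends the call to the background, so the loop
  # moves on immediately instead of waiting for this job to finish.
  submit_and_wait "${chromo}" > "chrom_${chromo}.status" &
  pids+=("$!")
done

# Block once, after every job has been launched, and count failures.
failed=0
for pid in "${pids[@]}"; do
  wait "${pid}" || failed=$((failed + 1))
done
elapsed="${SECONDS}"
echo "finished with ${failed} failure(s) in about ${elapsed}s"
```

Without the `&`, the three simulated jobs would run back to back, which is the sequential behavior reported in this issue.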

To run the jobs in parallel, you'll instead want to define a tasks TSV file in which each line is a different chromosome. See https://github.com/DataBiosphere/dsub#submitting-a-batch-job for details on the tasks file format and the --tasks flag.
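
As a rough sketch of that suggestion, the loop from the original post could be turned into a generator for the tasks file. This assumes the header-row convention from the dsub README, where each tab-separated column is named after the flag it replaces (here `--env chrom` and `--output-recursive OUTPUT_PATH`). The file name `chrom_tasks.tsv` and the fallback bucket path are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical fallback so the sketch runs on its own; use your real path.
OUTPUT_FILES="${OUTPUT_FILES:-gs://my-bucket/output}"
TASKS_FILE="chrom_tasks.tsv"   # hypothetical file name
LOWER=1
UPPER=23

# Header row: one column per per-task parameter, tab-separated.
printf -- '--env chrom\t--output-recursive OUTPUT_PATH\n' > "${TASKS_FILE}"

# One row per chromosome (1..22), matching the original loop bounds.
for ((chromo = LOWER; chromo < UPPER; chromo += 1)); do
  printf '%s\t%s\n' "${chromo}" "${OUTPUT_FILES}/${chromo}" >> "${TASKS_FILE}"
done

# A single submission would then replace the whole loop, for example:
#   dsub <the shared flags from the original command> \
#     --tasks "${TASKS_FILE}" --preemptible --retries 2 --wait
# with the per-task "--env chrom=..." and "--output-recursive ..." flags
# removed, since those values now come from the tasks file.
```

With this layout, dsub is invoked once and the 22 chromosomes become tasks of one job, so `--wait` and `--retries 2` apply to the job as a whole and no longer serialize the chromosomes. Parameters that are identical for every task, such as the `--input` flags, can stay on the command line.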
