Simplify SubmitterHTCondor
dachengx committed Sep 2, 2024
1 parent 1789e90 commit 0678e58
Showing 4 changed files with 212 additions and 297 deletions.
12 changes: 6 additions & 6 deletions alea/submitters/README.md
@@ -42,10 +42,10 @@ htcondor_configurations:
pegasus_transfer_threads: 4
max_jobs_to_combine: 100
singularity_image: "/cvmfs/singularity.opensciencegrid.org/xenonnt/montecarlo:2024.04.1"
wf_id: "lq_b8_cevns_30"
workflow_id: "lq_b8_cevns_30"
```
- `template_path`: where you put your input templates. Note that **all files have to have unique names**. All templates inside will be tarred and the tarball will be uploaded to the grid when computing.
-- `cluster_size`: clusters multiple `alea-run_toymc` jobs into a single job. For example, if you expect to run 100 individual `alea-run_toymc` jobs and you specify `cluster_size: 10`, there will be only 10 jobs in the end, each running 10 `alea-run_toymc` computations in sequence. Unless you have a huge number of jobs (>200), I don't recommend changing it from 1.
+- `cluster_size`: clusters multiple `alea_run_toymc` jobs into a single job. For example, if you expect to run 100 individual `alea_run_toymc` jobs and you specify `cluster_size: 10`, there will be only 10 jobs in the end, each running 10 `alea_run_toymc` computations in sequence. Unless you have a huge number of jobs (>200), I don't recommend changing it from 1.
- `request_cpus`: number of CPUs for each job. It should be larger than alea's maximum multi-threading number; otherwise OSG will complain.
- `request_memory`: requested memory for each job, in units of MB. Please don't request more than you need, because it will significantly reduce our available slots.
- `request_disk`: requested disk for each job, in units of KB. Please don't request more than you need, because it will significantly reduce our available slots.
@@ -56,11 +56,11 @@ htcondor_configurations:
- `pegasus_transfer_threads`: number of threads for transfers handled by `Pegasus`. The default of 4 is good, so in most cases you want to keep it.
- `max_jobs_to_combine`: number of toymc jobs to combine when concluding. Be cautious about putting a number larger than 200 here, since it might be too risky.
- `singularity_image`: the jobs will run in this singularity image.
-- `wf_id`: a name of your choice for this workflow. If not specified, the datetime will be used as the `wf_id`. A full example follows this list.
+- `workflow_id`: a name of your choice for this workflow. If not specified, the datetime will be used as the `workflow_id`. A full example follows this list.


### Usage
-Make sure your running config is set up well; then simply pass `--htcondor` to your `alea-submission` command (see the sketch below).
+Make sure your running config is set up well; then simply pass `--htcondor` to your `alea_submission` command (see the sketch below).
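
A minimal sketch of the invocation, assuming your usual `alea_submission` arguments stay unchanged; `your_running_config.yaml` is a hypothetical filename:

```
alea_submission your_running_config.yaml --htcondor
```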

At the end of its output, it should give you something like this:
```
@@ -101,11 +101,11 @@
pegasus-run /scratch/yuanlq/workflows/runs/lq_b8_cevns_30
```
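
Besides `pegasus-run`, Pegasus's generic monitoring tools accept the same run directory; this is standard Pegasus usage rather than anything alea-specific:

```
# Summarize the current state of the workflow's jobs
pegasus-status /scratch/yuanlq/workflows/runs/lq_b8_cevns_30

# Diagnose failed or held jobs after (or during) a run
pegasus-analyzer /scratch/yuanlq/workflows/runs/lq_b8_cevns_30
```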

To collect the final outputs, there are two ways:
-- Check your folder `/scratch/$USER/workflows/outputs/<wf_id>/`. There should be a single tarball containing all toymc files and computation results.
+- Check your folder `/scratch/$USER/workflows/outputs/<workflow_id>/`. There should be a single tarball containing all toymc files and computation results.
- A redundant way is to fetch the files from dCache, which you have to access with `gfal` commands (see the consolidated sketch below). For example, `gfal-ls davs://xenon-gridftp.grid.uchicago.edu:2880/xenon/scratch/yuanlq/lq_b8_cevns_30/` lists the workflow's scratch area, and `gfal-ls davs://xenon-gridftp.grid.uchicago.edu:2880/xenon/scratch/yuanlq/lq_b8_cevns_30/00/00/` shows the files, which include both the final tarball and all `.h5` files before tarballing. To fetch them, do something like `gfal-copy davs://xenon-gridftp.grid.uchicago.edu:2880/xenon/scratch/yuanlq/lq_b8_cevns_30/00/00/lq_b8_cevns_30-combined_output.tar.gz . -t 7200`. Note that these commands also work on Midway/DaLI.
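
The same dCache sequence as a copy-pastable sketch (URLs are taken verbatim from the example above; substitute your own username and `workflow_id`):

```
# List the scratch area for this workflow on dCache
gfal-ls davs://xenon-gridftp.grid.uchicago.edu:2880/xenon/scratch/yuanlq/lq_b8_cevns_30/

# List the output subdirectory (final tarball plus pre-tarball .h5 files)
gfal-ls davs://xenon-gridftp.grid.uchicago.edu:2880/xenon/scratch/yuanlq/lq_b8_cevns_30/00/00/

# Copy the combined tarball locally, with a 7200-second transfer timeout
gfal-copy davs://xenon-gridftp.grid.uchicago.edu:2880/xenon/scratch/yuanlq/lq_b8_cevns_30/00/00/lq_b8_cevns_30-combined_output.tar.gz . -t 7200
```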

### Example Workflow
In the workflow graph below, we only care about the purple jobs; the rest are generated by `Pegasus`.
-- Each individual `run_toymc_wrapper` job computes `alea-run_toymc`. For details on what it does, see `run_toymc_wrapper.sh`.
+- Each individual `run_toymc_wrapper` job computes `alea_run_toymc`. For details on what it does, see `run_toymc_wrapper.sh`.
- The `combine` job collects all outputs from the `run_toymc_wrapper` jobs and tars them into a single tarball as the final output.
<img width="1607" alt="Screen Shot 2024-05-08 at 5 24 51 PM" src="https://github.com/FaroutYLq/alea/assets/47046530/b1136330-2701-4538-b03c-8506383e4e20">
4 changes: 2 additions & 2 deletions alea/submitters/combine.sh
@@ -3,14 +3,14 @@
set -e

# Extract the arguments
-wf_id=$1
+workflow_id=$1

# Sanity check: these are the files in the current directory
ls -lh

# Make output filename
# This file will be used to store the output of the workflow
-output_filename=$wf_id-combined_output.tar.gz
+output_filename=$workflow_id-combined_output.tar.gz

# Tar all the .h5 files into the output file
tar -czf $output_filename *.h5
