Running a looper pipeline ad hoc #538

nleroy917 · 2024-11-26T00:59:48Z

I wonder whether there are either 1) existing solutions or 2) easy ways to add the ability to run a looper pipeline in an ad hoc manner. What I mean is this: the overhead of a traditional workflow can occasionally be a bit daunting, but I really enjoy how easy it is to dispatch jobs through Slurm + looper.

I would love to replace traditional bash for loops with looper calls.

An example

I have a folder with hundreds of files of mixed types, some of which are bedGraph files. I want to convert those to bigWig (.bw) format, which I can do with bigtools bedGraphToBigWig. Traditionally, I might just use a for loop:

for file in *.bdg; do
  # quote "$file" so names with spaces are passed as a single argument
  bigtools bedGraphToBigWig "$file" "$file.bw"
done

But this takes a while, since it processes the files one by one and there are hundreds of them. I'd love to fire them all off at once using looper and Slurm, with something like:

ls *.bdg | looper run "bigtools bedGraphToBigWig {$1} {$1}.bw"

I suppose I am trying to identify, or nail down, a potential gap between traditional workflows and the flexibility researchers often need for quick, ad hoc job submission.
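Without looper in the picture, the Slurm half of this idea can be approximated by submitting one job per file with `sbatch --wrap`. This is a minimal sketch, assuming sbatch is on your PATH and bigtools is installed on the compute nodes; the converter invocation is copied from the loop above, so check it against your bigtools version. `submit_each` is a hypothetical helper that only prints the sbatch commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: one Slurm job per bedGraph file via `sbatch --wrap`.
# Assumes sbatch is on PATH and bigtools is installed on the compute nodes.
# submit_each is a hypothetical helper; by default it only prints the commands.

submit_each() {
  local f cmd
  for f in "$@"; do
    # %q shell-quotes each filename so the wrapped command survives spaces.
    cmd=$(printf 'bigtools bedGraphToBigWig %q %q' "$f" "$f.bw")
    if [[ "${DRY_RUN:-1}" == 0 ]]; then
      sbatch --job-name=bdg2bw --wrap="$cmd"
    else
      echo sbatch --job-name=bdg2bw --wrap="$cmd"
    fi
  done
}

shopt -s nullglob   # no matches -> no arguments, instead of a literal "*.bdg"
submit_each *.bdg   # preview only; use DRY_RUN=0 submit_each *.bdg to submit
```

For hundreds of very short conversions, one job per file puts some load on the scheduler; batching several files into each job is a common compromise.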


nleroy917 · 2024-11-26T02:36:16Z

I guess the conditions for this to be useful would be:

Extremely small PEP (one sample attribute)
Extremely simple pipeline (bash or python one liner)
Benefits from parallelization
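The "one sample attribute" PEP described above is simple enough to generate on the fly. This is a minimal sketch, assuming the PEP 2.1.0 layout (a project config whose `sample_table` points at a CSV that has the required `sample_name` column). The `make_adhoc_pep` function, the `file` column name, and the `adhoc_pep` directory are hypothetical choices for illustration, not looper conventions.

```shell
#!/usr/bin/env bash
# Sketch: build a one-attribute PEP from file paths read on stdin.
# Output: <dir>/samples.csv (sample_name,file) and <dir>/project_config.yaml.
# make_adhoc_pep, the "file" column, and "adhoc_pep" are hypothetical names.
# Paths containing commas or newlines are not handled (no CSV quoting),
# and derived sample names are assumed to be unique.

make_adhoc_pep() {
  local dir=${1:-adhoc_pep} path name
  mkdir -p "$dir"
  {
    echo "sample_name,file"
    while IFS= read -r path || [[ -n "$path" ]]; do
      [[ -z "$path" ]] && continue           # skip blank lines
      name=$(basename "$path")
      name=${name%.*}                        # strip the last extension
      name=${name//[^A-Za-z0-9_-]/_}         # replace unsafe characters with _
      printf '%s,%s\n' "$name" "$path"
    done
  } > "$dir/samples.csv"
  printf 'pep_version: "2.1.0"\nsample_table: samples.csv\n' \
    > "$dir/project_config.yaml"
}
```

For example, `printf '%s\n' *.bdg | make_adhoc_pep` writes adhoc_pep/samples.csv with one row per file. How looper is then pointed at adhoc_pep/project_config.yaml (for instance through a `pep_config` entry in a looper config) depends on the looper version, so treat that step as an assumption.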

vreuter · 2024-11-26T10:41:31Z

@nleroy917 this is a good idea. IIRC, way back when, @nsheff had an example or two like this that "pushed the limits" / "thought outside the box" of looper in this way (if I'm permitted some clichés). Maybe he already has a working example, or something close to it, that would be a good starting point?

nleroy917 · 2024-12-03T18:09:39Z

From infrastructure on December 3rd, 2024:

There are two things to solve:

What to do with the command template?
  Maybe using -y (command-extra-override) is a way to provide a command template when there was none to begin with?
Can we make a PEP on the fly from some kind of input?
  Sure... we could make looper accept stdin, and then what I wrote above would work...?
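On the command-template side, a template in this sense is just a string with a per-sample placeholder filled in. This is a minimal sketch of that substitution in plain bash, using a hypothetical `{file}` placeholder; looper's real template syntax and rendering are not reproduced here. Each input path is shell-quoted with `printf %q` before substitution, and one command is printed per input line.

```shell
#!/usr/bin/env bash
# Sketch: expand a one-line command template once per path read on stdin.
# The {file} placeholder and render_commands are hypothetical, not looper's syntax.

render_commands() {
  local template=$1 ph='{file}' path quoted rest out
  while IFS= read -r path || [[ -n "$path" ]]; do
    [[ -z "$path" ]] && continue
    quoted=$(printf '%q' "$path")   # shell-quote the path (spaces, quotes, ...)
    rest=$template
    out=""
    # Replace every occurrence of the placeholder, left to right.
    while [[ "$rest" == *"$ph"* ]]; do
      out+=${rest%%"$ph"*}$quoted
      rest=${rest#*"$ph"}
    done
    printf '%s\n' "$out$rest"
  done
}
```

For example, `printf '%s\n' *.bdg | render_commands 'bigtools bedGraphToBigWig {file} {file}.bw'` prints one ready-to-run command per file; each line could then be handed to a scheduler (e.g. via `sbatch --wrap`) or piped to `bash` to run locally.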

nleroy917 · 2024-12-04T15:20:16Z

Just putting this here for reference: I went a bit further down the rabbit hole, and it is possible to parallelize natively from the shell with xargs:

# NUL-separated names (safe with spaces); up to $(nproc) conversions at once; {} is each filename
printf '%s\0' *.bdg | xargs -0 -P "$(nproc)" -I {} bigtools bedGraphToBigWig {} {}.bw

Of course, this only helps when $(nproc) returns a value greater than one, so you would still need to allocate some cores for yourself. It's an interesting stop-gap, but I still think the looper version proposed above would be way better.
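For comparison, the same bounded parallelism can be had with bash builtins alone, without xargs. This is a minimal sketch, assuming bash 4.3 or newer (for `wait -n`); `run_pool` and `bdg_to_bw` are hypothetical helper names, and `bdg_to_bw` reuses the converter command from this issue. Like the xargs one-liner, it only pays off when several cores are available.

```shell
#!/usr/bin/env bash
# Sketch: run a command on many files with at most N concurrent jobs,
# using only bash builtins (wait -n needs bash >= 4.3).
# run_pool and bdg_to_bw are hypothetical helper names.

run_pool() {
  local max_jobs=$1 cmd=$2 f running=0
  shift 2
  (( max_jobs > 0 )) || max_jobs=1   # fall back to serial on bad input
  for f in "$@"; do
    if (( running >= max_jobs )); then
      wait -n                        # block until any one background job exits
      running=$((running - 1))
    fi
    "$cmd" "$f" &                    # launch the next file in the background
    running=$((running + 1))
  done
  wait                               # wait for the jobs still running
}

bdg_to_bw() { bigtools bedGraphToBigWig "$1" "$1.bw"; }   # command from the issue

shopt -s nullglob
run_pool "$(nproc)" bdg_to_bw *.bdg
```

Passing a function name rather than a command string avoids re-parsing filenames through `bash -c`, so no extra quoting is needed.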

github-project-automation bot added this to PEP Nov 26, 2024

nleroy917 added enhancement help wanted question UX labels Nov 26, 2024

nleroy917 changed the title ~~Running a looper pipeline _ad hoc_~~ Running a looper pipeline *ad hoc* Nov 26, 2024

nleroy917 changed the title Running a looper pipeline *ad hoc* Running a looper pipeline ad hoc Nov 26, 2024
