Running a looper pipeline ad hoc #538

Open
nleroy917 opened this issue Nov 26, 2024 · 4 comments

@nleroy917
Member

nleroy917 commented Nov 26, 2024

I wonder if there are either 1) existing solutions for this or 2) easy ways to add the ability to run a looper pipeline in an ad hoc manner. What I mean is this: occasionally, the overhead of setting up a traditional workflow can be a bit daunting, but I really enjoy the ease of dispatching jobs through slurm+looper.

I would love to replace traditional bash for loops with looper calls.

An example

I have a folder with hundreds of mixed-type files. Some of these might be bedGraph files. I want to convert these to .bw format. I can use bigtools bedGraphToBigWig. Traditionally, I might just use a for loop:

for file in *.bdg; do
  # quote the variable so file names with spaces survive
  bigtools bedGraphToBigWig "$file" "$file.bw"
done

But this takes a while since it goes one by one, and there are hundreds. I'd love to fire them all off at once using looper and slurm:

ls *.bdg | looper run "bigtools bedGraphToBigWig {$1} {$1}.bw"

I suppose I am trying to identify or nail down a potential gap between traditional workflows and the flexibility researchers often need for quick, ad hoc job submission.

@nleroy917 changed the title from "Running a looper pipeline _ad hoc_" to "Running a looper pipeline *ad hoc*" Nov 26, 2024
@nleroy917 changed the title from "Running a looper pipeline *ad hoc*" to "Running a looper pipeline ad hoc" Nov 26, 2024
@nleroy917
Member Author

I guess the conditions for this to be useful would be:

  1. Extremely small PEP (one sample attribute)
  2. Extremely simple pipeline (bash or python one liner)
  3. Benefits from parallelization

@vreuter
Member

vreuter commented Nov 26, 2024

@nleroy917 this is a good idea. IIRC, way back in time, @nsheff had an example or two like this that sort of "pushed the limits" / "thought outside the box" (if I'm permitted some clichés) of looper in this way. Maybe he already has a working example, or something close to it, that would make a good starting point?

@nleroy917
Member Author

From infrastructure on December 3rd, 2024:

There are two things to solve:

  1. What to do with the command template?
    Maybe using -y (command-extra-override) is a way to provide a command template when there was none to begin with?
  2. Can we make a PEP on the fly from whatever input we are given?
    Sure... we can make it accept stdin, and then what I wrote above would work...?
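
As a rough sketch of the second point: assuming a minimal PEP can be little more than a CSV sample table with a `sample_name` column, a small shell function could build that table from a file list on stdin. The function name `make_sample_table` and the `file_path` column are hypothetical names chosen for illustration. This is not existing looper functionality, only one way the "PEP on the fly" step might look.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of looper): read file paths from stdin, one per
# line, and print a one-attribute CSV sample table.
# Assumptions: "file_path" is an illustrative column name, and file names
# contain no commas or newlines (no CSV quoting is attempted).
make_sample_table() (
  printf 'sample_name,file_path\n'
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # skip blank lines
    name=$(basename -- "$path")         # strip any leading directories
    name=${name%.*}                     # drop the last extension
    printf '%s,%s\n' "$name" "$path"
  done
)

# Usage (writes the table to a file of your choosing):
#   ls *.bdg | make_sample_table > samples.csv
```

The body runs in a subshell (parentheses instead of braces), so `path` and `name` do not leak into the calling shell.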

@nleroy917
Member Author

nleroy917 commented Dec 4, 2024

Just putting here for reference, I went down the rabbit hole slightly more and it is possible to parallelize natively using bash; just use xargs:

ls *.bdg | xargs -P "$(nproc)" -I {} bash -c 'bigtools bedGraphToBigWig "{}" "{}.bw"'

Of course, this only runs in parallel when $(nproc) returns a value greater than one, so you would still need to allocate some cores for yourself. It's an interesting stopgap, but I still think the looper version proposed above would be way better.
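
For anyone reusing the xargs approach, here is a slightly more defensive sketch. It feeds xargs NUL-delimited names from `find` so that file names containing spaces survive, and it skips the `bash -c` wrapper. The function name `parallel_bdg` is hypothetical, and `-maxdepth`, `-print0`, `-0`, and `-P` are common GNU/BSD extensions rather than strict POSIX. The command passed in is called as `CMD input output`, mirroring the invocation used in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the given command once per *.bdg file in the
# current directory, in parallel, appending "<file> <file>.bw" as the last
# two arguments.
parallel_bdg() (
  # Fall back to a single worker if nproc is unavailable.
  jobs=$(nproc 2>/dev/null || echo 1)
  find . -maxdepth 1 -name '*.bdg' -print0 |
    xargs -0 -P "$jobs" -I {} "$@" {} {}.bw
)

# Usage, with the command from the thread:
#   parallel_bdg bigtools bedGraphToBigWig
```

As with the one-liner, this only uses the cores available on the current machine or allocation. It does not submit anything to slurm.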
