Tue Apr 27 15:48:29 AST 2010
- Andrew C. Stewart, Brian Osborne and Timothy D. Read, DIYA: a bacterial annotation pipeline for any genomics lab, Bioinformatics, Vol. 25 no. 7 2009, pages 962–963, doi:10.1093/bioinformatics/btp097
- Sadedin S, Pope B & Oshlack A, Bpipe: A Tool for Running and Managing Bioinformatics Pipelines, Bioinformatics, Vol. 28 no. 11 2012, pages 1525–1526
By turning your shell scripts into Bpipe scripts, here are some of the features you can get:
- Simple definition of tasks to run - Bpipe runs shell commands almost as-is - no programming required.
- Transactional management of tasks - commands that fail get outputs cleaned up, log files saved and the pipeline cleanly aborted. No out of control jobs going crazy.
- Easy Restarting of Jobs - when a job fails, cleanly restart from the point of failure.
- Automatic Connection of Pipeline Stages - Bpipe manages the file names for input and output of each stage in a systematic way so that you don’t need to think about it. Removing or adding new stages “just works” and never breaks the flow of data.
- Easy Parallelism - Bpipe makes it simple to split jobs into many pieces and run them all in parallel, whether on a cluster or locally on your own machine.
- Audit Trail - Bpipe keeps a journal of exactly which commands executed and what their inputs and outputs were.
- Integration with Cluster Resource Managers - if you use Torque PBS, Oracle Grid Engine or Platform LSF then Bpipe will make your life easier by allowing pure Bpipe scripts to run on your cluster virtually unchanged from how you run them locally.
- Notifications by Email or Instant Message - Bpipe can send you alerts to tell you when your pipeline finishes or even as each stage completes.
- See how Bpipe compares to similar tools: https://code.google.com/p/bpipe/wiki/ComparisonToWorkflowTools
- Goodstadt,L. (2010) Ruffus: a lightweight Python library for computational pipelines. Bioinformatics, 26, 2778–2779.
Sun Dec 12 13:29:48 AST 2010
Removed the Build and Makefile dependencies and replaced them with Dist::Zilla. The commands are now:
dzil test, dzil build, …, dzil clean
Recipe for dzil and git integration:
http://www.polettix.it/perlettix/id_dist-zilla
David Golden's (DAGOLDEN) and Jérôme Quelin's (JQUELIN) plugin bundles as examples:
http://search.cpan.org/~dagolden/Dist-Zilla-PluginBundle-DAGOLDEN-0.013/lib/Dist/Zilla/PluginBundle/DAGOLDEN.pm
http://search.cpan.org/~jquelin/Dist-Zilla-PluginBundle-JQUELIN-1.101620/lib/Dist/Zilla/PluginBundle/JQUELIN.pm
For some reason, .gitignore files are not excluded from the release, so I had to modify
/usr/local/share/perl/5.10.1/Dist/Zilla/Plugin/PruneCruft.pm
by hand to add:
return 1 if $file->name =~ /~$/;
return 1 if $file->name =~ /#/;
Log::Log4perl works. I got rid of the AUTOLOAD in my module. http://www.perl.com/pub/2002/09/11/log4perl.html
- http://www.taverna.org.uk/
- http://www.knime.org/
- https://kepler-project.org/
- Mobyle http://lipm-bioinfo.toulouse.inra.fr/biomoby/PlayMOBY/
- http://www.ailab.si/orange/
- http://ergatis.sourceforge.net/
- check that interactive input works
- make steps wait for the previous output if it is missing
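The "wait for the previous output" item could be handled by polling for the file with a timeout. The following is only a minimal shell sketch of that idea, not the project's implementation: `wait_for_file`, its arguments and the demo file name `step1.out` are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_file is a hypothetical helper, not part of any tool.

# wait_for_file PATH [TIMEOUT_SECONDS] [POLL_SECONDS]
# Returns 0 as soon as PATH exists and is non-empty, 1 once the timeout expires.
# Both durations must be whole seconds because shell arithmetic is integer-only.
wait_for_file() {
    local path="$1" timeout="${2:-60}" poll="${3:-1}"
    local waited=0
    until [ -s "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep "$poll"
        waited=$((waited + poll))
    done
}

# Demo: a slow "previous step" writes its output in the background.
tmp="$(mktemp -d)"
( sleep 1; echo "result" > "$tmp/step1.out" ) &
wait_for_file "$tmp/step1.out" 10 && cat "$tmp/step1.out"
```

A non-empty file may still be only partly written, so a real step would probably write to a temporary name and rename it once complete.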
An analysis pipeline is composed of individual programs that are executed in order, each reading the initial data, the options and the output of previous steps. A good pipeline is a powerful and flexible program in its own right.
Common tools for coding a pipeline or workflow are the UNIX shell, typically BASH[1], and GNU make[2]. Both have their advantages, but neither is ideal.
| | shell | make |
|---|---|---|
| main purpose | general programming | source code building |
| syntax | powerful | obscure, error prone |
Needed:
- clear, specific syntax
- store intermediate steps in files
- stop/restart at any step
- keep a detailed log
- stop pipeline on failure
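
To make these requirements concrete, here is a minimal sketch in plain BASH of a step runner that keeps intermediate results in files, skips steps whose output already exists (so a rerun restarts at the point of failure), appends to a log and aborts on the first failing step. It illustrates the requirements only and is not the design of this project or of any tool listed above; `run_step`, `WORKDIR` and `pipeline.log` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: run_step, WORKDIR and pipeline.log are hypothetical names.
set -euo pipefail                      # stop the pipeline on the first failure

WORKDIR="${WORKDIR:-./pipeline_work}"  # intermediate files are stored here
LOG="$WORKDIR/pipeline.log"
mkdir -p "$WORKDIR"

# run_step NAME OUTPUT COMMAND...
# Runs COMMAND with its stdout captured in OUTPUT, unless OUTPUT already exists.
run_step() {
    local name="$1" output="$2"
    shift 2
    if [ -e "$output" ]; then
        echo "$(date '+%F %T') SKIP $name ($output exists)" >> "$LOG"
        return 0
    fi
    echo "$(date '+%F %T') START $name: $*" >> "$LOG"
    # Write to a temporary file first, so a failed step never leaves a
    # half-written output that a restart would mistake for a finished one.
    if "$@" > "$output.tmp" 2>> "$LOG"; then
        mv "$output.tmp" "$output"
        echo "$(date '+%F %T') DONE $name" >> "$LOG"
    else
        local status=$?
        rm -f "$output.tmp"
        echo "$(date '+%F %T') FAIL $name (exit $status)" >> "$LOG"
        return "$status"
    fi
}

# Demo pipeline: each step reads the file written by the previous one.
printf 'b\na\nb\n' > "$WORKDIR/input.txt"
run_step sort   "$WORKDIR/sorted.txt" sort "$WORKDIR/input.txt"
run_step unique "$WORKDIR/unique.txt" uniq "$WORKDIR/sorted.txt"
```

Running the script a second time logs `SKIP` for both steps. Deleting one intermediate file forces just that step to run again, although later steps whose outputs still exist are not invalidated in this sketch.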