Break out or change the scheduler part #12

Open
druvus opened this issue Jan 23, 2018 · 1 comment
druvus (Collaborator) commented Jan 23, 2018

We discussed earlier finding a better way of submitting/distributing jobs. I have used Snakemake quite a lot, and I think using its API instead could be an option for a more stable solution. It supports several schedulers plus Kubernetes.
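
As a rough sketch of what that could look like (assuming the snakemake.snakemake() entry point from the Python API; the Snakefile path and the submit command are just placeholders):

#!/usr/bin/env python3
# Sketch: hand the whole pipeline over to Snakemake and let it submit the
# individual jobs through a scheduler-specific command.
from snakemake import snakemake

ok = snakemake(
    "Snakefile",                            # placeholder path
    cores=8,
    cluster="qsub -pe threaded {threads}",  # placeholder submit command
    printshellcmds=True,
)
if not ok:
    raise SystemExit("workflow failed")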

I include a small example I found below.

#!/usr/bin/env python3
"""
rule all:
    input:
        "reads.counts"
rule unpack_fastq:
    '''Unpack a FASTQ file'''
    output: "{file}.fastq"
    input: "{file}.fastq.gz"
    resources: time=60, mem=100
    params: "{file}.params"
    threads: 8
    log: 'unpack.log'
    shell:
        '''zcat {input} > {output}
        echo finished > {log} 2>&1
        '''
rule count:
    '''Count reads in a FASTQ file'''
    output: counts="{file}.counts"
    input: fastq="{file}.fastq"
    run:
        n = 0
        with open(input.fastq) as f:
            for _ in f:
                n += 1
        with open(output.counts, 'w') as f:
            print(n // 4, file=f)  # four lines per FASTQ record
"""

In pure Python this is equivalent to the following code (the output of snakemake --print-compilation):

workflow.include("pipeline.conf")

shell.prefix("set -euo pipefail;")

@workflow.rule(name='all', lineno=6, snakefile='.../Snakefile')
@workflow.input("reads.counts")
@workflow.norun()
@workflow.run
def __all(input, output, params, wildcards, threads, resources, log, version):
    pass


@workflow.rule(name='unpack_fastq', lineno=17, snakefile='.../Snakefile')
@workflow.docstring("""Unpack a FASTQ file""")
@workflow.output("{file}.fastq")
@workflow.input("{file}.fastq.gz")
@workflow.resources(time=60, mem=100)
@workflow.params("{file}.params")
@workflow.threads(8)
@workflow.log('unpack.log')
@workflow.shellcmd(
    """zcat {input} > {output}
        echo finished > {log} 2>&1
        """
)
@workflow.run
def __unpack_fastq(input, output, params, wildcards, threads, resources, log, version):
    shell("""zcat {input} > {output}
        echo finished > {log} 2>&1
        """
)


@workflow.rule(name='count', lineno=52, snakefile='.../Snakefile')
@workflow.docstring("""Count reads in a FASTQ file""")
@workflow.output(counts = "{file}.counts")
@workflow.input(fastq = "{file}.fastq")
@workflow.run
def __count(input, output, params, wildcards, threads, resources, log, version):
    n = 0
    with open(input.fastq) as f:
        for _ in f:
            n += 1
    with open(output.counts, 'w') as f:
        print(n // 4, file=f)  # four lines per FASTQ record


### End of output from snakemake --print-compilation


workflow.check()
print("Dry run first ...")
workflow.execute(dryrun=True, updated_files=[])
print("And now for real")
workflow.execute(dryrun=False, updated_files=[], resources=dict())

Another option that I have used earlier is ipython-cluster-helper, but there are probably other options available.
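
For reference, the usage pattern is roughly this (a minimal sketch based on its README; scheduler, queue and file names are placeholders):

# Sketch: map a function over inputs on an IPython cluster that
# ipython-cluster-helper starts through the batch scheduler.
from cluster_helper.cluster import cluster_view

def count_reads(fastq):
    # four lines per FASTQ record
    with open(fastq) as f:
        return sum(1 for _ in f) // 4

if __name__ == "__main__":
    files = ["sample1.fastq", "sample2.fastq"]  # placeholders
    with cluster_view(scheduler="slurm", queue="core", num_jobs=2) as view:
        counts = view.map(count_reads, files)
    print(dict(zip(files, counts)))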

RickardSjogren (Contributor) commented

My suggestion is to sub-class or change the PipelineGenerator class. That class parses the config files and should contain all the information needed to output a Snakemake workflow. Currently new_pipeline_collection outputs a dict of script strings, so an alternative to that function should be enough. The workflow DAGs from our work will be very simple.
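
Roughly along these lines (everything except the PipelineGenerator and new_pipeline_collection names is hypothetical, including the import path and the assumption that new_pipeline_collection is a method returning {step_name: script_string}):

# Hypothetical sketch: render the per-step script strings into a Snakemake
# workflow instead of standalone scripts.
from doepipeline.generator import PipelineGenerator  # assumed import path


class SnakemakePipelineGenerator(PipelineGenerator):
    """Outputs a Snakefile string instead of a dict of script strings."""

    def new_snakemake_workflow(self, *args, **kwargs):
        # Assumes new_pipeline_collection returns {step_name: script_string}.
        scripts = self.new_pipeline_collection(*args, **kwargs)
        rules = [
            'rule {0}:\n    shell:\n        """\n{1}\n        """'.format(name, script)
            for name, script in scripts.items()
        ]
        # Real dependencies would be declared via input/output file patterns;
        # our DAGs are simple enough that a linear chain may be all we need.
        return "\n\n".join(rules)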

A simple executor class is needed as well, to start the pipeline and format the output into something the optimizer can use.
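
Something as small as this might do (hypothetical sketch; it only assumes snakemake.snakemake() from the Python API, and the results file and response format are placeholders):

# Hypothetical executor sketch: write the generated workflow to disk, run it,
# and hand a parsed response back to the optimizer.
import tempfile

from snakemake import snakemake


class SnakemakeExecutor:
    """Runs a generated workflow and returns a response for the optimizer."""

    def run_pipeline_collection(self, snakefile_contents):
        # Write the generated workflow to a temporary Snakefile and execute it.
        with tempfile.NamedTemporaryFile("w", suffix=".smk", delete=False) as f:
            f.write(snakefile_contents)
            snakefile = f.name
        if not snakemake(snakefile, cores=1):
            raise RuntimeError("workflow failed")
        # Placeholder: parse whatever response value the optimizer expects.
        with open("results.txt") as results:
            return float(results.read())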
