-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reviews #1
Comments
I wrote an example RNA-seq pipeline for generating gene expression counts here: https://github.com/jrderuiter/snakemake-rnaseq-star-featurecounts. Would be interested to hear your thoughts/comments concerning the structure/code. I've used wrappers where available, though wrappers for the other tools should be easy to implement. |
This looks very good! Besides some very minor things, it is exactly like a perfect snakemake workflow should be. Do you intend to move this over here?
|
Thanks! Sure, moving here would be fine. I'll have a look at your comments (especially point 4, which I agree would be ideal if this would also accept URLs and such). One other thing I was thinking about, is the flexibility of the output paths. For most of my projects, I use the cookiecutter data science template (https://github.com/drivendata/cookiecutter-data-science) as a starting point, which I think provides a nice default project structure. One nice thing about this template, is that it splits interim and 'final' processed files into two separate directories (data/interim and data/processed). I like this a lot, because it distinguishes interim steps (that data from which can be discarded) from the final processed outputs. In my own pipelines, I therefore typically distinguish between the interim and processed directories (which I base all rule paths on and are configurable via the config file/command line). For example,
I think it would be great if we could include some flexibility in the output paths, but am unsure what the best way is to do so. To some extent this can be done at runtime by changing the workdir for snakemake, but this does not allow us to distinguish between interim and processed output directories. I also don't want to introduce too much extra complexity into the pipelines. What are your thoughts on this? |
Maybe a stupid question, but how are software versions managed with wrappers and conda? Are versions of tools hardcoded in the wrapper, or can I specify which version of the tool should be used? |
By default, the wrapper uses a hardcoded conda environment (see here). By this, we ensure stability. If this shall be overridden, the user can specify a custom conda environment within the rule that uses the wrapper. This will always take precedence over the one that comes with the wrapper. Regarding the directories: While I also like the clean distinction, I think the datascience approach of interim and final directories is a bit too unflexible for Snakemake. With the support of temp files, you don't really need to distinguish output files like that in Snakemake. Interim files can simply be declared as temporary. Also, users of our workflows might have different ideas of what is interim. Sometimes, a bam is needed for further processing or even a result of its own, sometimes it can be discarded. Since the idea would be that people checkout the repo and possibly modify the workflow, we probably should use neutral directory names that rather describe the workflow step than what kind of usage the file will have. |
@jrderuiter is your workflow ready for moving here? It does not need to be perfect, we can mark it as WIP. |
I've been developing a workflow for metagenomic projects located here: https://github.com/NBISweden/nbis-meta. It supports paired- and single-end data and includes preprocessing, read-based classification, assembly and binning as well as functional and taxonomic annotations. Any input is appreciated. Thanks! |
Hello, I would like to contribute my workflow to the snakemake-workflows. |
Hey @johne and @SilasK sorry for being so late about this. I completely missed the notifications. Very nice pipelines! Both of them could be included here relatively quickly I think. @johnne:
@SilasK
Both
|
Thanks @johanneskoester! I've run linting on the workflow which did help a lot (thanks for that functionality!) but I'm afraid the workflow doesn't conform 100% to the linting. Mostly it's missing log directive for rules using the |
Hey! I've tried my hand at setting up a DeepVariant-GLnexus workflow. Would love to have it as a part of this repo. It has unit and integration tests. Genome fetching, mapping, etc is done similarly to the dna-seq-gatk-variant-calling workflow. I'm planning on moving this to the copier template system as well. With the latest update to the GATK workflow - fixing paths etc - this workflow could in turn probably also be significantly refactored to draw more on the GATK one. |
Hi @johanneskoester , I would like to add our pipeline into the snakemake catalog. I am not entirely certain if it meets your criteria though. For example Snakefile is not directly under workflow folder, the reason is that it provides more than a single pipeline and it is a collection of several pipelines. Do you have any feedback on how to get it to be added there? |
您好,您发的邮件我已收到。谢谢! 支永强
|
Please post below to request a review.
TODO
The text was updated successfully, but these errors were encountered: