hisat2 align wrapper ht2l extension #3368
Comments
Well, the proper way would be to specify the files you want to use as index, and then infer the path from them.
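One way to read this suggestion is sketched below. It assumes the index files follow the usual HISAT2 naming of `<prefix>.<N>.ht2` or `<prefix>.<N>.ht2l`; the helper name `infer_index_prefix` is hypothetical and is not taken from the wrapper.

```python
import re

# Matches the trailing ".<N>.ht2" or ".<N>.ht2l" part of an index file name.
_INDEX_SUFFIX = re.compile(r"\.\d+\.ht2l?$")


def infer_index_prefix(index_files):
    """Return the shared index prefix of the given HISAT2 index files.

    Hypothetical helper: raises ValueError if a file does not look like an
    index file, or if the files do not all share exactly one prefix.
    """
    prefixes = set()
    for path in index_files:
        name = str(path)
        if not _INDEX_SUFFIX.search(name):
            raise ValueError(f"Not a HISAT2 index file: {name}")
        prefixes.add(_INDEX_SUFFIX.sub("", name))
    if len(prefixes) != 1:
        raise ValueError(f"Expected exactly one index prefix, got: {sorted(prefixes)}")
    return prefixes.pop()
```

The returned prefix is what would then be passed to `hisat2 -x`. Because the files are listed explicitly, unrelated files in the same folder cannot be picked up by accident.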
This works to allow .ht2l index files, but it seems to have the (rather significant, in my opinion) downside of requiring the user to know ahead of time what their index output is going to look like, whereas in automated pipelines the indexing will often be done blind and then followed immediately by mapping. The perk of a .glob() type approach is that as long as you put your index files in their own folder, not shared with other index files (which should be done anyway), you can just point snakemake to it, have the wrapper look for either ht2 or ht2l files, and go from there.
I find globbing a bit dangerous as you never know what other files might be in the same folder.
Fix issue #3368

### QC

* [x] I confirm that I have followed the [documentation for contributing to `snakemake-wrappers`](https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html).

While the contributions guidelines are more extensive, please particularly ensure that:

* [x] `test.py` was updated to call any added or updated example rules in a `Snakefile`
* [x] `input:` and `output:` file paths in the rules can be chosen arbitrarily
* [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`)
* [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to
* [x] the `meta.yaml` contains a link to the documentation of the respective tool or command under `url:`
* [x] conda environments use a minimal amount of channels and packages, in recommended ordering

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced support for new index file formats (`.ht2l`) in both alignment and indexing processes.
  - Added a new rule for handling large index files in the HISAT2 alignment workflow.
- **Bug Fixes**
  - Enhanced input handling for index files to improve clarity and maintainability.
- **Documentation**
  - Updated `meta.yaml` to include a description and a link to the HISAT2 manual.
- **Chores**
  - Significant updates to the Conda environment configuration, including version upgrades and removal of unnecessary dependencies.
When using the hisat2 wrappers, I run into issues if my reference genome is large. In that case hisat2 indexing creates .ht2l files rather than .ht2 files, but the hisat2 align wrapper uses .glob("*.ht2") (line 29 of the current wrapper.py) and therefore does not pick up any .ht2l files.
It seems to me that also adding to the ht2_files variable any files that match .glob("*.ht2l") would solve the problem.
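A minimal sketch of that idea, assuming the index lives in a folder of its own. The function name `collect_index_files` is hypothetical and this is not the wrapper's actual code; it only illustrates that `*.ht2` and `*.ht2l` have to be globbed separately, since the pattern `*.ht2` does not match a name ending in `.ht2l`.

```python
from pathlib import Path


def collect_index_files(index_dir):
    """Return all .ht2 and .ht2l files found directly in index_dir.

    Hypothetical helper: small-index files (.ht2) come first, then
    large-index files (.ht2l), each group sorted by name.
    """
    index_dir = Path(index_dir)
    small = sorted(index_dir.glob("*.ht2"))
    large = sorted(index_dir.glob("*.ht2l"))
    return small + large
```

If the folder contains only large-index files, the first glob simply returns nothing and the result holds the `.ht2l` files alone.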
Another option would be to add a rule parameter specifying whether the genome is large, but this seems a little too manual for the user in my opinion, when there don't really seem to be any issues with grabbing all .ht2 and .ht2l files.
This happens to me when using the wheat reference genome for example, which is quite large.