-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/ samtools typehinted wrappers #26
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #26 +/- ##
==========================================
+ Coverage 93.01% 93.09% +0.08%
==========================================
Files 30 32 +2
Lines 3119 3463 +344
Branches 579 636 +57
==========================================
+ Hits 2901 3224 +323
- Misses 145 157 +12
- Partials 73 82 +9 ☔ View full report in Codecov by Sentry. |
91d165b
to
d4c66a7
Compare
415df2c
to
f2b31bc
Compare
85eb73c
to
015aa54
Compare
bd54520
to
ae5b948
Compare
c483a09
to
0ed945c
Compare
* Rebased onto current main * Update argument names to avoid shadowing "input" * Allow samtools index for multiple paths
0ed945c
to
c0d9e00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviving this PR @TedBrookings!
I haven't looked at the tests yet, but I do see many new code paths/branches which are untested. Could you add additional tests where it makes sense to ensure we have coverage over important codepaths?
Most of my comments are about the API and documentation.
Could you also update the PR title to be more descriptive so it will be easier to understand what this commit does once squashed to main?
I'll do another end-to-end review (including tests) after you've taken another pass at the PR!
@@ -198,13 +198,14 @@ class SamFileType(enum.Enum): | |||
ext (str): The standard file extension for this file type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add index_ext
to attributes docs.
self.mode = mode | ||
self.extension = ext | ||
self.index_extension = index_ext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer keeping the parameter and attribute names the same as it makes things more obvious/readable:
self.index_extension = index_ext | |
self.index_extension = index_extension |
Although I see the convention has already broken for ext
/extension
. 🤔
@@ -911,3 +912,5 @@ class SamOrder(enum.Enum): | |||
Coordinate = "coordinate" #: coordinate sorted | |||
QueryName = "queryname" #: queryname sorted | |||
Unknown = "unknown" # Unknown SAM / BAM / CRAM sort order | |||
# Sort by template-coordinate, SO is unsorted, GO is query, SS is template-coordinate: | |||
TemplateCoordinate = "unsorted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move/upgrade comments to become attributes docs in the main class docstring for auto documentation?
from fgpyo.sam import SamOrder | ||
|
||
|
||
def pysam_fn(*args: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be a part of the public API?
Is if TYPE_CHECKING
preferred over making a stub or .pyi
file? (I'm not an expert on this pattern). Seems like making some stubs for the simple pysam calls would be nicer than having this boilerplate in our library code... let me know what is the right path forward here, I'm curious!
|
||
def dict( | ||
fasta_file: Path, | ||
output: Path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input is descriptive fasta_file
but the output output
is not.
I would suggest input
/output
or fasta
/dict
(the types already indicate it is a Path
).
kmer_size: the kmer-size to be used if sorting unmapped reads. | ||
compression_level: The compression level to be used in the final output file. | ||
memory_per_thread: Approximately the maximum required memory per thread, specified either | ||
in bytes or with a K, M, or G suffix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suffixes include M
as valid yet the CLI signature default has MB
?
sort_unmapped_reads: bool = False, | ||
kmer_size: int = 20, | ||
compression_level: Optional[int] = None, | ||
memory_per_thread: str = "768MB", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this default come from?
Arguments will be formatted into the appropriate command-line call which will be invoked | ||
using the pysam dispatcher. | ||
|
||
Returns the stdout of the resulting call to pysam.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Google style docstrings, use the Returns:
block like you would an Args:
or Attributes
.
Only making this comment here, but it applies to all other docstrings too.
https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
if index_type == SamIndexType.CSI: | ||
args.extend(["-m", str(csi_index_size)]) | ||
args.extend(["-@", str(threads)]) | ||
args.extend(f"{input_path}" for input_path in alignment_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting of when you add a newline whitespace between code vs choosing not to is changing throughout the PR. I'd recommend keeping the style consistent.
@@ -0,0 +1,375 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this file automatically show up in generated docs?
I know we'll eventually need to fixup the VCF work so it is indexed: #72
Closes: #22
EDIT: This is the first 4 type hinted wrappers around pysam dispatch functions.
A lot more to do but I wanted to start with the functions most important to us.(Per conversation with Tim these 4 functions are the only ones we'll be wrapping for now.)Opening it up for review for now, but will be working on another PR shortly with the most important bcftools functions.This (at the moment) is the first of the pysam dispatch wrappers for fgpyo. More to do here and I want to do a batch of the ones most important to us so this is a draft for now.