Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/ samtools typehinted wrappers #26

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

NatPRoach
Copy link
Contributor

@NatPRoach NatPRoach commented Dec 27, 2022

Closes: #22

EDIT: This is the first 4 type hinted wrappers around pysam dispatch functions. A lot more to do but I wanted to start with the functions most important to us.(Per conversation with Tim these 4 functions are the only ones we'll be wrapping for now.) Opening it up for review for now, but will be working on another PR shortly with the most important bcftools functions.

This (at the moment) is the first of the pysam dispatch wrappers for fgpyo. More to do here and I want to do a batch of the ones most important to us so this is a draft for now.

@codecov-commenter
Copy link

codecov-commenter commented Dec 27, 2022

Codecov Report

Attention: Patch coverage is 94.11765% with 21 lines in your changes missing coverage. Please review.

Project coverage is 93.09%. Comparing base (ca4a572) to head (c0d9e00).
Report is 40 commits behind head on main.

Files Patch % Lines
fgpyo/samtools.py 80.18% 12 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   93.01%   93.09%   +0.08%     
==========================================
  Files          30       32       +2     
  Lines        3119     3463     +344     
  Branches      579      636      +57     
==========================================
+ Hits         2901     3224     +323     
- Misses        145      157      +12     
- Partials       73       82       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@NatPRoach NatPRoach changed the title Feature/htslib typehinted wrappers Feature/ samtools typehinted wrappers Dec 30, 2022
@NatPRoach NatPRoach force-pushed the feature/htslib_typehinted_wrappers branch 2 times, most recently from 91d165b to d4c66a7 Compare December 30, 2022 15:04
@NatPRoach NatPRoach marked this pull request as ready for review December 30, 2022 15:42
@NatPRoach NatPRoach requested review from nh13 and tfenne December 30, 2022 15:42
@NatPRoach NatPRoach force-pushed the feature/htslib_typehinted_wrappers branch 2 times, most recently from 415df2c to f2b31bc Compare December 30, 2022 15:49
@NatPRoach NatPRoach changed the title Feature/ samtools typehinted wrappers Feature/ samtools typehinted wrappers batch 1 Dec 30, 2022
@NatPRoach NatPRoach changed the title Feature/ samtools typehinted wrappers batch 1 Feature/ samtools typehinted wrappers Dec 30, 2022
@NatPRoach NatPRoach force-pushed the feature/htslib_typehinted_wrappers branch 2 times, most recently from 85eb73c to 015aa54 Compare January 3, 2023 17:08
@TedBrookings TedBrookings force-pushed the feature/specify_sort_order_in_sambuilder branch 2 times, most recently from bd54520 to ae5b948 Compare November 14, 2023 21:12
Base automatically changed from feature/specify_sort_order_in_sambuilder to main November 14, 2023 21:35
@TedBrookings TedBrookings force-pushed the feature/htslib_typehinted_wrappers branch 2 times, most recently from c483a09 to 0ed945c Compare November 15, 2023 16:38
* Rebased onto current main
* Update argument names to avoid shadowing "input"
* Allow samtools index for multiple paths
@TedBrookings TedBrookings force-pushed the feature/htslib_typehinted_wrappers branch from 0ed945c to c0d9e00 Compare November 15, 2023 16:45
@clintval clintval self-assigned this Nov 21, 2023
Copy link
Member

@clintval clintval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviving this PR @TedBrookings!

I haven't looked at the tests yet, but I do see many new code paths/branches which are untested. Could you add additional tests where it makes sense to ensure we have coverage over important codepaths?

Most of my comments are about the API and documentation.

Could you also update the PR title to be more descriptive so it will be easier to understand what this commit does once squashed to main?

I'll do another end-to-end review (including tests) after you've taken another pass at the PR!

@@ -198,13 +198,14 @@ class SamFileType(enum.Enum):
ext (str): The standard file extension for this file type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add index_ext to attributes docs.

self.mode = mode
self.extension = ext
self.index_extension = index_ext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer keeping the parameter and attribute names the same as it makes things more obvious/readable:

Suggested change
self.index_extension = index_ext
self.index_extension = index_extension

Although I see the convention has already broken for ext/extension. 🤔

@@ -911,3 +912,5 @@ class SamOrder(enum.Enum):
Coordinate = "coordinate" #: coordinate sorted
QueryName = "queryname" #: queryname sorted
Unknown = "unknown" # Unknown SAM / BAM / CRAM sort order
# Sort by template-coordinate, SO is unsorted, GO is query, SS is template-coordinate:
TemplateCoordinate = "unsorted"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move/upgrade comments to become attributes docs in the main class docstring for auto documentation?

from fgpyo.sam import SamOrder


def pysam_fn(*args: str) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be a part of the public API?

Is if TYPE_CHECKING preferred over making a stub or .pyi file? (I'm not an expert on this pattern). Seems like making some stubs for the simple pysam calls would be nicer than having this boilerplate in our library code... let me know what is the right path forward here, I'm curious!


def dict(
fasta_file: Path,
output: Path,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input is descriptive fasta_file but the output output is not.

I would suggest input/output or fasta/dict (the types already indicate it is a Path).

kmer_size: the kmer-size to be used if sorting unmapped reads.
compression_level: The compression level to be used in the final output file.
memory_per_thread: Approximately the maximum required memory per thread, specified either
in bytes or with a K, M, or G suffix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suffixes include M as valid yet the CLI signature default has MB?

sort_unmapped_reads: bool = False,
kmer_size: int = 20,
compression_level: Optional[int] = None,
memory_per_thread: str = "768MB",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this default come from?

Arguments will be formatted into the appropriate command-line call which will be invoked
using the pysam dispatcher.

Returns the stdout of the resulting call to pysam.sort()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Google style docstrings, use the Returns: block like you would an Args: or Attributes.

Only making this comment here, but it applies to all other docstrings too.

https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

if index_type == SamIndexType.CSI:
args.extend(["-m", str(csi_index_size)])
args.extend(["-@", str(threads)])
args.extend(f"{input_path}" for input_path in alignment_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting of when you add a newline whitespace between code vs choosing not to is changing throughout the PR. I'd recommend keeping the style consistent.

@@ -0,0 +1,375 @@
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this file automatically show up in generated docs?

I know we'll eventually need to fixup the VCF work so it is indexed: #72

@clintval clintval assigned TedBrookings and unassigned clintval Nov 22, 2023
@nh13 nh13 added the help wanted Extra attention is needed label Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create methods and type stubs on pysam htslib C extension calls
5 participants