Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error message for unsorted BAM #1

Open
jakobnissen opened this issue Dec 16, 2020 · 2 comments
Open

Better error message for unsorted BAM #1

jakobnissen opened this issue Dec 16, 2020 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@jakobnissen
Copy link
Collaborator

jakobnissen commented Dec 16, 2020

Dear @apcamargo

Thanks for the heads-up on this package. It looks great! I'm giving it a test spin now.
When passing a non-sorted BAM:

>>> pycoverm.get_coverages_from_bam(["out.bam"], threads=4)
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ShapeError/IncompatibleShape: incompatible shapes', src/lib.rs:181:6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ShapeError/IncompatibleShape: incompatible shapes

A better error-message would be preferrable (I would perhaps do a search-and-replace for unwrap() and replace it with expect("<NEW ERROR MESSAGE>"))

@apcamargo
Copy link
Owner

apcamargo commented Dec 16, 2020

Yes, I still need to implement a function to check if BAMs are sorted. I'll make it raise an exception.

The reason I haven't implemented this yet is because I'm doing this check in Python (see below), but I agree with you that pyCoverM should be able to do it within itself to avoid unnecessary dependencies.

from Bio import bgzf

def is_bam_sorted(filepath):
    """
    Checks if a BAM file is sorted by coordinate.
    Parameters
    ----------
    filepath : Path
        Path object pointing to BAM file.
    Returns
    -------
    bool
        Returns `True` if the BAM file is sorted by coordinate and `False`
        otherwise.
    """
    with bgzf.BgzfReader(filepath, "rb") as fin:
        bam_header = fin.readline().strip()
        return b"SO:coordinate" in bam_header

@apcamargo apcamargo added the enhancement New feature or request label Dec 16, 2020
@apcamargo
Copy link
Owner

apcamargo commented Dec 16, 2020

I just pushed some commits that add the is_bam_sorted, which can be used to check if the BAMs are sorted before inputting them to get_coverages_from_bam.

To check multiple files:

if not all(pycoverm.is_bam_sorted(bam_path) for bam_path in bam_file_list):
    raise Exception(
        "At least one of the supplied BAM files is not sorted by coordinate "
        "or does not contain a valid header. You can sort it using `samtools sort`."
    )

I'll leave the issue open because this function doesn't fix it properly. When I get some time I'll make get_coverages_from_bam raise an exception if at least one of the input BAM files is unsorted.

@apcamargo apcamargo added the good first issue Good for newcomers label Dec 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants