Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lower depth recorded with min_identity=0.0 than with 0.1 #7

Open
jakobnissen opened this issue Apr 25, 2022 · 6 comments
Open

Lower depth recorded with min_identity=0.0 than with 0.1 #7

jakobnissen opened this issue Apr 25, 2022 · 6 comments

Comments

@jakobnissen
Copy link
Collaborator

jakobnissen commented Apr 25, 2022

Here is a minimal working example. I am using the attached file head2.bam with only 4 reads and 1 reference. Note that in order to upload to GitHub, the files must be gzipped - simply unzip the attached file to get the original BAM file.

Note that the depth INCREASES when increasing min_identity. Perhaps it somehow defaults min_identity=0.0 to min_identity=0.97?

>>> import pycoverm
>>> pycoverm.get_coverages_from_bam(["head2.bam"], min_identity=0.0)[1]
array([[0.]], dtype=float32)
>>> pycoverm.get_coverages_from_bam(["head2.bam"], min_identity=0.1)[1]
array([[0.00310749]], dtype=float32)

head2.bam.gz

@apcamargo
Copy link
Owner

Thanks for the report! I couldn't find anything obvious just by looking at the code. It doesn't seem like vanilla CoverM will default min_identity=0.0 to something higher. I'll try to look into this soon

@apcamargo
Copy link
Owner

I think I know what is going on. CoverM is checking whether the minimum identity is greater than 0.

I think the easiest solution here would be to set min_identity to something extremely small if the user sets it to zero. Would that work to you, @jakobnissen?

@apcamargo
Copy link
Owner

I couldn't figure out what CoverM does when the minimum identity is 0. The results it provides are correct, though. Pinging @wwood

@jakobnissen
Copy link
Collaborator Author

It's no rush for me to get it fixed - I set min_identity to 0.001 if the user sets it to 0. But it would be good to know why pycoverm behaves this way.

@wwood
Copy link

wwood commented May 5, 2022

Hi,

Sorry I'm not really clear on the specifics here, but some notes

CoverM doesn't give those answers,

(coverm-dev)b2:20220505:~/git/coverm-local$ cargo run -- contig -b head2.bam 
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/coverm contig -b head2.bam`
[2022-05-05T05:53:39Z INFO  bird_tool_utils::clap_utils] CoverM version 0.6.1
[2022-05-05T05:53:39Z INFO  coverm] Using min-covered-fraction 0%
[2022-05-05T05:53:40Z INFO  coverm::contig] In sample 'head2', found 4 reads mapped out of 4 total (100.00%)
Contig	head2 Mean
S5C6	0.0031074875

(coverm-dev)b2:20220505:~/git/coverm-local$ cargo run -- contig -b head2.bam --min-read-percent-identity 0.1
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/coverm contig -b head2.bam --min-read-percent-identity 0.1`
[2022-05-05T05:54:15Z INFO  bird_tool_utils::clap_utils] CoverM version 0.6.1
[2022-05-05T05:54:15Z INFO  coverm] Using min-read-percent-identity 10%
[2022-05-05T05:54:15Z INFO  coverm] Using min-covered-fraction 0%
[2022-05-05T05:54:15Z INFO  coverm::contig] In sample 'head2', found 4 reads mapped out of 4 total (100.00%)
Contig	head2 Mean
S5C6	0.0031074875

(coverm-dev)b2:20220505:~/git/coverm-local$ cargo run -- contig -b head2.bam --min-read-percent-identity 0.0
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/coverm contig -b head2.bam --min-read-percent-identity 0.0`
[2022-05-05T05:57:26Z INFO  bird_tool_utils::clap_utils] CoverM version 0.6.1
[2022-05-05T05:57:26Z INFO  coverm] Using min-read-percent-identity 0%
[2022-05-05T05:57:26Z INFO  coverm] Using min-covered-fraction 0%
[2022-05-05T05:57:26Z INFO  coverm::contig] In sample 'head2', found 4 reads mapped out of 4 total (100.00%)
Contig	head2 Mean
S5C6	0.0031074875

Maybe the default is being put in here? Not very familiar with the python side sorry.

min_identity = "0.97",

ben

@apcamargo
Copy link
Owner

Thanks, @wwood. I also noticed that CoverM returns the correct coverage, I'm still not sure what is going on. Might be something on the Python/Maturin side of things, but it's very strange that it only happens when min_identity is 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants