Potential bug in compute_fraglen.py #99

mrmilhaven · 2021-10-20T18:21:38Z

Describe the bug
A clear and concise description of what the bug is.
Fragment Length Distribution is incorrect, even when using NEAT's gen_reads.py

To Reproduce
Steps to reproduce the behavior:

Create NEAT dataset with fragment length distribution as mean = 300 and SD = 10 (test.fa is 1Mb of random nucleotides, no Ns):
python ~/NEAT/gen_reads.py -R 100 -r test.fa -c 10 --pe 300 10 -o NEAT_test

2.Map reads (I used BWA):
bwa mem -M test.fa NEAT_test_read1.fq.gz NEAT_test_read2.fq.gz > NEAT_test.bam

Compute fragment distribution as sanity check:
python ~/NEAT/utilities/compute_fraglen.py -i NEAT_test.bam -o NEAT_test
open pickle file:
import pandas
pandas.read_pickle("NEAT_test.p")
[[100], [1.0]]

Expected behavior
I would expect the same distribution that I gave gen_reads.py (mean of 300 with SD of 10) to come back out.

Potential Reason
I think the bug has to do with the count_frags method:
First 9 fields in BAM from BAM file:
NEAT_test-test-1 99 test 9039 60 100M = 9254 315

First 9 fields of BAM from script(adding print line within count_frags for loop, approx. line 96):
NEAT_test-test-1 99 0 9038 60 100M 0 9253 100

I believe that converting from an iterable pysam type to a string (line 95) causes some of the fields to change. This also implies that , since mate mapping(field 8) is always equal to the reference (field 3), every read will look like it was mate mapped, per your code (line 108).

I think this can be fixed by using pysams inherent parameters, such as item.mate_is_unmapped or item.template_length instead of splitting the line by tabs.

joshfactorial · 2021-10-20T18:34:53Z

Can you duplicate this bug on our main development repo: https://github.com/ncsa/NEAT/issues? That will help us keep things together. I'm currently investigating those utilities, so this is timely.

mrmilhaven · 2021-10-20T18:39:05Z

Duplicated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential bug in compute_fraglen.py #99

Potential bug in compute_fraglen.py #99

mrmilhaven commented Oct 20, 2021 •

edited

Loading

joshfactorial commented Oct 20, 2021

mrmilhaven commented Oct 20, 2021

Potential bug in compute_fraglen.py #99

Potential bug in compute_fraglen.py #99

Comments

mrmilhaven commented Oct 20, 2021 • edited Loading

joshfactorial commented Oct 20, 2021

mrmilhaven commented Oct 20, 2021

mrmilhaven commented Oct 20, 2021 •

edited

Loading