You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
A clear and concise description of what the bug is.
Fragment Length Distribution is incorrect, even when using NEAT's gen_reads.py
To Reproduce
Steps to reproduce the behavior:
Create NEAT dataset with fragment length distribution as mean = 300 and SD = 10 (test.fa is 1Mb of random nucleotides, no Ns):
python ~/NEAT/gen_reads.py -R 100 -r test.fa -c 10 --pe 300 10 -o NEAT_test
2.Map reads (I used BWA):
bwa mem -M test.fa NEAT_test_read1.fq.gz NEAT_test_read2.fq.gz > NEAT_test.bam
Compute fragment distribution as sanity check:
python ~/NEAT/utilities/compute_fraglen.py -i NEAT_test.bam -o NEAT_test
open pickle file:
import pandas
pandas.read_pickle("NEAT_test.p")
[[100], [1.0]]
Expected behavior
I would expect the same distribution that I gave gen_reads.py (mean of 300 with SD of 10) to come back out.
Potential Reason
I think the bug has to do with the count_frags method:
First 9 fields in BAM from BAM file:
NEAT_test-test-1 99 test 9039 60 100M = 9254 315
First 9 fields of BAM from script(adding print line within count_frags for loop, approx. line 96):
NEAT_test-test-1 99 0 9038 60 100M 0 9253 100
I believe that converting from an iterable pysam type to a string (line 95) causes some of the fields to change. This also implies that , since mate mapping(field 8) is always equal to the reference (field 3), every read will look like it was mate mapped, per your code (line 108).
I think this can be fixed by using pysams inherent parameters, such as item.mate_is_unmapped or item.template_length instead of splitting the line by tabs.
The text was updated successfully, but these errors were encountered:
Can you duplicate this bug on our main development repo: https://github.com/ncsa/NEAT/issues? That will help us keep things together. I'm currently investigating those utilities, so this is timely.
Describe the bug
A clear and concise description of what the bug is.
Fragment Length Distribution is incorrect, even when using NEAT's gen_reads.py
To Reproduce
Steps to reproduce the behavior:
python ~/NEAT/gen_reads.py -R 100 -r test.fa -c 10 --pe 300 10 -o NEAT_test
2.Map reads (I used BWA):
bwa mem -M test.fa NEAT_test_read1.fq.gz NEAT_test_read2.fq.gz > NEAT_test.bam
Compute fragment distribution as sanity check:
python ~/NEAT/utilities/compute_fraglen.py -i NEAT_test.bam -o NEAT_test
open pickle file:
import pandas
pandas.read_pickle("NEAT_test.p")
[[100], [1.0]]
Expected behavior
I would expect the same distribution that I gave gen_reads.py (mean of 300 with SD of 10) to come back out.
Potential Reason
I think the bug has to do with the count_frags method:
First 9 fields in BAM from BAM file:
NEAT_test-test-1 99 test 9039 60 100M = 9254 315
First 9 fields of BAM from script(adding print line within count_frags for loop, approx. line 96):
NEAT_test-test-1 99 0 9038 60 100M 0 9253 100
I believe that converting from an iterable pysam type to a string (line 95) causes some of the fields to change. This also implies that , since mate mapping(field 8) is always equal to the reference (field 3), every read will look like it was mate mapped, per your code (line 108).
I think this can be fixed by using pysams inherent parameters, such as item.mate_is_unmapped or item.template_length instead of splitting the line by tabs.
The text was updated successfully, but these errors were encountered: