Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError: quality and sequence mismatch: 98 != 97 #204

Open
mkankain-hruh opened this issue May 20, 2022 · 5 comments
Open

ValueError: quality and sequence mismatch: 98 != 97 #204

mkankain-hruh opened this issue May 20, 2022 · 5 comments

Comments

@mkankain-hruh
Copy link

mkankain-hruh commented May 20, 2022

Bamsurgeon downloaded 20/05/2022. Using pysam 0.19.0 py39h5030a8b_0 from bioconda. Below last log lines before crash. Inputing variants to ultradeep BAM file with mean x500

INFO 2022-05-20 02:16:51,517 secondary reads count:0
INFO 2022-05-20 02:16:51,517 supplementary reads count:0
INFO 2022-05-20 02:16:51,517 loaded 12686 reads, (0 excluded, 0 null or secondary or supplementary--> ignored)

Traceback (most recent call last):
File "/usr/local/source/bamsurgeon/bin/addsnv.py", line 479, in
run()
File "/usr/local/source/bamsurgeon/bin/addsnv.py", line 476, in run
main(args)
File "/usr/local/source/bamsurgeon/bin/addsnv.py", line 422, in main
rr.replace_reads(args.bamFileName, outbam_mutsfile, args.outBamFile, keepqual=True, seed=args.seed)
File "/usr/local/source/bamsurgeon/bin/bamsurgeon/replace_reads.py", line 165, in replace_reads
rdict[extqname].qual = read.qual
File "pysam/libcalignedsegment.pyx", line 2772, in pysam.libcalignedsegment.AlignedSegment.qual.set
File "pysam/libcalignedsegment.pyx", line 1514, in pysam.libcalignedsegment.AlignedSegment.query_qualities.set
ValueError: quality and sequence mismatch: 98 != 97

@drmrgd
Copy link

drmrgd commented Aug 12, 2022

I am getting a very similar error too, which I think may be related to an issue matching the quality string and sequence string for mate pairs:

INFO 2022-08-11 21:18:20,977 loading donor reads into dictionary...

INFO 2022-08-11 21:18:21,042 secondary reads count:0
INFO 2022-08-11 21:18:21,042 supplementary reads count:0
INFO 2022-08-11 21:18:21,042 loaded 28448 reads, (0 excluded, 0 null or secondary or supplementary--> ignored)

Traceback (most recent call last):
  File "/Users/jg7758/Dropbox/biofx_tools/bamsurgeon/bin/addindel.py", line 389, in <module>
    run()
  File "/Users/jg7758/Dropbox/biofx_tools/bamsurgeon/bin/addindel.py", line 386, in run
    main(args)
  File "/Users/jg7758/Dropbox/biofx_tools/bamsurgeon/bin/addindel.py", line 334, in main
    rr.replace_reads(args.bamFileName, outbam_mutsfile, args.outBamFile, keepqual=True, seed=args.seed)
  File "/Users/jg7758/OneDrive - Exact Sciences/biofx_tools/bamsurgeon/bin/bamsurgeon/replace_reads.py", line 165, in replace_reads
    rdict[extqname].qual = read.qual
  File "pysam/libcalignedsegment.pyx", line 2772, in pysam.libcalignedsegment.AlignedSegment.qual.__set__
  File "pysam/libcalignedsegment.pyx", line 1514, in pysam.libcalignedsegment.AlignedSegment.query_qualities.__set__
ValueError: quality and sequence mismatch: 90 != 74

When I tried to look into this, I found it interesting that the lengths correspond to each read in a mate pair in a trans way. For example, in my original BAM, I have this read pair:

220211_A00995_0203_BHW2YGDRXY-Crick:2:2101:4182:26083	163	chr10	89692878	60	90M	=	89692894	90	CAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGATATGTGCATATTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGG	@DFEEHDDCEAAFEF?7>DEG+GFEBE9DECIGDAGB?EEGDCAF>@EFCE?DEFACFABC9F8DEDEC9)9FFFE)77G:FCD8EECEF	ZA:i:0	MB:Z:TCTTGATCGTACAC	ZB:i:11	MC:Z:74M	ZC:f:0.108911	MD:Z:90	PG:Z:bwa	NM:i:0	MQ:i:60OQ:Z:FFFFFFFF:FFFFFF:,:FFF,FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFF:,:FFFF,::F:FFF:FFFFFFFF:F:,:FF:	UQ:i:0	AS:i:90	CS:Z:GCCGTCGTTTTAT	XS:i:75	QX:Z:FFFFFFFFFFFFFF-FFFFFFFFFFFFF	RX:Z:TCTTGATCGTACAC-GCCGTCGTTTTAT
220211_A00995_0203_BHW2YGDRXY-Crick:2:2101:4182:26083	83	chr10	89692894	60	74M	=	89692878	-90	TGGAAAGGGACGAACTGGTGTAATGATATGTGCATATTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGG	CDGFGHCDFB;GFBGDD:EDAHDEGEAEDDEGH:BCFFAEEBBHCE2DDD@H:G;EFGE:HH7CGHAHF8EHB9	MB:Z:TCTTGATCGTACAC	MC:Z:90M11S	MD:Z:74	PG:Z:bwa   	NM:i:0	MQ:i:60	OQ:Z:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFF:F:F:FFFF:FF,FFFFFF,FFF:	UQ:i:0	AS:i:74	CS:Z:GCCGTCGTTTTAT	XS:i:59	QX:Z:FFFFFFFFFFFFFF-FFFFFFFFFFFFF	RX:Z:TCTTGATCGTACAC-GCCGTCGTTTTAT

The first is 90 bases long and the second is 74, and the sequence length and quality length are correct. In the temporary mutation BAM that is being compared to the original BAM, I can see that the sequence strings look correct (i.e. the 5bp deletion I requested is in the correct location), the CIGAR is updated correctly, and the string lengths look correct:

samtools view addindel.48284eaa-12e3-47fd-b813-277a15003e8d.muts.bam | grep 220211_A00995_0203_BHW2YGDRXY-Crick:2:2101:4182:26083
220211_A00995_0203_BHW2YGDRXY-Crick:2:2101:4182:26083	163	chr10	89692878	60	48M5D42M	=	89692894	95	CAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGATATGTGTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGGCCCTA	@DFEEHDDCEAAFEF?7>DEG+GFEBE9DECIGDAGB?EEGDCAF>@EFCE?DEFACFABC9F8DEDEC9)9FFFE)77G:FCD8EECEF	NM:i:5	MD:Z:48^CATAT42	MC:Z:32M5D42M	MQ:i:60	AS:i:79	XS:i:64	XA:Z:chr9,-33676194,42M5D48M,8;BS:i:1
220211_A00995_0203_BHW2YGDRXY-Crick:2:2101:4182:26083	83	chr10	89692894	60	32M5D42M	=	89692878	-95	TGGAAAGGGACGAACTGGTGTAATGATATGTGTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGGCCCTA	CDGFGHCDFB;GFBGDD:EDAHDEGEAEDDEGH:BCFFAEEBBHCE2DDD@H:G;EFGE:HH7CGHAHF8EHB9	NM:i:5	MD:Z:32^CATAT42	MC:Z:48M5D42M	MQ:i:60AS:i:63	XS:i:48	BS:i:1

So, I think that part is correct.

I dug into replace_reads.py, which is where the error is occurring I think, and it seems like it's choking on the very first read. With some debugging cod in there, I can see that the lengths within this record are correct:

~/Dropbox/biofx_tools/bamsurgeon/bin/bamsurgeon/replace_reads.py -b nosc.bam -r addindel.48284eaa-12e3-47fd-b813-277a15003e8d.muts.bam -o sim.bam --keepqual --progress
[E::idx_find_and_load] Could not retrieve index file for 'nosc.bam'
INFO 2022-08-12 08:57:39,936 loading donor reads into dictionary...

INFO 2022-08-12 08:57:40,017 secondary reads count:0
INFO 2022-08-12 08:57:40,017 supplementary reads count:0
INFO 2022-08-12 08:57:40,018 loaded 28448 reads, (0 excluded, 0 null or secondary or supplementary--> ignored)

new Reads -> []

Curr Read:
220211_A00995_0203_BHW2YGDRXY-Crick:2:2101:4182:26083	163	#10	89692878	60	90M	#10	89692894	90	CAATTCACTGTAAAGCTGGAAAGGGACGAACTGGTGTAATGATATGTGCATATTTATTACATCGGGGCAAATTTTTAAAGGCACAAGAGG	array('B', [31, 35, 37, 36, 36, 39, 35, 35, 34, 36, 32, 32, 37, 36, 37, 30, 22, 29, 35, 36, 38, 10, 38, 37, 36, 33, 36, 24, 35, 36, 34, 40, 38, 35, 32, 38, 33, 30, 36, 36, 38, 35, 34, 32, 37, 29, 31, 36, 37, 34, 36, 30, 35, 36, 37, 32, 34, 37, 32, 33, 34, 24, 37, 23, 35, 36, 35, 36, 34, 24, 8, 24, 37, 37, 37, 36, 8, 22, 22, 38, 25, 37, 34, 35, 23, 36, 36, 34, 36, 37])	[('ZA', 0), ('MB', 'TCTTGATCGTACAC'), ('ZB', 11), ('MC', '74M'), ('ZC', 0.10891088843345642), ('MD', '90'), ('PG', 'bwa'), ('NM', 0), ('MQ', 60), ('OQ', 'FFFFFFFF:FFFFFF:,:FFF,FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFF:,:FFFF,::F:FFF:FFFFFFFF:F:,:FF:'), ('UQ', 0), ('AS', 90), ('CS', 'GCCGTCGTTTTAT'), ('XS', 75), ('QX', 'FFFFFFFFFFFFFF-FFFFFFFFFFFFF'), ('RX', 'TCTTGATCGTACAC-GCCGTCGTTTTAT')]

Seq Len: 90
qual len: 90

So, I'm wondering if the issue comes from this record being compared to the mate pair and since the lengths of the two mates are different, this is causing the issue. Does that seem possible or am I on the wrong track? Is this something that you've seen before and have a suggestion on how to fix?

@drmrgd
Copy link

drmrgd commented Aug 12, 2022

One more thing of note: it seems to be related to newer code in this repo. When I downgraded and pulled bamsurgeon==1.3 just now to test, it seems to work. So, is there a parity check that's either causing an issue here, or detecting an issue with my input BAM?

@adamewing
Copy link
Owner

adamewing commented Aug 14, 2022

Hmm, will try to make a mixed-length .bam for testing and report back.

Thanks fot the details and debugging.

@adamewing
Copy link
Owner

This may have been related to #212, if still trying to use bamsurgeon let me know how you go with the latest updates.

@apraga
Copy link

apraga commented May 18, 2023

Hi, I'm have the same issue with both 1.4.1 and latest on the master branch. It does not happen with version 1.3, as mentioned above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants