picard tools' FixMates returns error after spike-in and post-processing #162

Open
jmal0403 opened this issue Jun 22, 2020 · 2 comments

@jmal0403

Hi Adam,

Hope you are well. We have corresponded in the past. Our group at Penn has been using Bamsurgeon successfully for the past five years on our Alzheimer's disease sequence data. We are spiking mutations (DEL, INS, DUP, and INV) into our biological replicates (same samples, different sequencing center) to measure SV calling sensitivity. Previously, we ran into a minor issue where some reads were misaligned after post-processing. Specifically, the error was: "Mate alignment does not match alignment start of mate." We were able to run Picard's FixMates to move past the issue. Now, upon running FixMates, we are seeing the following error: "Found two records that are paired, not supplementary, and second of the pair." This causes FixMates to crash. We are using the newest version. Our data are Illumina HiSeq 2500 reads. The only difference is that we have additional sample replicates, as we have many thousands of additional samples.

I have included a sample read (offender) below:

HWI-ST1324:173:C25WVACXX:5:1101:10285:52384 163 chr15_KI270905v1_alt 1399574 0 100M = 1399968 495 GGGTCAAGTGGTGTCCCAGGTCCCACCCTGACACAGTGCAATGGGCCAGTGGCTCAAAGAAAGCCCAGCACCCTTCATGGGAATTCCCACCCTCACCTGA 555????????????????????????????????????????????????????????????I?????????I????I?II?II??I??III?IIII?? XT:Z:R NM:i:0 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100 XA:Z:chr15,+29124945,100M,0; RG:Z:0.2

Error message:

Found two records that are paired, not supplementary, and second of the pair: HWI-ST1324:173:C25WVACXX:5:1101:10285:52384
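For readers following along, the wording of this error maps onto bits of the SAM FLAG field. The sketch below decodes a flag value such as the 163 on the read above; the bit values come from the SAM specification, while the helper name `decode_flag` is only an illustration and is not part of Picard or Bamsurgeon.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(163))
# ['paired', 'proper_pair', 'mate_reverse', 'second_in_pair']
```

Flag 163 is therefore paired, second of the pair, and not supplementary, which matches the description in the error message. Two records with the same read name that both carry this combination would be one plausible trigger.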

On behalf of the ADSP, any help would be most appreciated. Please let me know if you need any more information. I'm glad to help in any way.

Very Best,
John Stephen Malamon, PhD
University of Pennsylvania

@adamewing
Owner

Hi John,

Apologies for the delayed response. Are you able to pin down exactly which SV is causing this?

There is a "post processing" script included with bamsurgeon (scripts/postprocess.py) that is supposed to fix validation issues but it's usually more trouble than it's worth to run it.

--Adam

@jmal0403
Author

Hi Adam,

Thanks for the response. I hope you are well. I finally had a chance to look into the issue some more. I grepped the BAM file for the read flagged by GATK before post-processing, in other words, straight out of spike-in. There are three reads for the offender:

HWI-ST896:365:D2AAAACXX:5:2214:7450:43788 163 chr1 1057498 60 100M = 1057893 496 TAATCCTAGCACTTTGGGAGGCCGAGGCGGGTGGATCACCTGAGGTCAGGAGTTCAAGACGAGCCTGGCCAACATGGTGAAACCCTGTCTCTCCTAAAAT 555??????????????????????????????????????????????????????????????I??I?I??I?II???II?IIII??I?III?II??? XT:Z:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100 RG:Z:0

HWI-ST896:365:D2AAAACXX:5:2214:7450:43788 163 chr1 1057498 60 100M = 1057893 496 TAATCCTAGCACTTTGGGAGGCCGAGGCGGGTGGATCACCTGAGGTCAGGAGTTCAAGACGAGCCTGGCCAACATGGTGAAACCCTGTCTCTCCTAAAAT 555??????????????????????????????????????????????????????????????I??I?I??I?II???II?IIII??I?III?II??? XT:Z:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100 RG:Z:0

INFOST892020-08-28 14:53:37214:7SamFileValidator83 Validated Read 90,000,000 records. Elapsed time: 00:09:30s. Time for last 10,000,000: 56s. Last read position: chr2:478,752ATCCCAGTTACTCGGGAGGCTGAGGC 5????IIII?IIIII?II?IIIII?IIIIII?????????????????????????????????????????????????????????????????????? XT:Z:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101 RG:Z:0.1

I checked the pre-spike-in BAM using picard, and there were no errors. It seems like this would not be an issue if only the second two reads were present or the first were removed. Also, I couldn't find any event (DEL or INS) associated with this read. Can you think of why this may be occurring or how it can be fixed?
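A rough way to scan a whole file for this pattern is sketched below. It counts, per read name, how many records claim to be the same end of a pair, and reports any end that appears more than once. It works on SAM text lines (for example, the text output of `samtools view`). Skipping secondary and supplementary records is my assumption about what the validator compares, and the function names and the demo records are hypothetical.

```python
from collections import Counter


def pair_end_key(sam_line):
    """Return (QNAME, 'first'/'second') for a paired primary record, else None."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag = fields[0], int(fields[1])
    if not flag & 0x1:      # not paired
        return None
    if flag & 0x900:        # secondary (0x100) or supplementary (0x800): assumed ignored
        return None
    if flag & 0x40:
        return (qname, "first")
    if flag & 0x80:
        return (qname, "second")
    return None


def find_duplicate_ends(sam_lines):
    """Map (QNAME, end) -> count for every read end seen more than once."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        key = pair_end_key(line)
        if key is not None:
            counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical, abridged records: readA has its second-of-pair end twice.
demo = [
    "@HD\tVN:1.6",
    "readA\t163\tchr1\t100\t60\t100M\t=\t500\t496\t*\t*",
    "readA\t163\tchr1\t100\t60\t100M\t=\t500\t496\t*\t*",
    "readA\t83\tchr1\t500\t60\t100M\t=\t100\t-496\t*\t*",
    "readB\t99\tchr1\t200\t60\t100M\t=\t600\t500\t*\t*",
]
print(find_duplicate_ends(demo))
# {('readA', 'second'): 2}
```

If the pre-spike-in BAM reports nothing and the post-spike-in BAM does, that would suggest the duplicate record is introduced during spike-in rather than being present in the input.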

Thanks again for your help,
John
