homopolymer indel weighting #11

Open
nikosdarzentas opened this issue Dec 10, 2019 · 2 comments

Comments

@nikosdarzentas

nikosdarzentas commented Dec 10, 2019

Hi Nick,

This is my 3rd report and the most important so far. It's about homopolymers, on Version 2.6.5 (all the issues I opened are on this version).

In the example below I would expect the homopolymer-based indel (two extra 't' bases in the 1st sequence; I assume the minimum homopolymer length is 3?) to be tolerated, i.e. the two sequences to be clustered together. My understanding from the manuscript is that the indel should be weighted down to below 1, specifically 1/((5+3)/2) = 1/4 = 0.25, or 0.5 if you're counting this as 2 events.
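For reference, the arithmetic above can be sketched in Python. This only reproduces the weighting as I read it from the manuscript (1 divided by the mean of the two run lengths); the function name `homopolymer_indel_weight` is hypothetical, and this is not SeekDeep's actual code.

```python
def homopolymer_indel_weight(run_len_a: int, run_len_b: int) -> float:
    """Assumed weighting: 1 / mean length of the two homopolymer runs.

    This is an interpretation of the manuscript, not SeekDeep's implementation.
    """
    mean_run = (run_len_a + run_len_b) / 2
    return 1.0 / mean_run


# Read 1 has a run of 5 't', read 2 a run of 3 't'.
weight = homopolymer_indel_weight(5, 3)
print(weight)      # 0.25, if the 2-base difference counts as one event
print(2 * weight)  # 0.5, if it counts as two 1-base events
```

Either way the weighted value stays below 1, which is why I expected the `1baseIndel` allowance of 1 in the parameter file to be enough.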

Any insights?

stopCheck:smallCutoff:1baseIndel:2baseIndel:>2baseIndel:HQMismatches:LQMismatches:LKMismatches
9999999:0:1:0:0:0:0:0

@1;size=1
acgtttttACGTacgt
+
********HHHH****
@2;size=1
acgtttACGTacgt
+
******HHHH****
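To double-check that the two reads above differ only in the length of a single homopolymer run, here is a small run-length comparison in Python. It is just a sanity check on the test data, not part of SeekDeep; the helper name `runs` is hypothetical, and folding everything to upper case is an assumption made for the comparison.

```python
from itertools import groupby


def runs(seq: str) -> list[tuple[str, int]]:
    """Run-length encode a sequence, ignoring case (assumption for this check)."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


read1 = "acgtttttACGTacgt"
read2 = "acgtttACGTacgt"

r1, r2 = runs(read1), runs(read2)

# Positions (in run space) where the two reads disagree.
diffs = [(i, a, b) for i, (a, b) in enumerate(zip(r1, r2)) if a != b]
print(len(r1), len(r2))  # 12 12
print(diffs)             # [(3, ('T', 5), ('T', 3))]
```

So the only difference is the 't' run of length 5 versus 3, i.e. a 2-base indel inside a homopolymer.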

SeekDeep qluster --fastq qlutest.fastq --out qlutest --par myPar --noMarkChimeras --lower keep --caseInsensitive --smallReadSize 0 --useAllInput --writeOutInitalSeqs --overWrite --verbose --fastClustering --nucCutOff 0.2 --runCutOff 0%,0 --adjustHomopolyerRuns false --qualThresWindow 0

Thank you!

@nickjhathaway
Member

Honestly, I'm not sure what will happen with the lower case. I've never tested the algorithm on input that mixes lower and upper case; I have always either removed the lower-case bases or converted them to upper case. If you want to send me an example file and what you hope to get out of the clustering, I might be able to tell you what to do or which flags to set. Since I'm about to release a new version, I could also see whether it would be easy to tweak it to allow mixed case.

Nick

@nikosdarzentas
Author

Thanks Nick.
I converted everything to uppercase and still couldn't make it work, even with a longer homopolymer and equal qualities.

FYI, I use different cases and qualities to encode an antigen receptor rearrangement junction, and to then use SeekDeep's parameter profiles and case sensitivity to fully control what happens where.

I'd rather get it working on mini-examples like this, because I'd then like to edit the sequences manually to recreate and test different scenarios on the fly. In this case, I'd like to see how homopolymer weighting works, e.g. whether it can absorb 2 extra homopolymer bases when only a 1-base indel is allowed (to keep other indels under control).

If what I'm doing doesn't make sense, then that's OK.
