-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A strange behavior in approximate searches #302
Comments
Another example in mismatches number is 2: findnext(query, 2, seq, 6)
8:9
the correct answer should be 8:11,not 8:9 |
This is because insertions/deletions are also accepted. So, I the first example, 1:1 is correct because it matches A--- |
But I tried running through all the indexes using the findnext function, but I couldn't find the answer I expected 2:5, how can I find it? for i in 1:length(seq)
println(findnext(query,3,seq,i))
end
1:1
3:3
3:3
4:4
5:6
6:6
8:8
8:8
9:9
nothing
nothing |
Ok, I wrote a small script that can achieve what I want. It's a really stupid algorithm that slides through long sequences and identifies the presence of inconsistent nucleotides one after the other. function AproximateMatch(longSeq::LongSequence{DNAAlphabet{4}},shortSeq::LongSequence{DNAAlphabet{4}},max_mismatches::Int64) :: Vector{Int64}
matches = Vector{Int64}()
len_long = length(longSeq)
len_short = length(shortSeq)
for i = 1: (len_long - len_short + 1)
mismatches = 0
for j in eachindex(shortSeq)
if !(iscompatible(longSeq[i + j - 1],shortSeq[j]))
mismatches += 1
if mismatches > max_mismatches
break
end
end
end
if mismatches == max_mismatches
push!(matches,i)
end
end
matches
end |
In the official example, if only 1bp of mismatch is allowed, the result is completely correct. However, when I increase the number of mismatches to 3bp, strange results appear. In fact if 3bp mismatch is allowed, we can manually find similar correct positions, e.g.
In the above case, the expected answer should be 2:5, not 1:1.
The text was updated successfully, but these errors were encountered: