`ispalindromic` with respect to `reverse` or `reverse_complement`? #303
I've noticed that the fuzzer I've been using above is a bit too friendly, only producing inputs with even length. Adjusted for that:
julia> using Supposition, BioSequences
julia> dna_string = Data.Text(Data.SampledFrom(('A','C','T','G')));
julia> @check (s=dna_string, m=Data.Just("") | Data.SampledFrom(('A','C','T','G'))) -> begin
b = LongDNA{4}(s)
c = b*LongDNA{4}(m)*reverse_complement(b)
event!("Complete sequence", c)
ispalindromic(c)
end;
Events occured: 1
Complete sequence
A
┌ Error: Property doesn't hold!
│ Description = "##SuppositionAnon#560"
│ Example = (s = "", m = 'A')
└ @ Supposition ~/.julia/packages/Supposition/KpGkN/src/testset.jl:292
Test Summary: | Fail Total Time
##SuppositionAnon#560 | 1 1 0.0s
I'm not a Bioinformatician so I don't really know what the exact semantics of
Could also be an opportunity for some optimization, since the length is cached as far as I can tell.
You're right - palindromic sequences are defined in terms of reverse-complementation, since this is more relevant biologically (nucleotide reversion doesn't occur in nature, but reverse-complementation happens all the time). It's also true that odd-length sequences can never be palindromic. I vaguely recall that this is leveraged in genome assemblers, which only use odd-length kmers; that simplifies the implementation of strand-specific assembly de Bruijn graphs. A documentation update is in order. The current docstring is not very explanatory.
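For anyone following along without BioSequences installed, here is a minimal sketch of the reverse-complement definition described above, written against plain Julia strings. The names `COMPLEMENT`, `revcomp`, `is_rc_palindrome` and `is_plain_palindrome` are made up for illustration and are not part of BioSequences, and the sketch assumes only the four unambiguous bases A, C, G and T (ambiguity codes are left out on purpose).

```julia
# Complement table for the four unambiguous DNA bases only.
# Assumption: ambiguity codes (e.g. N) are deliberately not handled here.
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C')

# Reverse complement of a plain string of A/C/G/T:
# reverse the string, then complement every base.
revcomp(s::AbstractString) = join(COMPLEMENT[c] for c in reverse(s))

# Palindromic in the reverse-complement sense:
# the sequence equals its own reverse complement.
is_rc_palindrome(s::AbstractString) = s == revcomp(s)

# Palindromic in the plain string sense:
# the sequence equals its own reversal.
is_plain_palindrome(s::AbstractString) = s == reverse(s)

# The two notions disagree on simple inputs.
println(is_rc_palindrome("ACGT"), " ", is_plain_palindrome("ACGT"))
println(is_rc_palindrome("ACCA"), " ", is_plain_palindrome("ACCA"))

# Construction from the fuzzing property: b * m * revcomp(b).
b = "GAT"
println(is_rc_palindrome(b * revcomp(b)))        # empty middle
println(is_rc_palindrome(b * "A" * revcomp(b)))  # single-base middle
```

Under these assumptions, the construction with an empty middle is always a reverse-complement palindrome, while a single unambiguous middle base breaks it, because none of A, C, G, T is its own complement. The shrunk example `(s = "", m = 'A')` from the fuzzer output corresponds to the one-base string `"A"` here.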
Ah, the tests have a counterexample with a palindromic odd-length sequence:
A case that my very naive fuzzing attempt with
Let's be honest - good job @jakobnissen and @bicycle1885 :-P
Yes indeed - thanks too for the quick turnaround here!
I've been trying to fuzz the parsers of BioSequences a bit using Supposition.jl, making use of the very nice interfaces you have (I haven't found any actual bugs so far, and performance is absolutely stellar even for humongous inputs, good job everyone! :D) and came across this:
Now, I'm not a Bioinformatics guy so this may just be some insider knowledge I don't have, and the docstring of `ispalindromic` doesn't say, but which of `reverse` and `reverse_complement` is `ispalindromic` referring to when it says that the sequence is palindromic? The current implementation is consistent with `reverse_complement`, but not with `reverse`.
If this is just missing from the docs, something like this should clear it up:
This was found using the following fuzzing setup:
the complementary construction with `reverse_complement` passes successfully (it tries 10_000 sequences by default, so that's why it takes almost a second to run through):