[RFC]: Add functions guess_parse and guess_alphabet #292
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #292 +/- ##
==========================================
- Coverage 90.87% 90.73% -0.14%
==========================================
Files 31 31
Lines 2400 2419 +19
==========================================
+ Hits 2181 2195 +14
- Misses 219 224 +5
8ced555 to ee245ee
In terms of naming, if the goal is easy typing, I think underscores are a no-no (for me personally). I do like
I dig it! The bit shifting stuff is inscrutable to me, but I like the tests, and the caveats are sufficiently documented IMO
Meant to approve
Like Kevin, I can't say I feel like I'm much help on the bit-code, but the rest of this looks great and I'm very excited about this functionality. I don't know if I'm the only one, but it feels like so few bioinformatics tools do any pre-processing validation of fasta files on their own. The number of hours I've wasted debugging code when someone throws a protein fasta into a collection of DNA fastas and uses
Thanks for your inputs! I'd like to merge soon.
It's possible we might want a real API at some point to detect compatible alphabets for a given input, but it's not trivial:

* What do we do about user-defined alphabets?
* Can we accept parsing the whole sequence twice - once to detect the alphabet, and once to construct the sequence?
* Might there be some downstream problems caused by giving the users functions to create type instability in their packages?
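
To make these trade-offs concrete, here is a rough sketch in Python rather than Julia. It is not the code in this PR and does not use any BioSequences.jl API: the names `guess_alphabet_sketch`, `guess_parse_sketch` and `DEFAULT_CANDIDATES` are invented for illustration, and the alphabets are simplified to plain sets of letters. It assumes candidates are tried from narrowest to widest, and that a caller could pass extra candidates to cover user-defined alphabets.

```python
# Hypothetical illustration only: the names and alphabets below are invented
# for this sketch and are not part of BioSequences.jl.
from typing import FrozenSet, Optional, Tuple

Candidates = Tuple[Tuple[str, FrozenSet[str]], ...]

# Candidate alphabets, ordered from narrowest to widest. Callers can pass
# their own tuple instead, which is one way to allow user-defined alphabets.
DEFAULT_CANDIDATES: Candidates = (
    ("DNA", frozenset("ACGT")),
    ("RNA", frozenset("ACGU")),
    ("AminoAcid", frozenset("ACDEFGHIKLMNPQRSTVWY")),
)


def guess_alphabet_sketch(
    seq: str, candidates: Candidates = DEFAULT_CANDIDATES
) -> Optional[str]:
    """First pass: name of the first candidate that contains every symbol."""
    symbols = set(seq.upper())
    for name, letters in candidates:
        if symbols <= letters:
            return name
    return None  # no candidate fits


def guess_parse_sketch(
    seq: str, candidates: Candidates = DEFAULT_CANDIDATES
) -> Tuple[str, Tuple[str, ...]]:
    """Second pass: build the 'sequence' once the alphabet is known."""
    name = guess_alphabet_sketch(seq, candidates)
    if name is None:
        raise ValueError("no candidate alphabet fits the input")
    # The input is walked again here, which is the double-parse cost.
    return name, tuple(seq.upper())
```

Note that an input such as "ACG" fits all three default candidates, so the answer depends only on the candidate order. In this sketch the kind of result is decided by the data at run time, which is loosely analogous to the type-instability concern in the last bullet.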
3ae5494 to a9a233e
This PR creates the functions guess_parse and guess_alphabet, which infer an appropriate alphabet for the sequence:

Notes for reviewers
guess"TAGTGCA" or whatever?

Closes #268
Does not close the similar #224