Rename guess_parse to bioseq
jakobnissen committed Jan 21, 2024
1 parent ee245ee commit d0fbb6c
Showing 6 changed files with 16 additions and 16 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -5,7 +5,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [3.2.0]
-* Add functions `guess_parse` and `guess_alphabet` to easily construct a biosequence
+* Add functions `bioseq` and `guess_alphabet` to easily construct a biosequence
of an unknown alphabet from e.g. a string.
* Relax requirement of `decode`, such that it no longer needs to check for
invalid data. Note that this change is not breaking, since it is not possible
8 changes: 4 additions & 4 deletions docs/src/construction.md
@@ -303,11 +303,11 @@ the bodies of things like for loops. And if you use them and are unsure, use the
```

## Loose parsing
-As of version 3.2.0, BioSequences.jl provide the [`guess_parse`](@ref) function, which can be used to build a `LongSequence`
+As of version 3.2.0, BioSequences.jl provides the [`bioseq`](@ref) function, which can be used to build a `LongSequence`
from a string (or an `AbstractVector{UInt8}`) without knowing the correct `Alphabet`.

```jldoctest
-julia> guess_parse("ATGTGCTGA")
+julia> bioseq("ATGTGCTGA")
9nt DNA Sequence:
ATGTGCTGA
```
@@ -316,7 +316,7 @@ The function will prioritise 2-bit alphabets over 4-bit alphabets, and prefer sm
If the input cannot be encoded by any of the built-in alphabets, an error is thrown:

```jldoctest
-julia> guess_parse("0!(CC!;#&&%")
+julia> bioseq("0!(CC!;#&&%")
ERROR: cannot encode 0x30 in AminoAcidAlphabet
[...]
```
@@ -325,7 +325,7 @@ Note that this function is only intended to be used for interactive, ephemeral w
The function is necessarily type unstable, and the precise returned alphabet for a given input is a heuristic which is subject to change.

```@docs
-guess_parse
+bioseq
guess_alphabet
```

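A minimal sketch of loose parsing with the renamed function, assuming a BioSequences.jl release that contains this commit (the changelog places it in 3.2.0). It passes the docs example string to `bioseq` both as a `String` and as a `Vector{UInt8}`, the two input kinds the docs mention, and checks that the two results agree.

```julia
using BioSequences  # assumes a release in which `bioseq` is exported

s = "ATGTGCTGA"

# Parse from a String. Only A, C, G and T occur, so per the
# `guess_alphabet` docstring the 2-bit DNA alphabet should be chosen.
seq = bioseq(s)

# The same characters supplied as raw bytes go through the
# AbstractVector{UInt8} method and should give an equal sequence.
seq_bytes = bioseq(Vector{UInt8}(s))

println(typeof(seq))
println(seq == seq_bytes)
```

As the warning in this commit notes, the chosen alphabet is a heuristic, so code that needs a fixed alphabet should construct e.g. `LongSequence{DNAAlphabet{2}}` explicitly instead.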
2 changes: 1 addition & 1 deletion src/BioSequences.jl
@@ -96,7 +96,7 @@ export
###

guess_alphabet,
-guess_parse,
+bioseq,

# Types & aliases
Alphabet,
2 changes: 1 addition & 1 deletion src/alphabet.jl
@@ -322,7 +322,7 @@ pick the first from the order below (i.e. `DNAAlphabet{2}()` if possible, otherw
5. `AminoAcidAlphabet()`
!!! warning
-The functions `guess_parse` and `guess_alphabet` are intended for use in interactive
+The functions `bioseq` and `guess_alphabet` are intended for use in interactive
sessions, and are not suitable for use in packages or non-ephemeral work.
They are type unstable, and their heuristics **are subject to change** in minor versions.
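The alphabet-picking heuristic can also be queried on its own with `guess_alphabet`. A small sketch using inputs taken from the docs in this commit; it only checks the first (`DNAAlphabet{2}()`) and last (`AminoAcidAlphabet()`) entries of the priority list, and assumes the same 3.2.0-era behaviour described in the docstring.

```julia
using BioSequences

# Only A, C, G and T: the first entry of the priority list, the 2-bit
# DNA alphabet, can encode this input.
dna_alph = guess_alphabet("ATGTGCTGA")

# Q, E and F are not nucleotide symbols, so only the last entry,
# AminoAcidAlphabet(), can encode this input.
aa_alph = guess_alphabet("QMKLPEEFW")

println(dna_alph)
println(aa_alph)
```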
14 changes: 7 additions & 7 deletions src/longsequences/constructors.jl
@@ -88,35 +88,35 @@ end
Base.parse(::Type{LongSequence{A}}, seq::AbstractString) where A = LongSequence{A}(seq)

"""
-guess_parse(s::Union{AbstractString, AbstractVector{UInt8}}) -> LongSequence
+bioseq(s::Union{AbstractString, AbstractVector{UInt8}}) -> LongSequence
Parse `s` into a `LongSequence` with an appropriate `Alphabet`, or throw an exception
if no alphabet matches.
See [`guess_alphabet`](@ref) for the available alphabets and the alphabet priority.
!!! warning
-The functions `guess_parse` and `guess_alphabet` are intended for use in interactive
+The functions `bioseq` and `guess_alphabet` are intended for use in interactive
sessions, and are not suitable for use in packages or non-ephemeral work.
They are type unstable, and their heuristics **are subject to change** in minor versions.
# Examples
```jldoctest
-julia> guess_parse("QMKLPEEFW")
+julia> bioseq("QMKLPEEFW")
9aa Amino Acid Sequence:
QMKLPEEFW
-julia> guess_parse("UAUGCUGUAGG")
+julia> bioseq("UAUGCUGUAGG")
11nt RNA Sequence:
UAUGCUGUAGG
-julia> guess_parse("PKMW#3>>0;kL")
+julia> bioseq("PKMW#3>>0;kL")
ERROR: cannot encode 0x23 in AminoAcidAlphabet
[...]
```
"""
-function guess_parse(s::AbstractVector{UInt8})
+function bioseq(s::AbstractVector{UInt8})
A = guess_alphabet(s)
A isa Integer && throw(EncodeError(AminoAcidAlphabet(), s[A]))
LongSequence{typeof(A)}(s)
end
-guess_parse(s::AbstractString) = guess_parse(codeunits(s))
+bioseq(s::AbstractString) = bioseq(codeunits(s))
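The new `bioseq` body relies on `guess_alphabet` returning an integer index, rather than an alphabet, when no built-in alphabet fits. The sketch below shows both sides of that contract for the failing input from the docstring; it assumes `BioSequences.EncodeError` remains reachable under that name, as the tests in this commit use it.

```julia
using BioSequences

bad = "PKMW#3>>0;kL"

# No built-in alphabet can encode '#', so guess_alphabet returns the
# index of a byte that cannot be encoded instead of an Alphabet.
idx = guess_alphabet(bad)
println("guess_alphabet returned ", idx)

# bioseq converts that index into an EncodeError; catch it to inspect it.
err = try
    bioseq(bad)
    nothing
catch e
    e
end
println(typeof(err))
```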
4 changes: 2 additions & 2 deletions test/alphabet.jl
@@ -198,7 +198,7 @@ end
for S in Ss
for T in [String, SubString, Vector{UInt8}, Test.GenericString]
@test guess_alphabet(T(S)) == A
-@test guess_parse(T(S)) isa LongSequence{typeof(A)}
+@test bioseq(T(S)) isa LongSequence{typeof(A)}
end
end
end
@@ -209,7 +209,7 @@ end
]
for T in [String, SubString, Vector{UInt8}, Test.GenericString]
@test guess_alphabet(T(S)) == index
-@test_throws BioSequences.EncodeError guess_parse(T(S))
+@test_throws BioSequences.EncodeError bioseq(T(S))
end
end
end
