All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Deprecate functions `n_ambiguous`, `n_gaps` and `n_certain`. Instead, use the equivalent methods `count(f, seq)` with the appropriate function `f`.
- Deprecate method `Base.count(::Function, ::BioSequence, ::BioSequence)`, and the other methods of `count` whose signatures are subtypes of this one.
- Deprecate use of functions `matches` and `mismatches` on input sequences of different lengths.
- Optimise `count(==(biosymbol), biosequence)`
- Optimise construction of `LongSequence` nucleotide sequences from sequences with a different bit-number (e.g. two-bit seqs from four-bit seqs)
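Converting a four-bit sequence to a two-bit one amounts to re-encoding each symbol. The sketch below shows the idea on raw codes. It assumes the one-hot four-bit layout A=0x1, C=0x2, G=0x4, T=0x8 and the dense two-bit layout A=0, C=1, G=2, T=3. `four_to_two_bit` is a hypothetical helper, not the optimised method in the package.

```julia
# Hypothetical sketch: re-encode one-hot 4-bit nucleotide codes
# (A=0x1, C=0x2, G=0x4, T=0x8) as dense 2-bit codes (A=0, C=1, G=2, T=3).
# Ambiguous codes (several bits set) and gaps (0x0) have no 2-bit form,
# so they raise an ArgumentError.
function four_to_two_bit(codes::AbstractVector{UInt8})
    out = Vector{UInt8}(undef, length(codes))
    for (i, c) in enumerate(codes)
        # Exactly one of the low four bits set means an unambiguous base.
        count_ones(c) == 1 && c <= 0x08 ||
            throw(ArgumentError("code $(repr(c)) at position $i is not a single base"))
        # The index of the set bit is the 2-bit code: 0x1->0, 0x2->1, 0x4->2, 0x8->3.
        out[i] = UInt8(trailing_zeros(c))
    end
    return out
end

four_to_two_bit(UInt8[0x1, 0x2, 0x4, 0x8])  # UInt8[0x00, 0x01, 0x02, 0x03]
```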
- Add functions `bioseq` and `guess_alphabet` to easily construct a biosequence of an unknown alphabet from e.g. a string.
- Relax requirement of `decode`, such that it no longer needs to check for invalid data. Note that this change is not breaking, since it is not possible for correctly-implemented `Alphabet` and `BioSequence` to store invalid data.
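Guessing an alphabet can be done by trying candidate alphabets from narrowest to broadest and returning the first one that accepts every byte. The sketch below illustrates that approach. The function `guess_alphabet_sketch`, its symbol return values (`:DNA2`, `:RNA4`, ...) and the letter sets are assumptions made for this example; it is not the package's `guess_alphabet`. In this sketch, failure is reported as the index of the first byte that no candidate accepts.

```julia
# Hypothetical sketch of alphabet guessing; not BioSequences' guess_alphabet.
# Candidate alphabets, narrowest first, with the characters each accepts.
const CANDIDATES = [
    :DNA2 => "ACGT",
    :RNA2 => "ACGU",
    :DNA4 => "ACGTMRWSYKVHDBN-",
    :RNA4 => "ACGUMRWSYKVHDBN-",
    :AA   => "ARNDCQEGHILKMFPSTWYVOUBJZX*-",
]

function guess_alphabet_sketch(s::AbstractString)
    alive = trues(length(CANDIDATES))      # candidates still consistent with s
    for (i, byte) in enumerate(codeunits(s))
        c = uppercase(Char(byte))          # accept lower-case input
        for (j, (_, letters)) in enumerate(CANDIDATES)
            alive[j] &= c in letters
        end
        any(alive) || return i             # first byte no alphabet accepts
    end
    return first(CANDIDATES[findfirst(alive)])  # narrowest surviving alphabet
end

guess_alphabet_sketch("acgn")  # :DNA4
```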
- Dropped support for Julia versions older than 1.10.0
- Added a 'Recipes' page to the documentation
- Add new genetic code: `blepharisma_macronuclear_genetic_code`
- Improve documentation of sequence count methods and sequence string literals
- Various performance improvements to counting, `ExactSearchQuery` and `ispalindromic`
- Translation of sequences with ambiguous symbols has been improved: `translate` no longer relies on heuristics, but uses an algorithm that always returns exactly the right amino acid in the face of ambiguous nucleotides.
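One way to translate ambiguous codons exactly works in three steps:

- expand each codon into every concrete codon it can stand for;
- translate those concrete codons;
- decide on a symbol only after seeing the full set of results.

The sketch below does this for the standard genetic code. It maps {D, N} to `B`, {E, Q} to `Z`, {I, L} to `J` and any other mixture to `X`. It is an illustration under those assumptions, using a hypothetical `translate_ambiguous_codon` helper, and not necessarily how `translate` is implemented.

```julia
# Hypothetical sketch: exact translation of one (possibly ambiguous) DNA codon.
# Standard genetic code (NCBI table 1), codons ordered by bases T, C, A, G.
const STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
const BASE_INDEX = Dict('T' => 0, 'C' => 1, 'A' => 2, 'G' => 3)

# IUPAC nucleotide codes and the concrete bases each one covers.
# Anything else (including the gap '-') raises a KeyError.
const IUPAC = Dict(
    'A' => "A", 'C' => "C", 'G' => "G", 'T' => "T",
    'R' => "AG", 'Y' => "CT", 'S' => "CG", 'W' => "AT", 'K' => "GT", 'M' => "AC",
    'B' => "CGT", 'D' => "AGT", 'H' => "ACT", 'V' => "ACG", 'N' => "ACGT",
)

translate_concrete(a, b, c) =
    STANDARD_CODE[16 * BASE_INDEX[a] + 4 * BASE_INDEX[b] + BASE_INDEX[c] + 1]

function translate_ambiguous_codon(codon::AbstractString)
    length(codon) == 3 || throw(ArgumentError("expected exactly three bases"))
    x, y, z = (IUPAC[uppercase(ch)] for ch in codon)
    # Collect every amino acid the ambiguous codon could encode.
    aas = Set{Char}()
    for a in x, b in y, c in z
        push!(aas, translate_concrete(a, b, c))
    end
    length(aas) == 1 && return only(aas)  # e.g. YTR always encodes L
    # Ambiguous amino-acid codes for the pairs that have one.
    aas == Set("DN") && return 'B'
    aas == Set("EQ") && return 'Z'
    aas == Set("IL") && return 'J'
    return 'X'  # any other mixture stays fully ambiguous
end

translate_ambiguous_codon("RAT")  # 'B' (AAT -> N, GAT -> D)
```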
- Attempting to translate a nucleotide sequence with gap symbols now throws an error (#278, see #277)
- Migrate from SnoopPrecompile to PrecompileTools (#273)
- Improve error when mis-encoding `LongDNA` from byte-like inputs (#267)
- Remove references to internal `Random.GLOBAL_RNG` (#265)
- Fix bug in converting `LongSubSeq` to `LongSequence` (#261)
- Add `iterate` method for `Alphabets` (#233)
- Add SnoopPrecompile workload and dependency on SnoopPrecompile (#257)
- Add `rand!([::AbstractRNG], ::LongSequence, [::Sampler])` methods
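The `Sampler` argument controls the distribution that symbols are drawn from. Below is a minimal sketch of weighted sampling by inverting a cumulative distribution, using only Julia's `Random` standard library. `fill_weighted!` is hypothetical and fills a `Vector{Char}` rather than a `LongSequence`; it is not the package's sampler implementation.

```julia
using Random

# Hypothetical sketch of weighted symbol sampling (inverse-CDF method).
function fill_weighted!(rng::AbstractRNG, dest::AbstractVector{Char},
                        symbols::AbstractVector{Char}, weights::AbstractVector{<:Real})
    length(symbols) == length(weights) || throw(ArgumentError("one weight per symbol"))
    all(>=(0), weights) && sum(weights) > 0 ||
        throw(ArgumentError("weights must be non-negative with a positive sum"))
    cdf = cumsum(weights) ./ sum(weights)   # cumulative probabilities, last ≈ 1.0
    for i in eachindex(dest)
        # First symbol whose cumulative probability exceeds the draw in [0, 1);
        # this skips zero-weight symbols.
        k = searchsortedlast(cdf, rand(rng)) + 1
        dest[i] = symbols[min(k, length(symbols))]  # guard against rounding past 1.0
    end
    return dest
end

rng = MersenneTwister(42)
seq = fill_weighted!(rng, Vector{Char}(undef, 20), ['A', 'C', 'G', 'T'], [0.4, 0.1, 0.1, 0.4])
```

Inverting the cumulative distribution needs one uniform draw and a binary search per symbol. That keeps the sketch short and works for any number of weighted symbols.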
- It is now possible to `join` BioSymbols into a BioSequence.
- Add `findall` methods to `BioSequence`
This release has been yanked from the General registry.
- Removed `unsafe_setindex!`. Instead, use normal `setindex!` with `@inbounds`.
- Removed minhashing functionality - see package MinHash.jl
- Removed composition functionality - see package Kmers.jl
- Removed ReferenceSequence functionality
- Removed demultiplexer functionality
- Removed kmer functionality - this is moved to Kmers.jl
- Removed VoidAlphabet and CharAlphabet
- Removed ConditionIterator
- Added type `LongSubSeq`, a view into a `LongSequence`.
- Added method `translate!(::LongAminoAcidSeq, ::LongNucleotideSeq; kwargs...)`
- Added method `join(::Type{T<:BioSequence}, it)` to join an iterable of biosequences to a new instance of `T`.
- Added method `join!(s::BioSequence, it)`, an in-place version of `join`
- `LongSequence` is no longer copy-on-write. For views, use `LongSubSeq`.
- Renamed `LongAminoAcidSeq` -> `LongAA`, `LongNucleotideSeq` -> `LongNuc`, `LongRNASeq` -> `LongRNA` and `LongDNASeq` -> `LongDNA`
- The interface for `Alphabet` and `BioSequence` is now more clearly defined, documented, and tested.
- The constructor `LongSequence{A}(::Integer)` has been removed in favor of `LongSequence{A}(undef, ::Integer)`.
- Biological sequences can no longer be converted to/from strings and vectors.
- Updated the element and substring search API to conform to `Base.find*` patterns.
- Fixed syntax errors where functions were marked with `@inbounds` instead of `@inline`.
- New subtypes of `Random.Sampler`: `SamplerUniform` and `SamplerWeighted`.
- Random `LongSequence`s can now be created with `randseq`, optionally using a sampler to specify element distribution.
- All random `LongSequence` generator methods take an optional `AbstractRNG` argument.
- Add methods to `randseq` to optimize random generation of `NucleicAcid` or `AminoAcid` `LongSequence`s.
- BioGenerics is now a dependency - replaces BioCore.
- A `SkipmerFactory` iterator that allows iteration over the Skipmers in a nucleotide sequence. A Skipmer is a `Mer` (see Changed below) that is generated using a certain cyclic nucleotide sampling pattern. See this paper for more details.
- A `BigMer` parametric primitive type has been added, which has the same functionality as `Mer` (see Changed section) but uses 128 bits instead of 64.
- An abstract parametric type called `AbstractMer` has been added to unify `Mer` and `BigMer`.
- Generators of bit-parallel iteration code have been introduced to help developers write bit-parallel implementations of some methods. GC content, match and mismatch counting have been migrated to use these generators.
- Added `occursin` methods for exact matching.
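The cyclic sampling pattern behind skip-mers can be sketched with plain strings. The sketch assumes an (m, n, k) parametrisation:

- within every cycle of n bases, the first m are used;
- a skip-mer has k bases in total.

The hypothetical helpers below compute the sampled offsets and extract every skip-mer from a sequence. They are not the `SkipmerFactory` API, which yields `Mer` values.

```julia
# Hypothetical sketch of skip-mer extraction with a cyclic (m, n) pattern.
# Offsets (0-based) of the k positions sampled from a start position.
function skipmer_offsets(m::Int, n::Int, k::Int)
    0 < m <= n && k > 0 || throw(ArgumentError("need 0 < m <= n and k > 0"))
    offsets = Int[]
    j = 0
    while length(offsets) < k
        j % n < m && push!(offsets, j)  # keep the first m positions of each cycle
        j += 1
    end
    return offsets
end

# All skip-mers of seq, one per start position that fits entirely in seq.
function skipmers(seq::AbstractString, m::Int, n::Int, k::Int)
    offs = skipmer_offsets(m, n, k)
    chars = collect(seq)
    span = last(offs) + 1               # bases covered by one skip-mer
    return [String(chars[p .+ offs]) for p in 1:(length(chars) - span + 1)]
end

skipmers("ACGTACGTA", 2, 3, 4)  # ["ACTA", "CGAC", "GTCG", "TAGT", "ACTA"]
```

With m = 1 and n = 3, for example, each skip-mer samples one codon position along the sequence.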
- The abstract `Sequence` type is now called `BioSequence{A}`.
- The type previously called `BioSequence{A}` is now `LongSequence{A}`.
- Kmers are now a parametric primitive type: `Mer{A<:NucleicAcidAlphabet{2},K}`.
- `unsafe_setindex!` has been made systematic for all `setindex!` methods as a way of bypassing all bounds checking and `orphan!` calls.
- Kmer string literals have been updated: they are now `mer""` string literals, and they have a flag to enforce the type of `Mer`, e.g. `mer"ATCG"dna`, `mer"AUCG"rna`
- No longer uses an old version of Twiddle, or deprecated functions.
- Using `Base.count` with certain functions and sequence combinations dispatches to highly optimized bit-parallel implementations, falling back to a naive counting loop for all other predicate-sequence combinations.
- No more implicit conversion from strings to biological sequences. The `Base.convert` methods have been renamed to `Base.parse` methods.
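Bit-parallel counting processes many bases per machine-word operation. The sketch below illustrates this by counting G and C in DNA packed two bits per base, 32 bases per `UInt64`. It assumes the encoding A=00, C=01, G=10, T=11, under which a base is G or C exactly when its two bits differ. `pack2bit` and `gc_count_bitparallel` are hypothetical helpers, not the package's generated code.

```julia
# Hypothetical sketch of bit-parallel GC counting on 2-bit packed DNA
# (A=00, C=01, G=10, T=11), 32 bases per UInt64 word.
const CODE2 = Dict('A' => 0x0, 'C' => 0x1, 'G' => 0x2, 'T' => 0x3)

function pack2bit(seq::AbstractString)
    chars = collect(seq)
    words = zeros(UInt64, cld(length(chars), 32))
    for (i, ch) in enumerate(chars)
        w, slot = divrem(i - 1, 32)
        words[w + 1] |= UInt64(CODE2[ch]) << (2 * slot)
    end
    return words  # unused slots stay 00 (A), which never counts as G/C
end

# C (01) and G (10) are exactly the codes whose two bits differ: XOR each
# slot's high bit onto its low bit, then count the low bits that are set.
function gc_count_bitparallel(words::Vector{UInt64})
    mask = 0x5555555555555555  # low bit of every 2-bit slot
    n = 0
    for w in words
        n += count_ones((w ⊻ (w >> 1)) & mask)
    end
    return n
end

gc_count_bitparallel(pack2bit("GATTACA"))  # 2
```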
- The FASTQ module.
- The FASTA module.
- The TwoBit module.
- The ABIF module.
- BioCore is no longer a dependency.
- Automa is no longer a dependency.
- Automatic conversion of `LongDNASeq` to `LongRNASeq` when translating sequences.
- Add `alternative_start` keyword argument to `translate()`.
- Add abstract type for kmer iterators.
- 🐎 Faster kmer iteration.
- Fixed indexing in ABIF records.
1.0.0 - 2018-08-23
- Issue and PR templates.
- Code of Conduct and Contributing files.
- A changelog file.
- Support for julia v0.7 and v1.0.
- ❗ Support for julia v0.6.
0.8.3 - 2018-02-28
- Fix the `sequence` method so that the sequence type can be specified, allowing type-stable efficient code generation.
0.8.2 - 2018-02-19
- A bug fix for `FASTA.Record` writing where the width parameter of a `FASTA.Writer` is less than or equal to zero.
0.8.1 - 2017-11-10
- Update documentation generation.
- Fixes to type definition keywords.
- Bit-parallel GC counting.
0.8.0 - 2017-08-16
- Position weight matrix search functionality.
- A generalised composition method.
- `typemin` and `typemax` methods for `Kmer` types.
- `MinHash` function now generalised to `Reader` types.
- Updates to doc tests.
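Position weight matrix search scores every window of a sequence against a matrix of per-position base scores. Below is a minimal sketch of the idea. It assumes a 4×L score matrix with rows in A, C, G, T order and a plain score threshold. `pwm_search` is hypothetical and does not reproduce the package's PWM API.

```julia
# Hypothetical sketch of PWM search: score every window of the sequence and
# report the start positions of windows whose score meets the threshold.
const ROW = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

function pwm_search(seq::AbstractString, pwm::AbstractMatrix{<:Real}, threshold::Real)
    size(pwm, 1) == 4 || throw(ArgumentError("PWM needs one row per base (A, C, G, T)"))
    chars = collect(seq)
    width = size(pwm, 2)
    hits = Int[]
    for start in 1:(length(chars) - width + 1)
        # Sum the matrix entry for the observed base at each motif column.
        score = sum(pwm[ROW[chars[start + j - 1]], j] for j in 1:width)
        score >= threshold && push!(hits, start)
    end
    return hits
end

# Rows are A, C, G, T; this toy matrix rewards "TAT" (+1 per match, -1 otherwise).
pwm = [-1  1 -1
       -1 -1 -1
       -1 -1 -1
        1 -1  1]

pwm_search("GTATCTATA", pwm, 3)  # [2, 6]
```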
0.7.0 - 2017-07-28
- Support for julia v0.6 only.
- ❗ Dropped support for julia v0.5.
0.6.3 - 2017-07-06
- Iterators.jl is no longer used as a dependency, in favour of IterTools.jl.
0.6.1 - 2017-06-20
- Bug-fix for site-counting algorithm.
0.6.0 - 2017-06-14
- ⬆️ Compatibility with julia v0.6.
- The `ungap` and `ungap!` methods, which are shorthand for filtering gaps from biological sequences.
- Bug fixes for Kmer iteration that were caused by gaps in 4-bit encoded sequences.
0.5.0 - 2017-06-07
- All files pertaining to the old Bio.Seq module.