Most concrete BioSequence subtypes behave largely like other vector or string types. They can be indexed using integers or ranges.
For example, with LongSequences:
julia> seq = dna"ACGTTTANAGTNNAGTACC"
19nt DNA Sequence:
ACGTTTANAGTNNAGTACC
@@ -15,7 +15,7 @@
julia> seq[5] = DNA_A
DNA_A
Some types can be indexed using integers but not using ranges.
For LongSequence types, indexing a sequence by range creates a copy of the original sequence, similar to Array in Julia's Base library. If you find yourself slowed down by the allocation of these subsequences, consider using a sequence view instead.
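The difference between a range copy and a view can be sketched as follows. This is a minimal illustration, not a benchmark; it assumes BioSequences v3 or later, where calling view on a LongSequence returns a subsequence that shares memory with its parent. The variable names are illustrative only.

```julia
using BioSequences

seq = dna"ACGTTTANAGTNNAGTACC"

# Indexing by range allocates an independent copy.
sub = seq[2:5]
sub[1] = DNA_A                      # changes only the copy
unchanged = seq[2] == DNA_C         # the parent still holds C at position 2

# A view shares data with the parent (assumption: BioSequences v3+).
v = view(seq, 2:5)
v[1] = DNA_A                        # writes through to seq
```

Because the view does not allocate a new sequence, it is the cheaper choice in tight loops, at the cost that writes to the view also alter the parent.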
In addition to setindex!, many other modifying operations are possible for biological sequences, such as push!, pop!, and insert!, which should be familiar to anyone used to editing arrays.

push!(seq::BioSequence, x)
Append a biological symbol x to a biological sequence seq.

pop!(seq::BioSequence)
Remove the symbol from the end of a biological sequence seq and return it. Returns a value of type eltype(seq).

pushfirst!(seq, x)
Insert a biological symbol x at the beginning of a biological sequence seq.

popfirst!(seq)
Remove the symbol from the beginning of a biological sequence seq and return it. Returns a value of type eltype(seq).

insert!(seq::BioSequence, i, x)
Insert a biological symbol x into a biological sequence seq at the given index i.

deleteat!(seq::BioSequence, i::Integer)
Delete a biological symbol at a single position i in a biological sequence seq.
Modifies the input sequence.

append!(seq, other)
Add a biological sequence other onto the end of biological sequence seq. Modifies and returns seq.

resize!(seq, size, [force::Bool])
Resize a biological sequence seq to a given size. Does not resize the underlying data array unless the new size does not fit. If force is true, always resize the underlying data array.

empty!(seq::BioSequence)
Completely empty a biological sequence seq of its symbols.

Here are some examples:
julia> seq = dna"ACG"
3nt DNA Sequence:
ACG
@@ -34,7 +34,7 @@
julia> deleteat!(seq, 2:3)
3nt DNA Sequence:
AAT
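
The functions that take a position or operate on the front of a sequence can be sketched in the same way. This is a small, hedged walk-through on a fresh sequence (the variable name s is illustrative); the comments track the expected contents after each call.

```julia
using BioSequences

s = dna"ACG"

push!(s, DNA_T)              # ACGT
pushfirst!(s, DNA_N)         # NACGT
front = popfirst!(s)         # returns DNA_N, leaving ACGT
back = pop!(s)               # returns DNA_T, leaving ACG
insert!(s, 2, DNA_G)         # AGCG
deleteat!(s, 1)              # GCG
append!(s, dna"TT")          # GCGTT
resize!(s, 3)                # GCG
snapshot = copy(s)           # keep a copy before emptying
empty!(s)                    # zero-length sequence
```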
In addition to these basic modifying functions, other sequence transformations that are common in bioinformatics are also provided.

reverse!(seq::LongSequence)
Reverse a biological sequence seq in place.

reverse(seq::BioSequence)
Create a reversed copy of a biological sequence.

reverse(seq::LongSequence)
Create a reversed copy of a biological sequence.

complement!(seq)
Make a complement sequence of seq in place.

complement(nt::NucleicAcid)
Return the complementary nucleotide of nt.
This function returns the union of all possible complementary nucleotides.
Examples
julia> complement(DNA_A)
DNA_T
julia> complement(DNA_N)
@@ -42,10 +42,10 @@
julia> complement(RNA_U)
RNA_A

complement(seq)
Make a complement sequence of seq.

reverse_complement!(seq)
Make a reversed complement sequence of seq in place.

reverse_complement(seq)
Make a reversed complement sequence of seq.

Remove gap characters from an input sequence.

Create a copy of a sequence with gap characters removed.

canonical!(seq::NucleotideSeq)
Transform seq into its canonical form, if it is not already canonical. Modifies the input sequence in place.
For any sequence, there is a reverse complement, which is the same sequence, but on the complementary strand of DNA:
------->
ATCGATCG
CGATCGAT
<-------
The canonical sequence is the lesser of the two, i.e. canonical_seq < other_seq.
Using this function on a seq will ensure it is the canonical version.

canonical(seq::NucleotideSeq)
Create the canonical sequence of seq.

Some examples:
julia> seq = dna"ACGTAT"
6nt DNA Sequence:
ACGTAT
@@ -60,7 +60,7 @@
julia> reverse_complement!(seq)
6nt DNA Sequence:
ACGTAT
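
The canonical form described above can be illustrated with a short sketch. It assumes that sequences compare lexicographically with A < C < G < T, so that the canonical sequence is simply the smaller of a sequence and its reverse complement; the variable names are illustrative.

```julia
using BioSequences

fwd = dna"TTTTC"
rc = reverse_complement(fwd)     # GAAAA

# canonical returns a new sequence: the lesser of fwd and rc.
canon = canonical(fwd)

# canonical! does the same in place.
inplace = copy(fwd)
canonical!(inplace)
```

Here the reverse complement starts with G, which sorts before T, so it is the canonical form. Applying canonical to an already canonical sequence returns an equal sequence.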
Many of these methods also have a version that makes a copy of the input sequence, so you get a modified copy and don't alter the original sequence. Such methods have the same name, but without the exclamation mark, e.g. reverse instead of reverse!, and ungap instead of ungap!.
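
As a brief sketch of this naming convention, using reverse and reverse! (variable names are illustrative):

```julia
using BioSequences

seq = dna"AACG"

# The copying version returns a new sequence and leaves seq alone.
rev = reverse(seq)
still_original = seq == dna"AACG"

# The in-place version (with the exclamation mark) changes seq itself.
reverse!(seq)
```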
Translation is a slightly more complex transformation for RNA sequences, so we describe it here in more detail.
The translate function translates a sequence of codons in an RNA sequence to an amino acid sequence based on a genetic code. The BioSequences package provides all NCBI-defined genetic codes, which are registered in ncbi_trans_table.

translate(seq, code=standard_genetic_code, allow_ambiguous_codons=true, alternative_start=false)
Translate a LongRNA or a LongDNA to a LongAA.
Translation uses genetic code code to map codons to amino acids. See ncbi_trans_table for available genetic codes. If codons in the given sequence cannot determine a unique amino acid, they will be translated to AA_X if allow_ambiguous_codons is true, and otherwise result in an error. For organisms that utilize alternative start codons, one can set alternative_start=true, in which case the first codon will always be converted to a methionine.
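
The keyword arguments above can be sketched with a few short inputs. This is a hedged illustration that assumes the keyword form shown in the signature and that ncbi_trans_table can be indexed by NCBI table number, as the text states for table 1; the input sequences are made up for the example.

```julia
using BioSequences

# Standard genetic code: AUG -> M, GCC -> A, UAA -> stop (*).
protein = translate(rna"AUGGCCUAA")

# Vertebrate mitochondrial code (NCBI table 2) reads UGA as tryptophan
# rather than as a stop codon.
mito = translate(rna"AUGUGA", code = ncbi_trans_table[2])

# With alternative_start=true the first codon is always translated to
# methionine, even though GUG normally encodes valine.
alt = translate(rna"GUGGCC", alternative_start = true)
```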

Genetic code list of NCBI.
The standard genetic code is ncbi_trans_table[1], and the others can be shown with show(ncbi_trans_table). For more details, see: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes.

julia> ncbi_trans_table
Translation Tables:
1. The Standard Code (standard_genetic_code)
2. The Vertebrate Mitochondrial Code (vertebrate_mitochondrial_genetic_code)
@@ -80,4 +80,4 @@
23. Thraustochytrium Mitochondrial Code (thraustochytrium_mitochondrial_genetic_code)
24. Pterobranchia Mitochondrial Code (pterobrachia_mitochondrial_genetic_code)
25. Candidate Division SR1 and Gracilibacteria Code (candidate_division_sr1_genetic_code)
https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes