-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stats to codon tables #350
Add stats to codon tables #350
Conversation
This is an interesting example of I think how we could improve the readability of this package. I'd like to know your take @cachemoi So the problem here is that there are actually two things we're talking about: a codon table, and a codon usage table. A codon table is just the mapping of codons to amino acids ( What you've done with Hence:
So actually, all the real magic for the use of this package can be found in the tests func ExampleOptimize() {
gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*"
sequence, _ := genbank.Read("../../data/puc19.gbk")
codonTable := codon.GetCodonTable(11)
// a string builder to build a single concatenated string of all coding regions
var codingRegionsBuilder strings.Builder
// initiate genes
genes := 0
// iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder
for _, feature := range sequence.Features {
if feature.Type == "CDS" {
sequence, _ := feature.GetSequence()
// Note: sometimes, genbank files will have annotated CDSs that are pseudo genes (not having triplet codons).
// This will shift the entire codon table, messing up the end results. To fix this, make sure to do a modulo
// check.
if len(sequence)%3 == 0 {
codingRegionsBuilder.WriteString(sequence)
// Another good double check is to count genes, then count stop codons.
genes++
}
}
}
// get the concatenated sequence string of the coding regions
codingRegions := codingRegionsBuilder.String()
// weight our codon optimization table using the regions we collected from the genbank file above
optimizationTable := codonTable.OptimizeTable(codingRegions)
// Here, we double check if the number of genes is equal to the number of stop codons
stopCodonCount := 0
for _, aa := range optimizationTable.GetAminoAcids() {
if aa.Letter == "*" {
for _, codon := range aa.Codons {
stopCodonCount = stopCodonCount + codon.Weight
}
}
}
if stopCodonCount != genes {
fmt.Println("Stop codons don't equal number of genes!")
}
optimizedSequence, _ := codon.Optimize(gfpTranslation, optimizationTable)
optimizedSequenceTranslation, _ := codon.Translate(optimizedSequence, optimizationTable)
fmt.Println(optimizedSequenceTranslation == gfpTranslation)
// output: true
} Ideally, within the loop finding feature sequences we would add in the |
synthesis/codon/codon.go
Outdated
for codon, count := range toAdd.StartCodonCount { | ||
s.StartCodonCount[codon] = s.StartCodonCount[codon] + count | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add
should getting added to as we add example protein sequences to the codon table describing the usage of different codons. This would not be a feature of the codon table itself, but something that gets modified over time.
synthesis/codon/codon.go
Outdated
IsEmpty() bool | ||
OptimizeTable(string) Table | ||
} | ||
|
||
// Stats denotes a set of computed codon table statistics | ||
type Stats struct { | ||
StartCodonCount map[string]int `json:"start_codons_counts"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with, instead of having a Stats struct that is added to the codon table, we have ComputeStats()
. I think that is much cleaner. Although we'd still need to have StartCodonCount
, since that can't be contained in the current struct.
synthesis/codon/codon.go
Outdated
@@ -338,6 +364,7 @@ func generateCodonTable(aminoAcids, starts string) codonTable { | |||
triplet := string([]byte{base1[i], base2[i], base3[i]}) | |||
aminoAcidMap[aminoAcid] = append(aminoAcidMap[aminoAcid], Codon{triplet, 1}) | |||
if starts[i] == 77 { // M rune | |||
stats.StartCodonCount[triplet]++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't implement a check for the number of codons in a genome, just a count (which will be here, by definition, 1) of the codons that can be start codons, not the weighted probability that any given one will be a start codon, which is what we are looking for.
Thanks for the thorough explanation @Koeng101 ! Sorry about the misunderstanding my biology is (very) rusty 😅 So if I understand correctly the meat of this package is to change a coding sequence so that we use some codons more frequently than other for e.g making translation of a gene present in one specie easier in another. Looking through the code my instinct is that the func Translate(sequence string, codonTable Table) (string, error)
func Optimize(aminoAcids string, codonTable Table, randomState ...int) (string, error) the // Table is an interface that specifies the functions that all table types must implement
type Table interface {
Chooser() (map[string]weightedRand.Chooser, error)
GenerateTranslationTable() map[string]string
GenerateStartCodonTable() map[string]string
GetAminoAcids() []AminoAcid
GetStartCodons() []string
GetStopCodons() []string
IsEmpty() bool
OptimizeTable(string) Table
} Apart from the data accessor funcs the functions with extra logic in them are My understanding of I think it's nice to have structs which are named like What do you think about separating the key functions in small structs? For example: type Optimizer struct {
table codonTable
chooser weightedRand.Chooser
}
// (previously codonTable.OptimizeTable)
func NewOptimizerFromSequence(sequence string) *Optimizer {
// code
}
func NewOptimizerFromTable(table codonTable) *Optimizer {
// code
}
func (o *Optimizer) Optimize(sequence string) string {
// code
return optimizedSequence
}
// (Translator could probably live in another package)
type Translator struct {
codonTable Table
}
func (t *Translator) Translate(sequence string) string {
// code
return translatedSequence
} This would (to me) make the purpose of the package a bit clearer and reduce the number of exposed funcs, making things a bit simpler. I've pushed an example of what that would look like for the stats computation, the Let me know if you think that would make the package clearer (not suggesting I'd do this in this PR, just some thoughts). |
synthesis/codon/analyzer.go
Outdated
return nil, err | ||
} | ||
|
||
stats.StartCodonCount[sequence[:3]]++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a bit of a naive way to find the start codon (just picking the first 3 letters of a cds), is there a better way to find the start codon @Koeng101 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an instance where magic numbers are ok - unless we start dealing with xenobiotic stuff (O_o), which IMO is outside the scope of this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was asking more because I'm not 100% sure that a CDS will always begin with the start codon. In typical biology fashion you can get things like upstream start codons (or many other special cases I'm sure). No need to overcomplicate if not necessary, I was just wondering if this was too naive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lab I worked in actually did a study on this. Turns out, biology is really... wack!
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397182/
In E.coli, out of 64 codons, 47 give detectable levels of start-codon expression. Sometimes with f-methionine, sometimes they are just initiated with a different amino acid. Very wobbly, but generally the expression was kind of crap compared to canonical start codons.
That being said, a start codon is definitionally the first codon of the CDS, so if there was an alternative upstream codon - guess it shoulda been annotated! In this case, we can safely say 3 is always safe and an alright use of a magic number (since it isn't even really a magic number if you know about codons).
I know the Separating functions across all these interfaces makes usage more complicated as clients have to instantiate and keep track of an additional object. If all Reviewing now and leaving more detailed thoughts. |
synthesis/codon/analyzer.go
Outdated
// Stats denotes a set of computed codon table statistics | ||
type Stats struct { | ||
StartCodonCount map[string]int `json:"start_codons_counts"` | ||
} | ||
|
||
func NewStats() *Stats { | ||
return &Stats{ | ||
StartCodonCount: map[string]int{}, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why define a new struct which has a single member and no methods?
This should either be a type alias (type Stats map[string] int
) or not be a type at all. IMO the latter is preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use type aliases more to enhance readabilty/make it harder to mess up (e.g type DNA string
), this is meant as a container struct because we presumably will add more stats in the future. Rather than having a large list of ints
and maps
in a struct that has nothing to do with codon stats, it's nicer to have it in its own type.
Where would you put it if you'd like to keep it as an untyped map? codonTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there's a case where the stats will be used together often, I think they should be returned by different methods or functions.
IMO it should be something like the following would be best:
func ComputeStartCodonStatistics(codonTable Table) map[string]int {
// code
return statisticsMap
}
func ComputeAnotherStatistic(codonTable Table) SomeType {
// code
return instanceOfSomeType
}
synthesis/codon/analyzer.go
Outdated
// Analyzer is used to compute codon usage statistics | ||
type Analyzer struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty struct with a single method could just be a function. Structs are for storing state/bundling data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not always, here's one in kubernetes. You use structs like this to make it easier to mock in unit tests, for example we might want to test something which relies on the analyser without actually calling the struct's code.
e.g we can imagine elsewhere:
type Analyser interface{
ComputeStats(sequence genbank.Genbank) (*Stats, error)
}
// test.go
type mockAnalyzer struct{}
func (a *Analyzer) ComputeStats(sequence genbank.Genbank) (*Stats, error) {
return //stats, mockError
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense in the Kubernetes example because these structs implement a common interface. I don't see a use case for defining an interface here.
Getting the statistics from a codon table should be a pure function we can extensively unit test, what scenarios do you have in mind where we would need to mock it?
synthesis/codon/analyzer.go
Outdated
type Analyzer struct{} | ||
|
||
// ComputeStats returns a set of codon related statistics from a genebank sequence | ||
func (a *Analyzer) ComputeStats(sequence genbank.Genbank) (*Stats, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to have this take a genbank entry, or just a list of arbitrary sequences? @Koeng101 idk what your intentions were for this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly just wanna be able to generate an optimization table reliably from a genbank file. Stats are kinda optional, mostly because I can just stare down into the struct if I really need them (also, I don't actually use any of the stats I do look at, it's mostly just curiosity for me. Like how Mesoplasma florum has extremely few CGG
codons...)
synthesis/codon/analyzer.go
Outdated
|
||
// iterate through the features of the genbank file and if the feature is a coding region, get the start codon | ||
for _, feature := range sequence.Features { | ||
if feature.Type == "CDS" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Koeng101 hmm, maybe we should store common feature types in the genbank package as an enum (type FeatureType string
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea! Though first we have to merge the genbank package... maybe add as an issue?
synthesis/codon/analyzer.go
Outdated
// iterate through the features of the genbank file and if the feature is a coding region, get the start codon | ||
for _, feature := range sequence.Features { | ||
if feature.Type == "CDS" { | ||
sequence, err := feature.GetSequence() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why does this method of Feature
have an error in its return type? Reading through the implementation, the error will always be nil
. @Koeng101 please confirm whether this is intended behavior, if not I will open a new issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the intended behavior, though ironically I think most of the time its just panicked for me and I had to go fix the panic. The reason is that genbank location strings can be stupid, and impossible to parse.
synthesis/codon/analyzer.go
Outdated
return nil, err | ||
} | ||
|
||
stats.StartCodonCount[sequence[:3]]++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an instance where magic numbers are ok - unless we start dealing with xenobiotic stuff (O_o), which IMO is outside the scope of this package.
synthesis/codon/analyzer_test.go
Outdated
"github.com/google/go-cmp/cmp/cmpopts" | ||
) | ||
|
||
func TestComputeStats(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit tests should not rely on file I/O unless absolutely necessary. These are already table-driven tests, the tests
struct should be of the following form:
struct {
name string
input genbank.Genbank
want *Stats
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was basing it off the existing codon test which read the genebank files, I assume that's because the genebank input you'd need to include in your table test would be rather large? I can change it to the object if we're ok with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Koeng101 I guess this just brings up my question again as to whether we want to handle genbank objects or just string sequences.
I totally forgot that the genbank.Genbank
is a massive struct - I'm ok with reading it from file, you're right that it would be annoying to type out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genbank.Genbank
structs are silly big, so I think we should just read from a file for tests, though for examples I think we should always have it as a string (so it can run on the docs website well).
The reason the tests here are off of genbank files is because that is where pretty much all the codon tables will come from until we implement #247 , which also requires quite a bit of engineering effort, I think. If we do it off of just sequences, for example, you can miss a lot of the small things that matter when parsing a real codon table from a real file.
To put #247 in perspective, the codon usage database, the most widely used database for codon usage, was last updated in 2007. So there is lots of room for us to improve over the state-of-the-art.
synthesis/codon/analyzer_test.go
Outdated
analyzer := &Analyzer{} | ||
|
||
stats, err := analyzer.ComputeStats(sequence) | ||
if err != nil { | ||
t.Fatal(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this shows why the Analyzer
struct is unnecessary. You're instantiating an empty struct and immediately calling one of its methods, then never use the struct again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test is meant to only test ComputeStats
, so which other method would I be calling? The point of the single method struct is to make mocking/interfacing easier for consumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test is meant to only test ComputeStats, so which other method would I be calling?
Oop yeah, nevermind on this.
The point of the single method struct is to make mocking/interfacing easier for consumers.
I still don't see the the use case for this. I'll leave a more detailed response in the main PR thread.
synthesis/codon/codon_test.go
Outdated
if !cmp.Equal(got, tt.wantCodonTable, sortCodons, sortAminoAcids, sortStrings) { | ||
t.Errorf("got and tt.wantCodonTable didn't match %s", cmp.Diff(got, tt.wantCodonTable)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the same if diff := cmp.Diff(); diff != "" {}
syntax I used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a particular reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a style choice than anything, though I guess it reduces the number of calls to the cmp
package. Should have marked this as a nit!
If the object instantiation is kept clear and we make it impossible (or at least hard) to end up with an invalid object that's not really an issue (which is why I'm suggesting the I'm not overly fond of single method interfaces either (and my suggested names stutter), I was just struggling to come up with a good name for a struct which does both Translation and Optimisation 😅 Just to reiterate the point of those structs is to make interfacing/mocking/testing easier for consumers, if you keep them as top level package functions they become harder to swap out |
It's an issue for readability, as it adds a new interface + associated struct to our API where we could just have a single function instead.
Translation is going to be something we do with pretty much every codon table, so I think it belongs in the
How so? Take this example:
If the client wants to mock If we could reasonably expect |
Correct. It is also used in our (gene fixer)[https://github.com/TimothyStiles/poly/blob/main/synthesis/fix/synthesis.go] for fixing up coding sequences so they can be synthesized more easily. So it's basically
I think the table interface could definitely be simplifed. I'm not tied to it, and would encourage any simplification.
I like it, I'm sold.
Honestly, I think the stats are useful, but separately from the start codon issue. For start codons, it's pretty simple: Just gimme the occurence count of the first 3 base pairs for every protein used. When I think of codon statistics, I'm thinking of codon distribution and rare codon counts, stuff like that. Though beyond all that, I think simplification of this package is more important than stats or start codons - I think you've laid out a way it could be made easier and more readable.
Translation tables are very easy to get (we actually have literally all the translation tables in a map), optimization tables not so much. It might be nice to more easily say "this is a translation table" and "this is a optimization table", since the first we can make very easily accessible, and the second can have methods for generating.
Here's some more context: You must know a translation table in order to generate an optimization table. Then an optimization table is just a translation table extended with counts for each codon occurence. I stuck em both in the same struct cause I sucked more at Golang back then. Nowadays I'd probably just have a translation table and then a separate Then, we kinda just added functionality on top of that foundation. But fundamentally:
Then, we have a couple things we can do with those codon tables:
Of course, there are nice things about the struct right now (it generates clearer JSON, which I consider a win, because I mainly want this to replace the codon usage database at some point), so don't take the |
I guess I'll reiterate that simplification and communication of what we want in this package is more important than any stats about codon tables, because the pretty much the only thing stats of codon tables will be used for is my own edification when I browse through the genbank databank converted to codon tables... |
// apply weights to codonTable | ||
for codonIndex, codon := range aminoAcid.Codons { | ||
table.AminoAcids[aminoAcidIndex].Codons[codonIndex].Weight = codonFrequencyMap[codon.Triplet] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken this function updated the (global) default NCBI tables, it seems that this may have been intended but I assumed not so changed getCodonTable()
to return a copy of the table rather than the hardcoded one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The global NCBI tables was intended so you only had to instantiate it once, but might be better to have it under a function. The NCBI tables really don't mutate quickly at all, so it'd be nicest to have them as constants, but that is kinda hard in Golang I think
synthesis/codon/codon_test.go
Outdated
@@ -1,22 +1,20 @@ | |||
package codon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of all existing tests is kept the same (and they pass) but I should probably add some more. Let me know if you want a particular one!
I tried to have a go at re-jigging the structures a bit @Koeng101 but this means the PR has grown 😅 . Let me know if it makes sense to you, basically I removed most simple data accessor functions from the This means it's harder to end up with an invalid table, we don't recompute translation maps on every run and the exposed interface is cleaner (most of those functions were onnly used in internal I did not create a separate You could also repeateadly update weights on the same table, hence the Let me know what you think too @carreter, I removed the separate |
I am running some GoldenGate + transformations today, so I'll have to get to this PR a little later - will in next few days! Thank you for being patient with us :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few changes, and then I think it looks good!
I mostly went and just read the code in the file, I think the changes were sufficient that a from-scratch re-read would be nice. I mostly think there should be comment / style updates, the code itself looks good.
synthesis/codon/codon.go
Outdated
// Stats denotes a set of statistics we maintain throughout the translation table's lifetime. For example we track | ||
// the start codons observed when we update the codon table's weights with other DNA sequences | ||
type Stats struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One edit I'd like: Place stats below the Codon / Amino acid / Table interface. I think these should be arranged by priority to understand the code: Stats are rather low here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Let's get @TimothyStiles in here for a final review n merge!
Sorry, one more change: Could you add the feature to the changelog? Then I think ready for merge. |
Co-authored-by: Willow Carretero Chavez <[email protected]>
2c35512
to
58a511c
Compare
I had to rebase to get the changelog on my branch, done in 58a511c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is good. Test coverage has dropped from the previous 100% to 95.3% and I've commented on the lines that are not covered.
github wouldn't let me add this inline but line 359 in codon.go
is also no longer covered.
return nil, fmt.Errorf("weightedRand.NewChooser() error: %s", err)
I do like users just being able to weight a codon table using a genbank but I think we should also have an explicit example for how to weight codon tables manually in the case that the user isn't fortunate (or misfortunate) enough to be using genbank. The codon optimizer was actually shipped to work this way originally but we changed it when someone mentioned they weren't able to use it with fasta sequences.
} | ||
|
||
// codonTable holds information for a codon table. | ||
type codonTable struct { | ||
func NewStats() *Stats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a doc string // NewStats does ...
// This function is run at buildtime and failure here means we have an invalid codon table. | ||
chooser, err := newAminoAcidChoosers(aminoAcidSlice) | ||
if err != nil { | ||
panic(fmt.Errorf("tried to generate an invalid codon table %w", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this should panic and is also not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generateCodonTable
runs at build time when the hardcoded tables are initialised, having an invalid table is not an error the caller can/will ever handle hence the panic. Should I refactor the initialisation code to handle the error? What would I do with the error apart from panicing?
if feature.Type == "CDS" { | ||
sequence, err := feature.GetSequence() | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will never error (issue),should I ignore the error here or make mocks?
} | ||
|
||
if len(codingRegions) == 0 { | ||
return errNoCodingRegions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
func (table *TranslationTable) UpdateWeightsWithSequence(data genbank.Genbank) error { | ||
codingRegions, err := extractCodingRegion(data) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extractCodingRegion
would only error due to a feature.GetSequence()
call, which never errors as per this comment
// This will shift the entire codon table, messing up the end results. To fix this, make sure to do a modulo | ||
// check. | ||
if len(sequence)%3 != 0 { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
|
||
err := mergedTable.UpdateWeights(finalAminoAcids) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
|
||
err := mergedTable.UpdateWeights(finalAminoAcids) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
// Update Chooser | ||
updatedChoosers, err := newAminoAcidChoosers(table.AminoAcids) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is good. Test coverage has dropped from the previous 100% to 95.3% and I've commented on the lines that are not covered.
github wouldn't let me add this inline but line 359 in codon.go
is also no longer covered.
return nil, fmt.Errorf("weightedRand.NewChooser() error: %s", err)
I do like users just being able to weight a codon table using a genbank but I think we should also have an explicit example for how to weight codon tables manually in the case that the user isn't fortunate (or misfortunate) enough to be using genbank. The codon optimizer was actually shipped to work this way originally but we changed it when someone mentioned they weren't able to use it with fasta sequences.
This PR takes care of this issue
We want to add a count of the observed start codon occurences in a generated codon table, this might be of interest to check occurences of alternative start codons.
I've added a table test for
generateCodonTable
, checking that we get the expected full object, and I made it so we add the stats together inAddCodonTable
and use the stats from the first table inCompromiseCodonTable
(I think that's the intended behavior but let me know if not!)I realise that the original issue wanted to add the codon stats to the table struct, but after doing this I think it might be better to create a
ComputeTableStats(t Table) Stats
, or aComputeStats()
function to the table interface instead which does not save the count to thecodonTable
struct.This is because otherwise we might forget to update the stats when mutating the table (e.g when calling
OptimizeTable
, which I think requires updating the start codon stats). Let me know what you think.