You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Often someone will want to resolve a gene symbol, ie: BRCA2:c.36del
Data providers could take an argument eg resolve_genes_with_canonical_transcripts=True which would look up the MANE select transcript, then substitute it for eg NM_000059.4:c.36del
The same goes with LRG, eg Variant Validator gives "LRG_1:g.8638G>T automapped to equivalent RefSeq record
NG_007400.1:g.8638G>T"
PROBLEM
The way the biocommons HGVS code current works (and our data provider) is to retrieve the transcript (build independent) then get coordinates out later. So switching the transcripts out before knowing what build it is could cause troubles, as eg Eg NUDT4B - has transcript NM_001355407 which is MANE select (v2) but no present at all in 37
Example: OR8G1 has NM_001002905.1 as the "37 RefSeq select" and NM_001002905.2 as "38 MANE Select" (NM_001002905.2 does not exist in 37) - this is pretty rare though (out of 19k MANE transcripts - 20 don't have anything for 37, 22 don't have the version)
This would be seemingly straight forward to do inside the data provider by just switching out transcripts using an aliases (eg LRG or gene names for canonical transcript, eg:
# cdot.hgvs.dataproviders.json_data_provider.JSONDataProvider
self.aliases = {
"LRG_1t1": "NM_000088.3",
}
def _resolve_transcript_alias(self, tx_ac):
""" Returns an alias if we have one, otherwise tx_ac unmodified """
return self.aliases.get(tx_ac, tx_ac)
def _get_transcript(self, tx_ac):
tx_ac = self._resolve_transcript_alias(tx_ac)
return self.transcripts.get(tx_ac)
def get_seq(self, ac, start_i=None, end_i=None):
ac = self._resolve_transcript_alias(ac)
return super().get_seq(ac, start_i=start_i, end_i=end_i)
You could possibly do it this way then just throw an exception the few times it fails (ie returns a MANE transcript that isn't in 37 - note only if you have a data file with both genome builds too) the downside is there is no way to return a message, eg that you substituted out a transcript in there, so this may be slightly dodgy
Often someone will want to resolve a gene symbol, ie: BRCA2:c.36del
Data providers could take an argument eg
resolve_genes_with_canonical_transcripts=True
which would look up the MANE select transcript, then substitute it for eg NM_000059.4:c.36delThe same goes with LRG, eg Variant Validator gives "LRG_1:g.8638G>T automapped to equivalent RefSeq record
NG_007400.1:g.8638G>T"
PROBLEM
The way the biocommons HGVS code current works (and our data provider) is to retrieve the transcript (build independent) then get coordinates out later. So switching the transcripts out before knowing what build it is could cause troubles, as eg Eg NUDT4B - has transcript NM_001355407 which is MANE select (v2) but no present at all in 37
Example: OR8G1 has NM_001002905.1 as the "37 RefSeq select" and NM_001002905.2 as "38 MANE Select" (NM_001002905.2 does not exist in 37) - this is pretty rare though (out of 19k MANE transcripts - 20 don't have anything for 37, 22 don't have the version)
This would be seemingly straight forward to do inside the data provider by just switching out transcripts using an aliases (eg LRG or gene names for canonical transcript, eg:
LRG
Records look like:
Seemingly build independent
The text was updated successfully, but these errors were encountered: