Dear @Kohulan,
I tested the tool on a small set of SMILES: I generated IUPAC names with translate_forward, converted these back to SMILES with OPSIN, and checked whether the resulting structures match the input structures. In the attached file (stout_incorrect_structures.csv) I've collected a few examples where the structures don't match; I thought these might be useful for further development of the tool.
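The round-trip check described above can be sketched as a small harness. This is a minimal sketch under assumptions: the three steps are passed in as callables, so that in practice `smiles_to_name` would be STOUT's translate_forward, `name_to_smiles` some wrapper around OPSIN (assumed, not shown), and `canonicalize` a canonical-SMILES function, for example RDKit's `Chem.MolFromSmiles` followed by `Chem.MolToSmiles`. The function name `round_trip_mismatches` and the toy lookup tables are hypothetical and exist only so the sketch runs without those tools installed.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def round_trip_mismatches(
    smiles_list: Iterable[str],
    smiles_to_name: Callable[[str], Optional[str]],
    name_to_smiles: Callable[[str], Optional[str]],
    canonicalize: Callable[[str], Optional[str]],
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Collect (input SMILES, generated name, re-parsed SMILES) for every failed round trip."""
    mismatches = []
    for smiles in smiles_list:
        expected = canonicalize(smiles)
        if expected is None:
            continue  # skip inputs the canonicalizer cannot parse
        name = smiles_to_name(smiles)  # e.g. STOUT's translate_forward (assumed)
        parsed = name_to_smiles(name) if name else None  # e.g. an OPSIN wrapper; None if rejected
        actual = canonicalize(parsed) if parsed else None
        if actual != expected:  # structures differ, or the name could not be parsed
            mismatches.append((smiles, name, parsed))
    return mismatches


if __name__ == "__main__":
    # Hypothetical toy stand-ins; "butane" is a deliberately wrong name for isobutane.
    names = {"CCC": "propane", "CCO": "ethanol", "CC(C)C": "butane"}
    parsed = {"propane": "CCC", "ethanol": "OCC", "butane": "CCCC"}
    canon = {"CCC": "CCC", "CCO": "CCO", "OCC": "CCO", "CC(C)C": "CC(C)C", "CCCC": "CCCC"}
    print(round_trip_mismatches(names, names.get, parsed.get, canon.get))
    # prints [('CC(C)C', 'butane', 'CCCC')]
```

Comparing canonical SMILES rather than raw strings is what lets "CCO" and "OCC" count as the same structure; a real run would also need to decide how to treat stereochemistry, which this sketch ignores.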
Regarding translate_reverse, the SMILES generation yields strange results for simple molecules, e.g.:
In [2]: translate_reverse('propane')
Out[2]: 'CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCC.CCCCCCCCCC.C.CCCCCC.CCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCC.CCC.CCCC.CCCCCCCCCCCCCCCC.C.C.CC.C.C.CCC.CCCC.CC.CCC.CC.CCCC.CC.CCC.C.C.C.C.C.C.C.CC.C.CC.C.C.CC.C.CC.CCC.CCC.CCCCCCCC.CCCC.CC.C.C.CCC.C.C.CCCCCCC.CC.C.C.C.C.C.C.C.CCCC.C.CCC.C.CCCC.C.C.CCC.C.CC.C.C.CC.C.CCC.C.C.CCC.CCC.CC.CCC.CCC.CCCC.C.C.CC.C.C.CCC.C.C.CC.C.CCC.C'
In [3]: translate_reverse('methane')
Out[3]: 'C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.'
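In SMILES, a '.' separates disconnected components, so both outputs above describe long lists of separate molecules rather than a single propane or methane; the 'methane' output also ends with a dangling dot. A minimal text-level heuristic for flagging such outputs automatically could look like the sketch below. The helper names and the `max_fragments` threshold are hypothetical, and this is not a full SMILES parser (for example, it does not account for ring-closure digits that can link atoms across a dot).

```python
from collections import Counter


def smiles_fragments(smiles: str) -> Counter:
    """Count the dot-separated components of a SMILES string."""
    # '.' separates disconnected components; empty pieces (from a leading,
    # trailing or doubled dot) are dropped here and flagged separately below.
    return Counter(part for part in smiles.split(".") if part)


def is_degenerate(smiles: str, max_fragments: int = 1) -> bool:
    """Flag output with more components than expected or with a dangling dot."""
    parts = smiles.split(".")
    return len(parts) > max_fragments or any(part == "" for part in parts)


if __name__ == "__main__":
    print(is_degenerate("CCC"))                          # False
    print(is_degenerate("C.C.C."))                       # True
    print(smiles_fragments("CCC.CCC.C").most_common(1))  # [('CCC', 2)]
```

Running `smiles_fragments` on the 'propane' output above would show how many times "CCC" is repeated, which makes the repetitive pattern easy to quantify across a larger test set.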
Thank you for bringing this issue to my attention. The problem with the reverse translation is caused by data imbalance in our training dataset. To resolve this, we need to introduce more single-word names into our training data.
I will investigate this issue thoroughly and work on implementing a solution. Your feedback is valuable and will help us improve future versions of the software.
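To gauge the imbalance mentioned above, one could measure how often single-word names occur in the (SMILES, name) training pairs. The sketch below does that under a hypothetical definition of "single-word" (one run of lowercase letters). The proposed fix is to add new single-word names; the `oversample_single_word` helper is a different, swapped-in stand-in (simple oversampling of the existing single-word pairs) that shifts the balance without adding new chemistry, and is not the STOUT training procedure.

```python
import random
import re
from typing import List, Tuple

# Hypothetical definition: a "single-word" name is one run of lowercase letters,
# e.g. "propane" or "methane", but not "2-methylpropane" or "acetic acid".
SINGLE_WORD = re.compile(r"[a-z]+")


def is_single_word(name: str) -> bool:
    return SINGLE_WORD.fullmatch(name) is not None


def single_word_share(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (SMILES, name) pairs whose name is a single word."""
    if not pairs:
        return 0.0
    return sum(is_single_word(name) for _, name in pairs) / len(pairs)


def oversample_single_word(
    pairs: List[Tuple[str, str]], factor: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Repeat every single-word pair `factor` times and shuffle the result."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    balanced = []
    for pair in pairs:
        balanced.extend([pair] * (factor if is_single_word(pair[1]) else 1))
    random.Random(seed).shuffle(balanced)  # fixed seed for reproducibility
    return balanced
```

For example, with the pairs ("C", "methane"), ("CCC", "propane"), ("CC(C)C", "2-methylpropane") and ("CC(=O)O", "acetic acid"), `single_word_share` returns 0.5, and `oversample_single_word(pairs, 3)` returns 8 pairs. Oversampling only repeats what is already there, so genuinely new single-word examples would still be the more robust remedy.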