Incorrect generated name (examples) #31

Open
bkovats opened this issue Oct 7, 2024 · 1 comment
Labels: bug (Something isn't working), WIP (work in progress)

Comments

bkovats commented Oct 7, 2024

Dear @Kohulan,

I tested the tool on a small SMILES set: I generated the IUPAC names with translate_forward, converted these back to SMILES (using OPSIN), and checked whether the resulting structures match the input structures. In the attached file I've collected a few examples where the structures don't match; I thought these might be useful in the further development of the tool.

stout_incorrect_structures.csv

Regarding translate_reverse, the SMILES generation yields strange results for simple molecules, e.g.:

In [2]: translate_reverse('propane')
Out[2]: 'CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCC.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC.CCC.CCCCCCCCCC.C.CCCCCC.CCCCCCCCCCCCCCCCCCCCCCCCCCC.CCCC.CCC.CCCC.CCCCCCCCCCCCCCCC.C.C.CC.C.C.CCC.CCCC.CC.CCC.CC.CCCC.CC.CCC.C.C.C.C.C.C.C.CC.C.CC.C.C.CC.C.CC.CCC.CCC.CCCCCCCC.CCCC.CC.C.C.CCC.C.C.CCCCCCC.CC.C.C.C.C.C.C.C.CCCC.C.CCC.C.CCCC.C.C.CCC.C.CC.C.C.CC.C.CCC.C.C.CCC.CCC.CC.CCC.CCC.CCCC.C.C.CC.C.C.CCC.C.C.CC.C.CCC.C'

In [3]: translate_reverse('methane')
Out[3]: 'C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.C.'
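
Outputs like the two above consist of many dot-separated components, mostly repeats of the expected structure. A minimal sketch for summarizing and flagging such strings, using only plain string handling, might look like this. The function names and the threshold of 10 components are arbitrary assumptions for illustration and are not part of STOUT.

```python
from collections import Counter
from typing import List, Tuple


def fragment_counts(smiles: str) -> List[Tuple[str, int]]:
    """Count the dot-separated components of a SMILES string, most common first."""
    # Drop empty pieces, which appear when the string ends with a trailing '.'.
    fragments = [part for part in smiles.split(".") if part]
    return Counter(fragments).most_common()


def looks_degenerate(smiles: str, max_fragments: int = 10) -> bool:
    """Heuristic: flag outputs with implausibly many disconnected components."""
    total = sum(count for _, count in fragment_counts(smiles))
    return total > max_fragments


# Shortened stand-in for the kind of output shown above.
sample = "CCC." * 12 + "C.CC."
print(fragment_counts(sample))
print(looks_degenerate(sample))
print(looks_degenerate("CCC"))
```

A check of this kind only detects the symptom (repeated fragments); it says nothing about why the model produces them.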

Kohulan commented Oct 7, 2024

Hi @bkovats,

Thank you for bringing this issue to my attention. The problem with the reverse translation is caused by data imbalance in our training dataset. To resolve this, we need to introduce more single-word names into our training data.

I will investigate this issue thoroughly and work on implementing a solution. Your feedback is valuable and will help us improve future versions of the software.

Best regards,
Kohulan

Kohulan added the bug (Something isn't working) label Oct 7, 2024
Kohulan self-assigned this Oct 7, 2024
Kohulan added the WIP (work in progress) label Oct 31, 2024