The charset #1

hsiaoyi0504 · 2017-02-28T20:14:09Z

As I proposed in maxhodak/keras-molecules#54, I am interested in why the charset is designed like this. It's not straightforward. From the viewpoint of chemistry, the chlorine "Cl" should not be treated as "C" and "l". Redesigning the charset might bring some improvement. I used the implementation from keras-molecules and tried to interpolate between two chemical structures (CC=C(C(=CC)c1ccc(O)cc1)c1ccc(O)cc1 and CN1C(=O)CCS(=O)(=O)C1c1ccc(Cl)cc1). I got invalid structures like the ones below, so I guess the charset is the reason for this.
CC(C)(O)CCC1CCC(Cr)So2c1ccc(C)cc1
CCNC(=O)CN(CC1((l)CN1c1ccc(OC)cc1
CN1C(=O)CN(CC1((#)CN1c1ccc(OC)cc1
CN1C(=O)CC(CC**()(=O)C1c1ccc(Cl)cc1
CN1C(=O)CC(NC()(=O)C1**c1ccc(Cl)cc1
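
To make the "Cl" point concrete, here is a minimal sketch of a tokenizer that keeps two-letter halogens as single symbols when building the charset. It is not the code from keras-molecules or from this repository; the names `TOKEN_PATTERN`, `tokenize_smiles`, and `build_charset` are hypothetical, and the choice to keep bracket atoms and `%nn` ring closures whole is my own assumption.

```python
import re

# Assumption: only "Cl" and "Br" need special handling among unbracketed atoms.
# Bracket atoms such as [NH+] and two-digit ring closures such as %12 are also
# kept whole. Alternatives are tried left to right, so "Cl" wins over "C".
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, keeping 'Cl' and 'Br' intact."""
    return TOKEN_PATTERN.findall(smiles)


def build_charset(smiles_list):
    """Return the sorted list of distinct tokens seen in the given strings."""
    tokens = set()
    for smiles in smiles_list:
        tokens.update(tokenize_smiles(smiles))
    return sorted(tokens)


if __name__ == "__main__":
    print(tokenize_smiles("CN1C(=O)CCS(=O)(=O)C1c1ccc(Cl)cc1"))
```

With this charset the decoder can no longer emit a lone "l", although it does nothing about unbalanced parentheses or unmatched ring digits.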

duvenaud · 2017-03-01T21:23:38Z

Great suggestion. Yes, SMILES is clearly suboptimal for this reason. The molecular autoencoder would almost certainly work better if we used a modified language that had fewer opportunities to produce invalid strings.

jmhernandezlobato · 2017-03-12T00:12:17Z

Dear Hsiao Yi, you may find the following paper relevant: https://arxiv.org/abs/1703.01925 (we submitted it to arXiv very recently). By using a grammar and building the variational autoencoder on the production rules of that grammar, we avoid some of the problems that you mention. Miguel.

yangxiufengsia · 2017-06-07T15:23:02Z

Hi, I tried to find the code for the Bayesian optimization used in this paper, but it does not seem to be included. Do you plan to share the BO code?

yangxiufengsia · 2017-06-07T15:27:09Z

I tried to use Bayesian optimization to find better molecules. But when I ran the BO search in the 292-dimensional space, I always got invalid SMILES, like the ones Hsiao Yi got, so I guess this might be caused by the way the inducing points are chosen, right?

duvenaud · 2017-06-07T19:25:59Z

You were doing BayesOpt in a 292-dimensional space? We were already having a hard time with a 56D space. One thing you might want to look at is the lengthscales of each dimension; we found that they were often very long, and that the GP was basically just doing linear regression.
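
A rough way to see why long lengthscales push a GP toward linear behaviour, assuming a squared-exponential kernel with one lengthscale per dimension (the thread does not say which kernel was used, so this is an assumption): when the inputs are close together relative to the lengthscales, a first-order expansion of the exponential gives

```latex
k(x, x') = \sigma^2 \exp\!\Big(-\sum_{d} \frac{(x_d - x'_d)^2}{2\ell_d^2}\Big)
\approx \sigma^2 \Big(1 - \sum_{d} \frac{(x_d - x'_d)^2}{2\ell_d^2}\Big),
\qquad |x_d - x'_d| \ll \ell_d .
```

Expanding the square, the only term that couples x and x' is \sigma^2 \sum_d x_d x'_d / \ell_d^2, which has the form of a linear kernel with per-dimension weights 1/\ell_d^2. The remaining terms depend on x or x' alone.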

jmhernandezlobato · 2017-06-07T20:02:56Z

I will try to upload the code for Bayesian optimization by next week. In our experiments we obtained a large number of invalid SMILES strings. At each point, we decoded a large number of SMILES strings (500) and, from those, kept only the valid ones.
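
The decode-and-filter step described above could be sketched roughly as follows. This is not the authors' code: `decode` is a hypothetical callable standing in for a stochastic decoder, and `looks_balanced` is only a crude syntactic stand-in for a real validity check (in practice one would more likely test whether RDKit's `Chem.MolFromSmiles` returns `None`). The crude check ignores bracket atoms and `%nn` ring closures.

```python
from collections import Counter


def looks_balanced(smiles):
    """Crude syntactic check, NOT chemical validity.

    Requires balanced parentheses and an even count for every ring digit.
    """
    depth = 0
    ring_counts = Counter()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            ring_counts[ch] += 1
    return depth == 0 and all(n % 2 == 0 for n in ring_counts.values())


def keep_valid_decodings(decode, z, n_attempts=500, is_valid=looks_balanced):
    """Decode latent point z repeatedly and count only the valid strings.

    decode: hypothetical callable mapping a latent point to one SMILES string.
    is_valid: validity test; swap in a stricter check if one is available.
    """
    counts = Counter()
    for _ in range(n_attempts):
        smiles = decode(z)
        if is_valid(smiles):
            counts[smiles] += 1
    return counts
```

The returned counter also shows how often each valid string was produced, which is one possible way to pick a single molecule per latent point.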

yangxiufengsia · 2017-06-08T02:00:31Z

Thank you very much for answering my questions. Yes, I tried 292 dimensions using GPyOpt. For the lengthscale of each dimension, I used [-1, 1]; I guess this lengthscale might not be correct. I look forward to your BO code.

abhik1368 · 2017-08-16T04:23:10Z

Can you explain why we are using a 292-dimensional space? What is the logic behind it?
