Failure with mmpdb fragment for some specific smiles #30

chengthefang · 2021-04-04T20:53:49Z

Hi all,

I am using mmpdb fragment to process a subset of the SureChEMBL database, and I found that mmpdb fragment fails for some specific SMILES. I wonder if we could add some error handling to deal with such problematic structures.

Here is an example test.smi:

C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C4)C3CC5)[C@@H]2O SCHEMBL9251776
Oc1ccccc1 phenol
Oc1ccccc1O catechol
Oc1ccccc1N 2-aminophenol
Oc1ccccc1Cl 2-chlorophenol
Nc1ccccc1N o-phenylenediamine
Nc1cc(O)ccc1N amidol
Oc1cc(O)ccc1O hydroxyquinol
Nc1ccccc1 phenylamine
C1CCCC1N cyclopentanol

I ran "python mmpdb/mmpdb fragment test.smi -o test_data.fragments". It failed while parsing the first SMILES and did not skip that record to continue with the rest. The error is shown below:

Failure: file 'test.smi', line 1, record #1: first line starts 'C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C ...'
Traceback (most recent call last):
  File "mmpdb/mmpdb", line 11, in <module>
    commandline.main()
  File "/mmpdb/mmpdblib/commandline.py", line 1054, in main
    parsed_args.command(parsed_args.subparser, parsed_args)
  File "/mmpdb/mmpdblib/commandline.py", line 181, in fragment_command
    do_fragment.fragment_command(parser, args)
  File "/mmpdb/mmpdblib/do_fragment.py", line 581, in fragment_command
    writer.write_records(records)
  File "/mmpdb/mmpdblib/fragment_io.py", line 404, in write_records
    for rec in fragment_records:
  File "/mmpdb/mmpdblib/do_fragment.py", line 475, in make_fragment_records
    fragments = result.get()
  File "anaconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: need more than 1 value to unpack

Appreciate any suggestions or ideas.

Thanks,
Cheng


KramerChristian · 2021-04-07T16:31:42Z

Hi Cheng,

Thanks for pointing out this issue.

mmpdb does have functionality to skip erroneous SMILES, but this seems to be a different problem: the SMILES is complicated, but chemically correct. The most likely explanation I have so far is an issue with RDKit's ring perception for bonds. I will run some further tests to make sure I am on the right track and, if I am right, file a bug report with RDKit to get the issue solved.
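To illustrate how a ring-perception problem could end in "ValueError: need more than 1 value to unpack", here is a minimal pure-Python sketch. It is not mmpdb's or RDKit's code. The helper components_after_cut and the toy molecules are hypothetical, and the assumption is simply that some code cuts a bond it believes is acyclic and then unpacks the result into exactly two pieces. Under Python 3 the same failure is worded "not enough values to unpack".

```python
def components_after_cut(num_atoms, bonds, cut):
    """Return the connected components left after removing one bond.

    Hypothetical helper for illustration only; atoms are integer indices
    and bonds are (a, b) index pairs.
    """
    adjacency = {i: set() for i in range(num_atoms)}
    for a, b in bonds:
        if (a, b) != cut:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, parts = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first walk collecting every atom reachable from `start`.
        stack, part = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            part.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        parts.append(sorted(part))
    return parts


# Chain 0-1-2-3: cutting the acyclic bond (1, 2) gives two pieces.
chain = [(0, 1), (1, 2), (2, 3)]
chain_left, chain_right = components_after_cut(4, chain, (1, 2))

# Ring 0-1-2-3-0: cutting a ring bond leaves a single piece, so
# unpacking into two names raises ValueError.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
unpack_error = None
try:
    ring_left, ring_right = components_after_cut(4, ring, (1, 2))
except ValueError as err:
    unpack_error = str(err)
```

If a ring bond were wrongly reported as a non-ring bond, a fragmentation step written like the chain case would hit the ring case instead, which would be consistent with the traceback above.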

Will keep you posted as this continues.

Bests,
Christian

chengthefang · 2021-04-08T16:09:15Z

Hi Christian,

Thank you so much for looking into this issue. I agree that it might have something to do with the complicated ring system.

Thanks,
Cheng

PARODBE · 2022-11-15T14:31:07Z

Hi Christian,

I can't convert my .smi file to fragments because of a UTF-8 problem, but I don't understand why, because I specify the encoding in the code:

And the error:

Could you help me, please?

KramerChristian · 2022-11-18T16:02:49Z

Hi Pablo,

I no longer develop mmpdb myself. It is now in the hands of @adalke and Jerome Hert. Maybe they can comment?

Bests,
Christian

adalke · 2022-11-18T16:19:46Z

For @chengthefang, I cannot reproduce the problem using mmpdb3, available from https://github.com/adalke/mmpdb . Perhaps some of the changes I made for version 3 resolve your issue?

For @PARODBE, your comment is not connected to this issue. Please use a new issue instead.

It doesn't appear your problem is connected to mmpdb. It appears to be a general RDKit question. At the very least, you don't describe how "cdk2.fragdb" is generated, or the step you did which generates that error message.

My guess is you're showing me how you exported the SDF to SMILES format, which you then converted to a "fragdb" using mmpdb v3.

Version 2 used a text format to store the fragmentations; version 3 switched to SQLite3. You cannot use text processing to read an SQLite3 file, as it is a binary format which includes non-UTF-8 byte sequences.
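One way to tell the two kinds of file apart before trying to read them as text is to look at the first 16 bytes, since every SQLite 3 database file starts with the string "SQLite format 3" followed by a NUL byte. The sketch below is only an illustration: the function name is_sqlite3_file and the file names example.fragdb and example.smi are made up for the demonstration and are not part of mmpdb.

```python
import os
import sqlite3
import tempfile

# Header string that starts every SQLite 3 database file (16 bytes).
SQLITE3_MAGIC = b"SQLite format 3\x00"


def is_sqlite3_file(path):
    """Return True if the file at `path` begins with the SQLite 3 header."""
    with open(path, "rb") as handle:
        return handle.read(16) == SQLITE3_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a version 3 fragment file.
    db_path = os.path.join(tmp, "example.fragdb")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()

    # Plain-text SMILES file for comparison.
    smi_path = os.path.join(tmp, "example.smi")
    with open(smi_path, "w", encoding="utf-8") as handle:
        handle.write("Oc1ccccc1 phenol\n")

    db_is_sqlite = is_sqlite3_file(db_path)
    smi_is_sqlite = is_sqlite3_file(smi_path)
```

A file that passes this check should be opened with a database library rather than with open(..., encoding="utf-8").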

PARODBE · 2022-11-21T07:58:26Z

Thanks, @adalke! So, in what format are the saved SMILES provided?

adalke · 2022-11-21T09:30:08Z

It's an SQLite3 file. This is the format specified by the SQLite embedded relational database, and accessible from Python via the sqlite3 module.

The specific schema is at https://github.com/adalke/mmpdb/blob/v3-dev/mmpdblib/fragment_schema.sql .
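As a rough sketch of what access via the sqlite3 module looks like, the following lists the tables in a database file by querying SQLite's built-in sqlite_master catalog. The function list_tables, the file name example.fragdb, and the tables demo_a and demo_b are invented for this example; the real table names are the ones defined in the schema linked above.

```python
import os
import sqlite3
import tempfile


def list_tables(path):
    """Return the names of all tables in the SQLite database at `path`."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical database standing in for a real fragment file.
    db_path = os.path.join(tmp, "example.fragdb")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE demo_b (id INTEGER PRIMARY KEY, smiles TEXT)")
    conn.execute("CREATE TABLE demo_a (id INTEGER PRIMARY KEY, title TEXT)")
    conn.commit()
    conn.close()

    tables = list_tables(db_path)
```

Pointing list_tables at an actual fragment file should show which tables are available to query.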

Your question is not related to issue #30, so please do not continue asking questions in this thread. Also, I am not willing to provide additional support on how to use SQL or SQLite. There are many existing teaching resources for those topics.

KramerChristian mentioned this issue Apr 9, 2021

bond.IsInRing() does not correctly identify ring bond rdkit/rdkit#4016

Open
