Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fragdb proposal #39

Merged
merged 14 commits into from
Dec 20, 2023
Merged

[WIP] Fragdb proposal #39

merged 14 commits into from
Dec 20, 2023

Conversation

adalke
Copy link
Contributor

@adalke adalke commented Oct 13, 2021

This is a work-in-progress to replace the JSON-Lines fragment file with SQLite-based file.

For full details see #37 .

  • The fragment filename (if not given) defaults to "input.fragdb"
    • This applies to mmpdb fragment and mmpdb index
  • The generated SQLite file is about the same size as the uncompressed fragments file (though the fragments file is very compressible.)
  • Fragment output time doesn't appear affected (profiling shows at most 1% in sqlite)
  • I haven't tested indexing performance
  • The "fragments" format is still available, for cross-comparison purposes.
    • That should be removed before the pull request is accepted.
  • It removes the (undocumented) fraginfo format.

The current code needs another cleanup pass.

I will first investigate if using SQLAlchemy simplifies the tedious manual ORM of this work-in-progress.

@adalke
Copy link
Contributor Author

adalke commented Oct 15, 2021

Just switched the implementation over to use dataclasses instead of the manual class definitions using __slots__, __init__ and __repr__.

After tuning, the overall performance is the same as the hand-written code.

I also cleaned up the SQL code to make better use of the dataclass information, which helps reduce the amount of typing to convert SQL column names to local variable names to class instance names.

@adalke adalke marked this pull request as ready for review October 26, 2021 15:06
@chem-bio chem-bio merged commit 03b72c4 into rdkit:master Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants