Steinbeck-Lab/ScaffoldGenerator

Scaffold Generator

A CDK-based library for generating Scaffold Trees and Scaffold Networks
⚠️DISCLAIMER⚠️: This repository contains legacy code! The project has been moved and is now available and maintained as the cdk-scaffold module.
The GraphStream-based visualisation functionalities are available in a separate repository: https://github.com/JonasSchaub/scaffold-graph-vis
Some of the cdk-scaffold functionalities are also implemented in the MORTAR (MOlecule fRagmenTAtion fRamework) rich client Graphical User Interface (GUI) application (GitHub repository | article)

Description

The Scaffold Generator library is designed to make molecular scaffold-related functionalities available in applications and workflows based on the Chemistry Development Kit (CDK). Building mainly upon the works by Bemis and Murcko, Schuffenhauer et al., and Varin et al., it offers scaffold perception and dissection based on single molecules and molecule collections. From the latter, Scaffold Trees and Scaffold Networks can be constructed, represented in data structures, and visualised using the GraphStream library. Multiple options to fine-tune and adapt the routines are available.
A scientific article describing the library has been published and is available here: https://doi.org/10.1186/s13321-022-00656-x
Scaffold Generator is also available in the open Java rich client application MORTAR ('MOlecule fRagmenTAtion fRamework'), where in silico molecule fragmentation can easily be conducted on a given data set and the results visualised (MORTAR GitHub repository, MORTAR article preprint).

Contents of this repository

Sources

The ScaffoldGenerator\src\main\java\ folder contains the Java source classes of Scaffold Generator. The class ScaffoldGenerator is the core class of the library making its main functionalities available through convenient, high-level methods. Other classes are used e.g. to represent data structures like Scaffold Trees and Scaffold Networks.
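To illustrate the difference between the two data structures: in a Scaffold Tree every scaffold keeps exactly one parent scaffold (selected by prioritization rules), whereas a Scaffold Network keeps all parent scaffolds. The following is a minimal sketch of that idea using plain Java collections and SMILES strings as node identifiers. The class ScaffoldGraphSketch and all of its members are hypothetical and are not the library's own classes; the prioritization rules are not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified scaffold graph for illustration only.
 * This is NOT one of the library's own data structure classes.
 */
public class ScaffoldGraphSketch {

    /** One scaffold, identified by a SMILES string, with links to its parent scaffolds. */
    public static final class Node {
        public final String smiles;
        public final List<Node> parents = new ArrayList<>();

        Node(String smiles) {
            this.smiles = smiles;
        }
    }

    // true: tree-like (one parent per node), false: network-like (any number of parents)
    private final boolean singleParentOnly;
    private final Map<String, Node> nodes = new LinkedHashMap<>();

    public ScaffoldGraphSketch(boolean singleParentOnly) {
        this.singleParentOnly = singleParentOnly;
    }

    private Node getOrCreate(String smiles) {
        return nodes.computeIfAbsent(smiles, Node::new);
    }

    /** Links a child scaffold to a parent scaffold; a tree rejects a second, different parent. */
    public void addParent(String childSmiles, String parentSmiles) {
        Node child = getOrCreate(childSmiles);
        for (Node known : child.parents) {
            if (known.smiles.equals(parentSmiles)) {
                return; // link already present, nothing to do
            }
        }
        if (singleParentOnly && !child.parents.isEmpty()) {
            throw new IllegalStateException("A tree node can only have one parent: " + childSmiles);
        }
        child.parents.add(getOrCreate(parentSmiles));
    }

    /** Returns the SMILES strings of all parents of the given scaffold. */
    public List<String> parentsOf(String smiles) {
        List<String> result = new ArrayList<>();
        Node node = nodes.get(smiles);
        if (node != null) {
            for (Node parent : node.parents) {
                result.add(parent.smiles);
            }
        }
        return result;
    }

    public int size() {
        return nodes.size();
    }

    public static void main(String[] args) {
        String phenylpyridine = "c1ccc(cc1)-c1ccccn1";

        // Network: both one-ring parents are kept.
        ScaffoldGraphSketch network = new ScaffoldGraphSketch(false);
        network.addParent(phenylpyridine, "c1ccccc1");
        network.addParent(phenylpyridine, "c1ccncc1");
        System.out.println("Network parents: " + network.parentsOf(phenylpyridine));

        // Tree: only one parent is kept; the pick here is arbitrary, not rule-based.
        ScaffoldGraphSketch tree = new ScaffoldGraphSketch(true);
        tree.addParent(phenylpyridine, "c1ccccc1");
        System.out.println("Tree parents: " + tree.parentsOf(phenylpyridine));
    }
}
```

The single flag keeps the sketch short; the library itself represents trees and networks with dedicated classes, as described above.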

Tests

The test class ScaffoldGeneratorTest illustrates and tests the functionalities of Scaffold Generator: the correct output of its basic methods like scaffold generation, the more advanced functions to build Scaffold Trees and Scaffold Networks, the correct application of Schuffenhauer et al.'s prioritization rules (based on the schemata given in their publication), and the correct workings of the available settings and options. Some examples of Scaffold Trees and Scaffold Networks are displayed for visual inspection using the GraphStream library. Examples for the basic functionalities are visualised using example molecules imported from the resource folder (see below) and saved as image files in an output folder. Two examples for the GraphStream visualisation of Scaffold Trees and Networks can be found in the GraphStreamFigures folder.
Additionally, performance tests are included that apply specific routines of Scaffold Generator to the whole COCONUT database.

Test resources

The test resources folder at path src\test\resources\ contains MDL MOL files of 23 test molecules used to illustrate the basic functionalities of Scaffold Generator. They are imported in multiple test methods and the results saved as image files in respective molecule-specific output folders.
An SD file of the COCONUT database, which is needed to run the performance tests, is not included in the repository (see below).
All molecules that the test methods import from SMILES codes are also compiled in a separate file named SGTest_SMILES.txt in the ScaffoldGenerator folder.

Performance Test CMD Application

The folder ScaffoldGenerator\PerformanceTestCMDApp contains the executable Java archive ScaffoldGenerator-jar-with-dependencies.jar. It can be executed from the command line (command: java -jar) to take a performance snapshot of Scaffold Generator's scaling behaviour for a growing number of input molecules. It requires two command-line arguments:

  • the file name of an SD file located in the same directory as the JAR (the file itself is not provided)
  • an integer specifying how many equally sized bins the data set should be split into for the analysis

Example usage: java -jar ScaffoldGenerator-jar-with-dependencies.jar input-file-in-same-dir-name.sdf 10
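As a rough illustration of how these two arguments could be validated, here is a minimal sketch. It is not the CMD application's actual code: the class PerformanceArgsSketch, its ParsedArgs holder, and the error messages are hypothetical, and the directory against which the file name is resolved is passed in explicitly (the demo main method assumes the current working directory instead of locating the JAR).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical argument check; not taken from the CMD application's sources. */
public class PerformanceArgsSketch {

    /** Simple holder for the two validated arguments. */
    public static final class ParsedArgs {
        public final Path sdFile;
        public final int numberOfBins;

        ParsedArgs(Path sdFile, int numberOfBins) {
            this.sdFile = sdFile;
            this.numberOfBins = numberOfBins;
        }
    }

    /** Validates "<SD file name> <number of bins>" relative to the given base directory. */
    public static ParsedArgs parse(String[] args, Path baseDirectory) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Expected two arguments: <SD file name> <number of bins>");
        }
        Path sdFile = baseDirectory.resolve(args[0]);
        if (!Files.isRegularFile(sdFile)) {
            throw new IllegalArgumentException("SD file not found: " + sdFile);
        }
        int numberOfBins;
        try {
            numberOfBins = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Number of bins is not an integer: " + args[1], e);
        }
        if (numberOfBins < 1) {
            throw new IllegalArgumentException("Number of bins must be at least 1");
        }
        return new ParsedArgs(sdFile, numberOfBins);
    }

    public static void main(String[] args) {
        // Assumption for this demo: resolve against the working directory.
        Path workingDirectory = Paths.get("").toAbsolutePath();
        try {
            ParsedArgs parsed = parse(args, workingDirectory);
            System.out.println("File: " + parsed.sdFile + ", bins: " + parsed.numberOfBins);
        } catch (IllegalArgumentException e) {
            System.out.println("Invalid arguments: " + e.getMessage());
        }
    }
}
```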
The CMD application will then import the data set, split it into the given number of equally sized bins, create Scaffold Trees and Scaffold Networks for an increasing combination of those structure bins, and create detailed output files of the measured runtimes.
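The binning and timing procedure can be sketched as follows. This is a simplified stand-in, not the application's implementation: it assumes that "increasing combination" means the bins are added cumulatively (bin 1, then bins 1 and 2, and so on), it drops any remainder items that do not fill a whole bin, and it times an arbitrary placeholder task instead of Scaffold Tree or Scaffold Network generation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a binned scaling measurement; not the CMD application's code. */
public class BinnedTimingSketch {

    /**
     * Splits the items into consecutive bins of equal size.
     * Assumption: remainder items that do not fill a whole bin are dropped.
     */
    public static <T> List<List<T>> splitIntoBins(List<T> items, int numberOfBins) {
        if (numberOfBins < 1 || numberOfBins > items.size()) {
            throw new IllegalArgumentException("Number of bins must be between 1 and the item count");
        }
        int binSize = items.size() / numberOfBins;
        List<List<T>> bins = new ArrayList<>();
        for (int i = 0; i < numberOfBins; i++) {
            bins.add(new ArrayList<>(items.subList(i * binSize, (i + 1) * binSize)));
        }
        return bins;
    }

    /**
     * Runs the task on bin 1, then on bins 1..2, and so on up to all bins.
     * Returns the elapsed time of each step in milliseconds.
     */
    public static <T> long[] timeCumulative(List<List<T>> bins, Consumer<List<T>> task) {
        long[] runtimesMillis = new long[bins.size()];
        List<T> cumulative = new ArrayList<>();
        for (int i = 0; i < bins.size(); i++) {
            cumulative.addAll(bins.get(i));
            long start = System.nanoTime();
            task.accept(Collections.unmodifiableList(cumulative));
            runtimesMillis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return runtimesMillis;
    }

    public static void main(String[] args) {
        // Placeholder "molecules"; a real run would use structures read from an SD file.
        List<String> molecules = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            molecules.add("mol-" + i);
        }
        List<List<String>> bins = splitIntoBins(molecules, 10);
        // Placeholder task standing in for tree or network generation.
        long[] runtimes = timeCumulative(bins, subset -> subset.forEach(String::hashCode));
        for (int i = 0; i < runtimes.length; i++) {
            System.out.println((i + 1) + " bin(s): " + runtimes[i] + " ms");
        }
    }
}
```

Passing the task as a Consumer keeps the timing loop independent of what is measured, so the same loop could wrap either of the two routines.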
The source code of the CMD application can be found in the src folder with the other sources.

Installation

This is a Maven project. To use the source code in your own software, download or clone the repository, open it in a Maven-supporting IDE (e.g. IntelliJ) as a Maven project, and run the build defined in the pom.xml file. Maven will then take care of installing all dependencies. A Java Development Kit (JDK) of version 17 or higher must be pre-installed.
To run the COCONUT-analysing tests, an SD file of the database needs to be placed in the test "resources" folder at path src\test\resources\COCONUT_DB.sdf. The respective file can be downloaded at https://coconut.naturalproducts.net/download.
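Before starting the long-running tests, it can be useful to confirm that the file is in place. The following sketch is not part of the repository; it checks the expected path and counts the entries by the "$$$$" line that terminates each record in an SD file. The class name CoconutFileCheckSketch is hypothetical, and the relative path assumes that the program is started from the project directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sanity check for the downloaded SD file; not part of the repository. */
public class CoconutFileCheckSketch {

    /** Counts SD file records by their "$$$$" delimiter lines. */
    public static int countSdRecords(Reader reader) throws IOException {
        int count = 0;
        try (BufferedReader bufferedReader = new BufferedReader(reader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                if (line.startsWith("$$$$")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the working directory is the project directory.
        Path sdFile = Paths.get("src", "test", "resources", "COCONUT_DB.sdf");
        if (!Files.isRegularFile(sdFile)) {
            System.out.println("SD file not found at " + sdFile.toAbsolutePath());
            return;
        }
        // ISO-8859-1 maps every byte to a character, so decoding cannot fail;
        // only the ASCII delimiter lines matter for counting.
        try (Reader reader = Files.newBufferedReader(sdFile, StandardCharsets.ISO_8859_1)) {
            System.out.println("Records found: " + countSdRecords(reader));
        }
    }
}
```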

Dependencies

Needs to be pre-installed:

Managed by Maven:

References and useful links

Conceptual Scaffold, Scaffold Tree, and Scaffold Network articles

Chemistry Development Kit (CDK)

GraphStream

COlleCtion of Open NatUral producTs (COCONUT)
