A library to generate fingerprints for molecular structures based on a set of fragments

Steinbeck-Lab/FragmentFingerprints

License: MIT

FragmentFingerprints

Description

The library generates fragment fingerprints from a set of pre-defined fragments, which can be specified by the user, and can produce both bit and count fingerprints. A fragment fingerprint is created by matching the fragments (substructures) of a molecule against the pre-defined fragments: each pre-defined fragment corresponds to a position in the fingerprint, and if a match is found, that position is filled (set in a bit fingerprint, counted in a count fingerprint). The special feature of the fragment fingerprinter is that matching is done exclusively by comparing unique SMILES strings. This means that both the pre-defined fragments and the fragments of the molecule being fingerprinted must be given as unique SMILES strings. The implementation of the fragment fingerprinter is based on the Chemistry Development Kit (CDK).
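The matching step can be sketched in plain Java. This is a minimal illustration, not the FragmentFingerprints API: the class `FragmentFingerprintSketch` and its methods are hypothetical, the SMILES strings are example values, and the sketch assumes the unique SMILES of the key fragments and of the molecule's fragments have already been computed (with CDK, for example, via `SmilesGenerator` and the `SmiFlavor.Unique` flag). Because matching is exact string comparison, a fragment only matches if both sides are written in the same unique SMILES form.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of fragment-fingerprint generation by exact unique-SMILES
 * string matching. Not the FragmentFingerprints API; all names are illustrative.
 */
class FragmentFingerprintSketch {

    // Maps each key fragment's unique SMILES to its position in the fingerprint.
    private final Map<String, Integer> positionBySmiles = new LinkedHashMap<>();

    FragmentFingerprintSketch(List<String> keyFragmentSmiles) {
        for (String smiles : keyFragmentSmiles) {
            // A key fragment listed twice keeps the position of its first occurrence.
            positionBySmiles.putIfAbsent(smiles, positionBySmiles.size());
        }
    }

    int size() {
        return positionBySmiles.size();
    }

    /** Bit fingerprint: position i is set if key fragment i occurs at least once. */
    BitSet bitFingerprint(List<String> moleculeFragmentSmiles) {
        BitSet bits = new BitSet(size());
        for (String smiles : moleculeFragmentSmiles) {
            Integer position = positionBySmiles.get(smiles); // exact string comparison
            if (position != null) {
                bits.set(position);
            }
        }
        return bits;
    }

    /** Count fingerprint: position i holds how often key fragment i occurs. */
    int[] countFingerprint(List<String> moleculeFragmentSmiles) {
        int[] counts = new int[size()];
        for (String smiles : moleculeFragmentSmiles) {
            Integer position = positionBySmiles.get(smiles);
            if (position != null) {
                counts[position]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        FragmentFingerprintSketch fingerprinter =
                new FragmentFingerprintSketch(List.of("c1ccccc1", "C(=O)O", "N"));
        // Fragments of one molecule; "O" is not a key fragment and is ignored.
        List<String> fragments = List.of("c1ccccc1", "C(=O)O", "C(=O)O", "O");
        System.out.println(fingerprinter.bitFingerprint(fragments));                    // {0, 1}
        System.out.println(Arrays.toString(fingerprinter.countFingerprint(fragments))); // [1, 2, 0]
    }
}
```

Looking each molecule fragment up in a hash map keeps the cost per fragment independent of the number of key fragments; this is a design choice of the sketch and not necessarily how the library is implemented.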

Contents of this repository

Sources

The "src" subfolder contains all source code packages, including the JUnit tests.

Tests

The test class FragmentFingerprinterTest tests the functionality of the fragment fingerprinter. Among other things, it checks whether the bit and count fingerprints of a molecule are generated correctly. It also tests various methods of the CountFingerprint and BitSetFingerprint classes.
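One property such a test can check is that the bit and count fingerprints of the same molecule agree: a bit should be set exactly where the count is greater than zero. The helper below is a hypothetical plain-Java sketch using `java.util.BitSet` and an `int` array; it does not use the CountFingerprint and BitSetFingerprint classes, whose methods are not described here, and it is not taken from FragmentFingerprinterTest. The name `FingerprintConsistencyCheck` is made up for illustration.

```java
import java.util.BitSet;

/** Hypothetical helper: checks that a bit fingerprint agrees with a count fingerprint. */
class FingerprintConsistencyCheck {

    /** Returns true if bit i is set exactly when count i is greater than zero. */
    static boolean isConsistent(BitSet bits, int[] counts) {
        // A set bit beyond the end of the count vector has no matching count.
        if (bits.length() > counts.length) {
            return false;
        }
        for (int i = 0; i < counts.length; i++) {
            if (bits.get(i) != (counts[i] > 0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(1);
        System.out.println(isConsistent(bits, new int[] {1, 2, 0})); // true
        System.out.println(isConsistent(bits, new int[] {1, 0, 0})); // false
    }
}
```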

Test resources

The test "resources" subfolder contains two text files. The file "FragmentList.txt" contains all key fragments, and the file "MoleculeList.txt" contains 10 molecules together with their corresponding fragments (substructures).
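Loading such resource files could look roughly like the sketch below. The actual layout of "FragmentList.txt" and "MoleculeList.txt" is not described here, so the formats are assumptions: one unique SMILES per line for the key fragments, and one molecule per line with its fragment SMILES separated by a delimiter for the molecules. The class `ResourceLoaderSketch` and its methods are hypothetical; they read from any `Reader` so that the same code works for files and for in-memory strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical loader for the test resource files. The file formats are ASSUMED:
 * one fragment SMILES per line, and one molecule per line whose fragment SMILES
 * are separated by a delimiter.
 */
class ResourceLoaderSketch {

    /** Reads one fragment SMILES per non-blank line (assumed format). */
    static List<String> readFragments(Reader source) throws IOException {
        List<String> fragments = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isBlank()) {
                    fragments.add(line.trim());
                }
            }
        }
        return fragments;
    }

    /** Reads one molecule per non-blank line and splits it into fragment SMILES (assumed format). */
    static List<List<String>> readMolecules(Reader source, String delimiter) throws IOException {
        List<List<String>> molecules = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue;
                }
                List<String> fragments = new ArrayList<>();
                // Pattern.quote treats the delimiter literally rather than as a regex.
                for (String part : line.split(Pattern.quote(delimiter))) {
                    if (!part.isBlank()) {
                        fragments.add(part.trim());
                    }
                }
                molecules.add(fragments);
            }
        }
        return molecules;
    }

    public static void main(String[] args) throws IOException {
        // In a real test these Readers would wrap the resource files, e.g. via Files.newBufferedReader.
        List<String> keys = readFragments(new StringReader("c1ccccc1\nC(=O)O\n\nN\n"));
        List<List<String>> molecules = readMolecules(new StringReader("c1ccccc1,C(=O)O\nN,O\n"), ",");
        System.out.println(keys);      // [c1ccccc1, C(=O)O, N]
        System.out.println(molecules); // [[c1ccccc1, C(=O)O], [N, O]]
    }
}
```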

Performance Test CMD Application

The folder "PerformanceTestCMDApplication" contains the executable Java archive (JAR) FragmentFingerprints-fat.jar. It can be run from the command line (command: java -jar) to take a performance snapshot of the fragment fingerprinter's scaling behaviour for a growing number of input molecules. For more details, see the file "Performance_test_instruction.txt".
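The kind of measurement such a scaling snapshot performs can be sketched as follows. This is a hypothetical illustration, not the code inside FragmentFingerprints-fat.jar: `ScalingSnapshotSketch`, its methods and the synthetic fragment names "F0".."F149" are made up, and the fingerprinting step is a simple exact-string lookup in the spirit of the unique-SMILES matching described above. It times fingerprint generation for the first n molecules, for increasing n.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical scaling snapshot: times fingerprint generation for a growing number
 * of input molecules. The workload uses synthetic fragment strings, not real data.
 */
class ScalingSnapshotSketch {

    // Accumulates results so the JIT cannot discard the timed work as dead code.
    static long sink;

    /** Builds a bit fingerprint by exact string lookup of each fragment. */
    static BitSet fingerprint(Map<String, Integer> keyPositions, List<String> fragments) {
        BitSet bits = new BitSet(keyPositions.size());
        for (String fragment : fragments) {
            Integer position = keyPositions.get(fragment);
            if (position != null) {
                bits.set(position);
            }
        }
        return bits;
    }

    /** Returns the elapsed nanoseconds for fingerprinting the first n molecules, per size n. */
    static List<Long> timeForSizes(Map<String, Integer> keyPositions,
                                   List<List<String>> molecules, int[] sizes) {
        List<Long> timings = new ArrayList<>();
        for (int n : sizes) {
            long start = System.nanoTime();
            for (List<String> molecule : molecules.subList(0, n)) {
                sink += fingerprint(keyPositions, molecule).cardinality();
            }
            timings.add(System.nanoTime() - start);
        }
        return timings;
    }

    public static void main(String[] args) {
        // Synthetic key fragments "F0".."F99" and 10,000 molecules with 10 fragments each;
        // fragment names above F99 deliberately miss the key set.
        Map<String, Integer> keyPositions = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            keyPositions.put("F" + i, i);
        }
        List<List<String>> molecules = new ArrayList<>();
        for (int m = 0; m < 10_000; m++) {
            List<String> fragments = new ArrayList<>();
            for (int j = 0; j < 10; j++) {
                fragments.add("F" + ((m + 7 * j) % 150));
            }
            molecules.add(fragments);
        }
        int[] sizes = {1_000, 5_000, 10_000};
        List<Long> timings = timeForSizes(keyPositions, molecules, sizes);
        for (int k = 0; k < sizes.length; k++) {
            System.out.printf("%d molecules: %.2f ms%n", sizes[k], timings.get(k) / 1e6);
        }
    }
}
```

A single `System.nanoTime()` measurement also includes JIT warm-up, so the smallest sizes may look disproportionately slow; for more reliable numbers, a benchmarking harness such as JMH is the usual choice.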

Example initialization and usage of the FragmentFingerprinter

See the repository wiki.

Installation

FragmentFingerprints is hosted as a package/artifact on the Sonatype Maven Central repository. See the artifact page for installation guidelines using build tools such as Maven or Gradle.
To install FragmentFingerprints via its JAR archive, download it from the releases. Note that, with this approach, all other dependencies must also be installed via their JAR archives.
To open the project locally, e.g. to extend it, download or clone the repository, open it as a Gradle project in a Gradle-supporting IDE (e.g. IntelliJ) and run the Gradle build defined in build.gradle. Gradle will then take care of installing all dependencies. A Java Development Kit (JDK) of version 17 or higher must be installed beforehand.

Dependencies

Needs to be pre-installed:

  • Java Development Kit (JDK), version 17 or higher

Managed by Gradle:

  • Chemistry Development Kit (CDK)
  • JUnit

References and useful links

Chemistry Development Kit (CDK)
