feat(algo): support guessing atom/bond properties from geometry #189

Merged
merged 49 commits into from
Feb 6, 2024
49 commits
cfd7b5f
refactor(core): move ring finding algorithms to new directory algo/
jnooree Dec 6, 2023
13a249c
chore: allow short statements without surrounding braces
jnooree Dec 8, 2023
5b275b1
refactor(core): introduce swap_axis() to use Eigen's col-major APIs
jnooree Dec 8, 2023
ea1112a
fix(core): update ownership of substructures on copy/move of Molecule
jnooree Dec 8, 2023
a9998cb
perf(core): re-map node ids of substructures directly
jnooree Dec 8, 2023
c765411
refactor(core): extract conformer remapping logic
jnooree Dec 8, 2023
3c37c78
fix(core): copy/move id/props while copy/moving substructures
jnooree Dec 8, 2023
9d4c4f6
refactor(core): rename erase_*_of -> erase_*_if
jnooree Dec 8, 2023
b73509c
perf(core): skip finalize if graph itself is not changed
jnooree Dec 8, 2023
b5bfe11
docs(core): correct docstrings to reflect behavior change of mutator
jnooree Dec 8, 2023
d4df60d
feat(core): promote ClearablePQ to internal API
jnooree Dec 11, 2023
d2c3825
feat: add more Eigen-like type aliases
jnooree Dec 11, 2023
db844e6
feat(core): implement octree-based nearest neighbor search
jnooree Dec 11, 2023
cbefae0
test(core): add tests for octree nn search
jnooree Dec 11, 2023
f066288
fix: add missing headers
jnooree Dec 11, 2023
a92e0ee
feat(core): add element type and standard state
jnooree Dec 11, 2023
69a4e61
feat(core): disallow negative implicit hydrogen count
jnooree Dec 11, 2023
8e64a89
feat(core): remove bond length from BondData
jnooree Dec 11, 2023
520407b
build(test): create nuri_add_test function
jnooree Dec 11, 2023
4e2ab5b
feat(core): support direct indexing on neighbors of NodeWrapper
jnooree Dec 11, 2023
b61b56f
feat(core): add cconf() method to Molecule
jnooree Dec 11, 2023
869bc19
feat(core): allow mutating substructures of molecule
jnooree Dec 11, 2023
893b7e7
feat(core): add static reference to the periodic table for convenience
jnooree Dec 11, 2023
c9b6513
feat(core): promote nonbonding_electrons to public API
jnooree Dec 11, 2023
a6bbf52
feat: add CompactMap for bounded integer-like keys
jnooree Dec 11, 2023
4c4edc5
chore: update clang-tidy config
jnooree Feb 5, 2024
f7ba9b4
feat(core): add primitives for angle calculation
jnooree Feb 5, 2024
6d4abc2
test(core): add test for fit_plane()
jnooree Feb 6, 2024
0c1279f
feat: rename add_if -> value_if to support broader applications
jnooree Feb 5, 2024
2000fbb
feat(core): add "other" bond order for flexibility
jnooree Feb 5, 2024
592c314
feat(core): make most set_* methods of {Atom,Bond}Data return itself
jnooree Feb 5, 2024
f3c3ec1
style(fmt/mol2): remove most braces around single-line blocks
jnooree Feb 5, 2024
641eae3
perf(core): use octant distance while octree traversal
jnooree Feb 6, 2024
84de569
feat(core): update molecule API for better usability
jnooree Feb 6, 2024
250eb2f
feat(core): remove valence warning in internal::count_pi_e function
jnooree Feb 6, 2024
6b199bd
feat(core): add steric_number function
jnooree Feb 6, 2024
fcf2028
feat(core): add find_adjacent method to NodeWrapper
jnooree Feb 6, 2024
d55fdc5
feat: add more eigen typedefs
jnooree Feb 6, 2024
f699ce9
feat: add cyclic indexer for eigen types
jnooree Feb 6, 2024
ed7bec0
feat: create bitmask based PowersetStream
jnooree Feb 6, 2024
b2a95a0
feat: create ZippedIterator utility
jnooree Feb 6, 2024
8b1965c
feat: add more utility functions
jnooree Feb 6, 2024
a6db762
feat: update signature of generate_index, argsort, argpartition
jnooree Feb 6, 2024
22b765a
perf: optimize stack() function
jnooree Feb 6, 2024
aa67dc9
feat(core): add utility function to use NodeWrapper as Eigen indexer
jnooree Feb 6, 2024
ae87516
feat(algo): implement guessing algorithm based on 3D atom coordinates
jnooree Feb 6, 2024
9831ac7
test(algo): add tests for guessing algorithms
jnooree Feb 6, 2024
f34a9a2
fix: update headers
jnooree Feb 6, 2024
fa81d12
feat(algo): remove unimplemented functions
jnooree Feb 6, 2024
6 changes: 5 additions & 1 deletion .clang-tidy
@@ -5,7 +5,7 @@

# Require clang-tidy v15
---
- Checks: "-*,abseil-*,boost-*,bugprone-*,clang-*,concurrency-*,cppcoreguidelines-*,google-*,misc-*,modernize-*,openmp-*,performance-*,portability-*,readability-*,-bugprone-assignment-in-if-condition,-bugprone-easily-swappable-parameters,-cppcoreguidelines-avoid-do-while,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-init-variables,-cppcoreguidelines-narrowing-conversions,-cppcoreguidelines-pro-*,-cppcoreguidelines-rvalue-reference-param-not-moved,-google-explicit-constructor,-misc-no-recursion,-modernize-avoid-c-arrays,-modernize-return-braced-init-list,-modernize-loop-convert,-modernize-concat-nested-namespaces,-modernize-pass-by-value,-modernize-raw-string-literal,-modernize-use-trailing-return-type,-modernize-use-auto,-modernize-use-default-member-init,-modernize-use-emplace,-modernize-use-equals-delete,-modernize-use-nodiscard,-readability-isolate-declaration,-readability-identifier-length,-readability-qualified-auto,-*-magic-numbers"
+ Checks: "-*,abseil-*,boost-*,bugprone-*,clang-*,concurrency-*,cppcoreguidelines-*,google-*,misc-*,modernize-*,openmp-*,performance-*,portability-*,readability-*,-bugprone-assignment-in-if-condition,-bugprone-easily-swappable-parameters,-cppcoreguidelines-avoid-do-while,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-init-variables,-cppcoreguidelines-narrowing-conversions,-cppcoreguidelines-pro-*,-cppcoreguidelines-rvalue-reference-param-not-moved,-cppcoreguidelines-use-default-member-init,-google-explicit-constructor,-misc-no-recursion,-modernize-avoid-c-arrays,-modernize-return-braced-init-list,-modernize-loop-convert,-modernize-concat-nested-namespaces,-modernize-pass-by-value,-modernize-raw-string-literal,-modernize-use-trailing-return-type,-modernize-use-auto,-modernize-use-default-member-init,-modernize-use-emplace,-modernize-use-equals-delete,-modernize-use-nodiscard,-readability-isolate-declaration,-readability-identifier-length,-readability-qualified-auto,-*-magic-numbers"
FormatStyle: "file"
# If these flags are updated, .clangd should be updated as well
ExtraArgsBefore:
@@ -52,6 +52,10 @@ CheckOptions:
value: google
- key: performance-unnecessary-value-param.IncludeStyle
value: google
- key: google-readability-braces-around-statements.ShortStatementLines
value: 3
- key: readability-braces-around-statements.ShortStatementLines
value: 3
# The following entries are (partially) based on the Google C++ Style Guide.
# Namely, we use lower_case for function names, for better interoperability
# with the Python language.
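As a rough illustration of what the two `ShortStatementLines` entries above are meant to allow (matching the "allow short statements without surrounding braces" commit), the sketch below uses one-line `if` bodies without braces. The function `clamp_to_range` is made up for this example and is not part of nurikit; whether clang-tidy accepts a given body still depends on how the check counts lines.

```cpp
#include <cassert>

// Hypothetical helper, not nurikit code. Each `if` body is a single short
// line, which is the style the ShortStatementLines setting is assumed to
// permit without braces.
int clamp_to_range(int value, int lo, int hi) {
  assert(lo <= hi);

  if (value < lo)
    return lo;
  if (value > hi)
    return hi;
  return value;
}
```

Longer bodies would presumably still need braces under the same check.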
24 changes: 24 additions & 0 deletions cmake/NurikitTest.cmake
@@ -10,3 +10,27 @@ if(NOT TARGET nuri_all_test)
set_target_properties(nuri_all_test PROPERTIES EXCLUDE_FROM_ALL OFF)
endif()
endif()

include(GoogleTest)

function(nuri_add_test file)
  get_filename_component(test_dir ${file} DIRECTORY)
  file(RELATIVE_PATH test_dir "${PROJECT_SOURCE_DIR}/test" "${test_dir}")
  string(REPLACE "/" "_" test_prefix ${test_dir})

  get_filename_component(test_name ${file} NAME_WE)

  set(target "nuri_${test_prefix}_${test_name}")
  add_executable("${target}" "${file}")
  target_link_libraries("${target}" PRIVATE
    GTest::gtest GTest::gmock GTest::gtest_main
    absl::absl_log absl::absl_check)

  if(TARGET nuri_lib)
    target_link_libraries("${target}" PRIVATE nuri_lib)
  endif()

  gtest_discover_tests("${target}"
    WORKING_DIRECTORY "${PROJECT_SOURCE_DIR}")
  add_dependencies(nuri_all_test "${target}")
endfunction()
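The target name that `nuri_add_test` builds can be hard to read off the CMake string operations, so here is a minimal C++ sketch of the same naming rule. It assumes absolute paths with `/` separators and a test file in a subdirectory of the test root. The helper `test_target_name` and the example paths are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the naming rule only; not part of nurikit or CMake.
std::string test_target_name(const std::string &file,
                             const std::string &test_root) {
  // The file is assumed to live somewhere below "<test_root>/".
  assert(file.compare(0, test_root.size() + 1, test_root + "/") == 0);

  // Path relative to the test root, e.g. "algo/guess_test.cpp".
  std::string rel = file.substr(test_root.size() + 1);
  std::size_t slash = rel.rfind('/');
  assert(slash != std::string::npos);

  std::string prefix = rel.substr(0, slash);  // directory part
  std::string name = rel.substr(slash + 1);   // file name with extension

  // NAME_WE drops the longest extension, i.e. everything from the first '.'.
  name = name.substr(0, name.find('.'));

  // string(REPLACE "/" "_" ...) flattens nested directories.
  for (char &c : prefix)
    if (c == '/') c = '_';

  return "nuri_" + prefix + "_" + name;
}
```

Under these assumptions, a file at `<root>/test/algo/guess_test.cpp` would map to the target `nuri_algo_guess_test`.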
44 changes: 44 additions & 0 deletions include/nuri/algo/guess.h
@@ -0,0 +1,44 @@
//
// Project nurikit - Copyright 2023 SNU Compbio Lab.
// SPDX-License-Identifier: Apache-2.0
//

#ifndef NURI_ALGO_GUESS_H_
#define NURI_ALGO_GUESS_H_

#include "nuri/core/molecule.h"

namespace nuri {
constexpr inline double kDefaultThreshold = 0.5;

/**
* @brief Guess bonds, types of atoms, and number of hydrogens of a molecule.
* @param mut The mutator of the molecule to be guessed.
* @param conf The index of the conformation used for guessing.
* @param threshold The threshold for guessing bonds.
* @return true if the guessing is successful.
*
* This function assumes all connectivity information is missing, and all atom
* types and implicit hydrogen counts are incorrect. The information present
* in the molecule could be overwritten by this function.
*
* If connectivity information is already present and is correct, consider using
* guess_all_types().
*/
extern bool guess_bonds(MoleculeMutator &mut, int conf = 0,
double threshold = kDefaultThreshold);

/**
* @brief Guess types of atoms and bonds, and number of hydrogens of a molecule.
* @param mol The molecule to be guessed.
* @param conf The index of the conformation used for guessing.
* @return true if the guessing is successful.
*
* This function assumes all connectivity information is present and correct,
* and all atom/bond types and implicit hydrogen counts are incorrect. The
* information present in the molecule could be overwritten by this function.
*/
extern bool guess_all_types(Molecule &mol, int conf = 0);
} // namespace nuri

#endif /* NURI_ALGO_GUESS_H_ */
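The header does not say how `threshold` enters the bond guess. One common distance-based rule, shown here purely as an assumption, treats two atoms as bonded when their distance is at most the sum of their covalent radii plus the threshold. The type `SketchAtom` and the function `guess_bonds_sketch` are hypothetical stand-ins; this is not the nurikit implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atom: position in angstroms plus a covalent
// radius supplied by the caller. Not a nurikit type.
struct SketchAtom {
  double x, y, z;
  double cov_rad;
};

// Assumed criterion (not confirmed by the header): atoms i and j are bonded
// when dist(i, j) <= cov_rad_i + cov_rad_j + threshold.
std::vector<std::pair<int, int>> guess_bonds_sketch(
    const std::vector<SketchAtom> &atoms, double threshold = 0.5) {
  assert(threshold >= 0);

  std::vector<std::pair<int, int>> bonds;
  // Plain all-pairs scan; fine for a sketch, quadratic in the atom count.
  for (std::size_t i = 0; i < atoms.size(); ++i) {
    for (std::size_t j = i + 1; j < atoms.size(); ++j) {
      const double dx = atoms[i].x - atoms[j].x;
      const double dy = atoms[i].y - atoms[j].y;
      const double dz = atoms[i].z - atoms[j].z;
      const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
      if (dist <= atoms[i].cov_rad + atoms[j].cov_rad + threshold)
        bonds.emplace_back(static_cast<int>(i), static_cast<int>(j));
    }
  }
  return bonds;
}
```

A production version would likely replace the all-pairs loop with a spatial index, such as the octree nearest-neighbor search added in this PR, but that connection is a guess from the commit list rather than something the header states.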
139 changes: 139 additions & 0 deletions include/nuri/algo/rings.h
@@ -0,0 +1,139 @@
//
// Project nurikit - Copyright 2023 SNU Compbio Lab.
// SPDX-License-Identifier: Apache-2.0
//
#ifndef NURI_ALGO_RINGS_H_
#define NURI_ALGO_RINGS_H_

#include <memory>
#include <utility>
#include <vector>

#include "nuri/core/molecule.h"

namespace nuri {
using Rings = std::vector<std::vector<int>>;

/**
* @brief Find all elementary cycles in the molecular graph.
* @param mol A molecule.
* @return A pair of (all elementary cycles, success). If success is `false`,
* the vector is in an unspecified state. This will fail if and only if
* any atom is a member of more than 100 elementary cycles.
*
* This is based on the algorithm described in the following paper:
* Hanser, Th. *et al.* *J. Chem. Inf. Comput. Sci.* **1996**, *36* (6),
* 1146-1152. DOI: [10.1021/ci960322f](https://doi.org/10.1021/ci960322f)
*
* The time complexity of this function is inherently exponential, but it is
* expected to run in a reasonable time (\f$\sim\mathcal{O}(V^2)\f$) for most
* molecules in practice.
*/
extern std::pair<Rings, bool> find_all_rings(const Molecule &mol);

namespace internal {
struct FindRingsCommonData;
} // namespace internal

/**
* @brief Wrapper class of the common routines of find_sssr() and
* find_relevant_rings().
* @sa nuri::find_relevant_rings(), nuri::find_sssr()
*
* Formally, SSSR (smallest set of smallest rings) is a *minimum cycle basis*
* of the molecular graph. As widely discussed in the literature, there is no
* unique SSSR for a given molecular graph (even for simple molecules such as
* 2-oxabicyclo[2.2.2]octane), and the SSSR is often counter-intuitive. For
* example, the SSSR of cubane (although unique, due to symmetry reasons)
* contains only five rings, which is not what most chemists would expect.
*
* On the other hand, the union of all SSSRs, sometimes called the *relevant
* rings* in the literature, is unique for a given molecule, and is the "all
* smallest rings" of the molecule, chemically speaking. It is more appropriate
* for most applications than SSSR.
*
* We provide two functions along with this class to find the relevant rings and
* SSSR, respectively. If both are needed, it is recommended to construct this
* class first, and call find_relevant_rings() and find_sssr() member functions
* instead of calling the free functions directly.
*
* This is based on the algorithm described in the following paper:
* Vismara, P. *Electron. J. Comb.* **1997**, *4* (1), R9.
* DOI: [10.37236/1294](https://doi.org/10.37236/1294)
*
* Time complexity: theoretically \f$\mathcal{O}(\nu E^3)\f$, where \f$\nu =
* \mathcal{O}(E)\f$ is the size of the SSSR. For most molecules, however, this
* is \f$\mathcal{O}(V^3)\f$.
*/
class RingSetsFinder {
public:
  /**
   * @brief Construct a new RingSetsFinder object.
   * @param mol A molecule.
   */
  explicit RingSetsFinder(const Molecule &mol);

  RingSetsFinder(const RingSetsFinder &) = delete;
  RingSetsFinder &operator=(const RingSetsFinder &) = delete;
  RingSetsFinder(RingSetsFinder &&) noexcept;
  RingSetsFinder &operator=(RingSetsFinder &&) noexcept;

  ~RingSetsFinder() noexcept;

  /**
   * @brief Find the relevant rings of the molecule.
   * @return The relevant rings of the molecule.
   * @sa nuri::find_relevant_rings()
   */
  Rings find_relevant_rings() const;

  /**
   * @brief Find the SSSR of the molecule.
   * @return The smallest set of smallest rings (SSSR) of the molecule.
   * @sa nuri::find_sssr()
   * @note This function does not guarantee that the returned set is unique,
   *       nor that the result is reproducible even for the same molecule.
   */
  Rings find_sssr() const;

private:
  const Molecule *mol_;
  std::unique_ptr<internal::FindRingsCommonData> data_;
};

/**
* @brief Find the union of all SSSRs in the molecular graph.
* @param mol A molecule.
* @return The union of all SSSRs in the molecular graph.
* @sa find_sssr(), nuri::RingSetsFinder::find_relevant_rings()
*
* This is a convenience wrapper of the
* nuri::RingSetsFinder::find_relevant_rings() member function.
*
* @note If both relevant rings and SSSR are needed, it is recommended to use
* the nuri::RingSetsFinder class instead of the free functions.
*/
inline Rings find_relevant_rings(const Molecule &mol) {
  return RingSetsFinder(mol).find_relevant_rings();
}

/**
* @brief Find a smallest set of smallest rings (SSSR) of the molecular graph.
* @param mol A molecule.
* @return *A* smallest set of smallest rings (SSSR) of the molecular graph.
* @sa find_relevant_rings(), nuri::RingSetsFinder::find_sssr()
* @note This function does not guarantee that the returned set is unique, nor
* that the result is reproducible even for the same molecule.
*
* This is a convenience wrapper of the nuri::RingSetsFinder::find_sssr() member
* function.
*
* @note If both relevant rings and SSSR are needed, it is recommended to use
* the nuri::RingSetsFinder class instead of the free functions.
*/
inline Rings find_sssr(const Molecule &mol) {
  return RingSetsFinder(mol).find_sssr();
}
} // namespace nuri

#endif /* NURI_ALGO_RINGS_H_ */
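The `RingSetsFinder` documentation above notes that the SSSR of cubane has only five rings, and `find_all_rings()` is described as inherently exponential. Both points can be checked on a bare graph with the self-contained sketch below. It computes the cycle rank E - V + C, which is the size of any minimum cycle basis, and enumerates elementary cycles by brute force. It does not use the Hanser or Vismara algorithms cited in the header, and all names in it (`cycle_rank`, `all_elementary_cycles`, `cubane_bonds`) are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

using Rings = std::vector<std::vector<int>>;
using Edges = std::vector<std::pair<int, int>>;

// Cycle rank E - V + C via a small union-find. Every minimum cycle basis,
// and hence every SSSR, contains exactly this many rings.
int cycle_rank(int num_atoms, const Edges &bonds) {
  std::vector<int> parent(num_atoms);
  std::iota(parent.begin(), parent.end(), 0);
  auto find = [&parent](int v) {
    while (parent[v] != v) v = parent[v] = parent[parent[v]];
    return v;
  };
  int components = num_atoms;
  for (const auto &b : bonds) {
    int u = find(b.first), v = find(b.second);
    if (u != v) parent[u] = v, --components;
  }
  return static_cast<int>(bonds.size()) - num_atoms + components;
}

// Depth-first extension of a path whose first vertex is the smallest vertex
// of any cycle it may close. Brute force, exponential in general.
void extend_path(const std::vector<std::vector<int>> &adj,
                 std::vector<int> &path, std::vector<char> &on_path,
                 Rings &out) {
  const int start = path.front();
  for (int next : adj[path.back()]) {
    if (next == start && path.size() >= 3) {
      // Both traversal directions close the same cycle; keep only one.
      if (path[1] < path.back()) out.push_back(path);
    } else if (next > start && !on_path[next]) {
      on_path[next] = 1;
      path.push_back(next);
      extend_path(adj, path, on_path, out);
      path.pop_back();
      on_path[next] = 0;
    }
  }
}

Rings all_elementary_cycles(int num_atoms, const Edges &bonds) {
  std::vector<std::vector<int>> adj(num_atoms);
  for (const auto &b : bonds) {
    assert(0 <= b.first && b.first < num_atoms);
    assert(0 <= b.second && b.second < num_atoms);
    adj[b.first].push_back(b.second);
    adj[b.second].push_back(b.first);
  }
  Rings out;
  std::vector<char> on_path(num_atoms, 0);
  for (int start = 0; start < num_atoms; ++start) {
    std::vector<int> path{start};
    extend_path(adj, path, on_path, out);
  }
  return out;
}

// Carbon skeleton of cubane: cube corners are 3-bit labels, and two corners
// are bonded when their labels differ in exactly one bit.
Edges cubane_bonds() {
  Edges bonds;
  for (int v = 0; v < 8; ++v)
    for (int bit = 0; bit < 3; ++bit)
      if (v < (v ^ (1 << bit))) bonds.emplace_back(v, v ^ (1 << bit));
  return bonds;
}
```

For the cubane skeleton (8 atoms, 12 bonds, one component) the cycle rank is 12 - 8 + 1 = 5, one fewer than the six square faces. The brute-force enumeration returns considerably more cycles than that, which is why the relevant rings are usually the more practical set to work with.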
52 changes: 52 additions & 0 deletions include/nuri/core/element.h
@@ -132,9 +132,44 @@ constexpr inline bool operator!=(const Isotope &lhs,
*
* - Retrieved from the BODR (Blue Obelisk Data Repository):
* https://github.com/BlueObelisk/bodr/blob/29ce17071c71b2d4d5ee81a2a28f0407331f1624/bodr/elements/elements.xml
*
* @subsection type-state About the type and standard state data
*
* The element type and standard state data were obtained from the PubChem
* periodic table.
*
* The element type classifies the element into one of the following
* categories. The categories are mutually exclusive, and elements that do not
* fall into any of the categories are classified as metals.
*
* - Unknown (only dummy atoms)
* - Nonmetal
* - Metalloid
*
* The standard state represents the state of the element at 298.15 K and 1 atm,
* and is one of gas, liquid, or solid. Unknown is only used for dummy atoms.
*
* @subsubsection type-state-ref References
*
* - National Center for Biotechnology Information. Periodic Table of Elements.
* https://pubchem.ncbi.nlm.nih.gov/periodic-table. (Accessed 2023-12-05)
*/
class Element {
public:
enum class Type : std::uint8_t {
kUnknown,
kMetal,
kMetalloid,
kNonmetal,
};

enum class State : std::uint8_t {
kUnknown,
kSolid,
kLiquid,
kGas,
};

Element() = delete;
~Element() noexcept = default;

@@ -198,6 +233,18 @@ class Element {
return internal::check_flag(flags_, ElementFlags::kActinide);
}

/**
* @brief Get the type of the element.
* @return The type of the element.
*/
constexpr Type type() const noexcept { return type_; }

/**
* @brief Get the standard state of the element.
* @return The standard state of the element.
*/
constexpr State state() const noexcept { return state_; }

/**
* @brief Get the IUPAC Symbol of the atom.
* @return The IUPAC Symbol.
@@ -299,6 +346,8 @@ class Element {
std::int16_t period_;
std::int16_t group_;
ElementFlags flags_;
Type type_;
State state_;
std::string_view symbol_;
std::string_view name_;
double atomic_weight_;
@@ -447,6 +496,9 @@ class PeriodicTable final {
absl::flat_hash_map<std::string_view, const Element *> symbol_to_element_;
absl::flat_hash_map<std::string_view, const Element *> name_to_element_;
};

// NOLINTNEXTLINE(readability-identifier-naming)
static const PeriodicTable &kPt = PeriodicTable::get();
} // namespace nuri

#endif /* NURI_CORE_ELEMENT_H_ */
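The "anything else is a metal" rule from the type documentation above can be pictured with a small standalone sketch. The enum `TypeSketch` mirrors the layout of `Element::Type` but is a separate, hypothetical type, and the lookup covers only a handful of elements chosen for illustration. The real table is populated from the PubChem data and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Standalone mirror of the enum layout in the diff; not nuri::Element::Type.
enum class TypeSketch : std::uint8_t {
  kUnknown,
  kMetal,
  kMetalloid,
  kNonmetal,
};

// Deliberately incomplete lookup by atomic number. Only the listed nonmetals
// and metalloids are filled in, so the fall-through branch is trustworthy
// only for elements that really are metals (e.g. Na, Fe).
TypeSketch classify_sketch(int atomic_number) {
  assert(atomic_number >= 0);

  switch (atomic_number) {
  case 0:  // dummy atom
    return TypeSketch::kUnknown;
  case 1:  // H
  case 6:  // C
  case 7:  // N
  case 8:  // O
    return TypeSketch::kNonmetal;
  case 5:   // B
  case 14:  // Si
    return TypeSketch::kMetalloid;
  default:  // documented rule: no other category applies, so metal
    return TypeSketch::kMetal;
  }
}
```

Storing the category as an explicit `std::uint8_t` enum, as the diff does, keeps the per-element record compact next to the existing flag field.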