What governs "similarity" between inputs with respect to training? #359

vputz · 2023-08-08T13:55:31Z

Not sure how to phrase this, but are there any heuristics for what sorts of molecules NequIP (and by extension Allegro) regards as "similar"? By that I mean: "if I train on molecules from set A, the trained network will be reasonably effective at predicting molecules from set B because A and B are similar, but poor at predicting molecules from set C because C is dissimilar".

I would guess that "uses the same set of elements" is certainly in there, but I'm not sure about other things like number of bonds, density, number of atoms, or some measure of internal structure. We've been poking about a bit with different groups from, e.g., the SPICE dataset and it's hard to see patterns yet, so I thought I'd at least have a go at asking!

Linux-cpp-lisp (Maintainer) · 2023-08-09T19:09:57Z

Hi @vputz,

I can't really give you a technical answer, but a decent heuristic (especially for Allegro) is "are all local environments similar to local environments in the training set?" That's still pretty fuzzy but can give you a sense. "Same set of elements" is definitely in there. It's an interesting and tricky question that in some ways gets at the core of the "what is and is not important in this high-dimensional space" problem that much of ML tries to answer. A similar line of thinking is "how to quantify uncertainty in MLFF predictions", since that is conceptually close to "how far are you from the training set". You can see some of our recent work using distances in feature space here: https://pubs.aip.org/aip/jcp/article/158/16/164111/2886901. If you find anything interesting in this direction I'd be curious to hear about it; please always feel free to reach out by email (see profile) as well.
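
To make the "how far are you from the training set" idea a bit more concrete, here is a rough sketch of two checks: which elements in a new structure never appear in training, and how far each new local environment is from its nearest training environment. Everything here is an assumption for illustration. The function names are made up, the random arrays stand in for per-atom feature vectors you would have to extract from your own model, and this plain nearest-neighbor distance is not the method from the linked paper or anything provided by NequIP/Allegro.

```python
import numpy as np


def unseen_elements(train_species, query_species):
    """Elements present in the query structures but absent from training."""
    return set(query_species) - set(train_species)


def nearest_training_distance(train_feats, query_feats):
    """Euclidean distance from each query environment to its closest
    training environment. Inputs have shape (n_atoms, n_features)."""
    train_feats = np.asarray(train_feats, dtype=float)
    query_feats = np.asarray(query_feats, dtype=float)
    # Broadcast to pairwise differences: (n_query, n_train, n_features)
    diff = query_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    return dists.min(axis=1)


# Toy stand-ins for per-atom features (hypothetical, not real model output).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))
similar = train[:5] + 0.01 * rng.normal(size=(5, 8))   # close to training
dissimilar = rng.normal(loc=10.0, size=(5, 8))         # far from training

d_sim = nearest_training_distance(train, similar)
d_dis = nearest_training_distance(train, dissimilar)
missing = unseen_elements({"H", "C", "O"}, {"H", "C", "N"})

print("similar set, max distance:", d_sim.max())
print("dissimilar set, min distance:", d_dis.min())
print("unseen elements:", missing)
```

Large distances (or any unseen elements) would flag environments the network probably has not learned well, though what counts as "large" depends on the features and would need calibrating against actual prediction errors.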
