Replies: 1 comment
-
Hi @vputz, I can't really give you a technical answer, but a decent heuristic (especially for Allegro) is "are all local environments similar to local environments in the training set?" That's still pretty fuzzy, but it can give you a sense. "Same set of elements" is definitely in there. It's an interesting and tricky question that in some ways gets at the core of the "what is and is not important in this high-dimensional space" problem that much of ML tries to answer. A similar line of thinking is "how to quantify uncertainty in MLFF predictions", since that is fairly similar to "how far are you from the training set" at a conceptual level. You can see some of the recent work we've done using distances in feature space here: https://pubs.aip.org/aip/jcp/article/158/16/164111/2886901. If you find anything interesting in this direction I'd be curious to hear about it; please feel free to reach out by email (see profile) as well.
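To make the "how far are you from the training set" idea concrete, here is a minimal sketch that scores each local environment in a candidate set by its distance to the nearest training environment. It assumes you already have one fixed-length feature vector per atom from somewhere (for example, a learned embedding or a hand-built descriptor); the function names, the Euclidean metric, and the threshold are all illustrative assumptions, not part of the NequIP or Allegro API and not necessarily the method used in the linked paper.

```python
import numpy as np


def nearest_training_distance(train_feats, query_feats):
    """Distance from each query environment to its closest training environment.

    train_feats: array of shape (n_train, d), one feature vector per training atom.
    query_feats: array of shape (n_query, d), one feature vector per query atom.
    Both are hypothetical inputs; how the features are computed is up to you.
    """
    # Pairwise differences via broadcasting: shape (n_query, n_train, d).
    # Simple but memory-hungry; chunk the query set for large datasets.
    diffs = query_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.min(axis=1)


def fraction_outside(train_feats, query_feats, threshold):
    """Fraction of query environments farther than `threshold` from all training data."""
    nearest = nearest_training_distance(train_feats, query_feats)
    return float(np.mean(nearest > threshold))


if __name__ == "__main__":
    # Toy 2-D features standing in for real per-atom descriptors.
    train = np.array([[0.0, 0.0], [1.0, 0.0]])
    query = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 0.0]])
    print(nearest_training_distance(train, query))
    print(fraction_outside(train, query, threshold=2.0))
```

Since "same set of elements" matters, one reasonable variation is to run this comparison separately per chemical species, so that a carbon environment is only compared against carbon environments in the training set. A sensible threshold would have to be calibrated empirically, for instance against observed prediction errors on held-out data.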
-
Not sure how to phrase this, but are there any heuristics about what sorts of molecules NequIP (and by extension Allegro) regards as "similar", in the sense of "if I train on molecules from set A, the trained network will be reasonably effective at predicting molecules from set B because A and B are similar, but poor at predicting molecules from set C because it is dissimilar"?
I would guess that "uses the same set of elements" would certainly be in there, but I'm not sure about other things like number of bonds, density, number of atoms, or some sort of measure of internal structure. We've been poking about a bit with different groups from, e.g., the SPICE dataset and it's hard to see patterns yet, so I thought I'd at least have a go at asking!