
Force error evaluation #271

Open
jungsdao opened this issue Nov 29, 2023 · 3 comments

Comments

@jungsdao
Contributor

While comparing force convergence between different setups, I realized that even for the same DFT and machine-learning-potential data, the reported force error can differ depending on how it is calculated. The implementation in the wfl package takes the norm of the per-atom force-error vector, but what I have used before is a component-wise (x1, y1, z1, ...) comparison.
To make force RMSE/MAE comparable across different projects, it would be better to use a consistent force-error evaluation method.
I wonder whether calculating the norm of the force error is the better way to evaluate it.

import numpy as np
from sklearn.metrics import mean_squared_error as rmse  # gives RMSE when called with squared=False
from sklearn.metrics import mean_absolute_error as mae

## WFL: RMSE of the per-atom force-error norms
all_diffs = []
for atoms in calc_in_config:
    calc_quant = atoms.arrays.get("calc_forces")
    ref_quant = atoms.arrays.get("DFT_forces")

    calc_quant = np.asarray(calc_quant).reshape(len(atoms), -1)
    ref_quant = np.asarray(ref_quant).reshape(len(atoms), -1)

    diff = calc_quant - ref_quant
    diff = np.linalg.norm(diff, axis=1)  # one scalar error per atom
    all_diffs.append(diff)

all_diffs = np.concatenate(all_diffs)  # concatenate, since configs may have different numbers of atoms
weights = np.ones(len(all_diffs))
RMSE = np.sqrt(np.sum((all_diffs ** 2) * weights) / np.sum(weights))

## method 1: component-wise RMSE/MAE over all force components pooled into one flat array
F_mace = np.concatenate([atoms.arrays["calc_forces"].flatten() for atoms in calc_in_config])
F_dft = np.concatenate([atoms.arrays["DFT_forces"].flatten() for atoms in calc_in_config])
RMSD = rmse(F_dft, F_mace, squared=False)
MAE = mae(F_dft, F_mace)

## method 2: one row per config; sklearn treats the columns as separate outputs
F_mace = [atoms.arrays["calc_forces"].flatten() for atoms in calc_in_config]
F_dft = [atoms.arrays["DFT_forces"].flatten() for atoms in calc_in_config]
RMSD = rmse(F_dft, F_mace, squared=False)
MAE = mae(F_dft, F_mace)

(attached image)

@bernstei
Contributor

bernstei commented Nov 30, 2023 via email

@jungsdao
Contributor Author

The RMSEs from "wfl" and "method 1" differ by a factor of sqrt(3), but apart from those two, the results don't scale relative to each other by a constant factor.
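
For context, a minimal sketch (synthetic per-atom errors, not the data from this issue) of why those two differ by exactly sqrt(3): the squared per-atom norm is the sum of three squared components, so its mean is three times the mean squared component.

import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(size=(1000, 3))  # stand-in for per-atom force errors, shape (N_atoms, 3)

# wfl-style: RMSE of the per-atom force-vector norms
rmse_norm = np.sqrt(np.mean(np.linalg.norm(diff, axis=1) ** 2))

# method 1: component-wise RMSE over the flattened (3 * N_atoms,) array
rmse_comp = np.sqrt(np.mean(diff.ravel() ** 2))

print(rmse_norm / rmse_comp)  # sqrt(3) exactly, for any data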

@bernstei
Contributor

I agree that the current wfl code seems very complex, and can perhaps be refactored. I don't have strong feelings about the sqrt(3) factor, but I still like it, because I attach physical meaning to the force on an atom, not a force component. I do, however, definitely want to keep the ability to have weighted sums, and to have them broken down by config_type and other arbitrary categories.

I also don't understand what method 2 is doing. From the docs, sklearn.metrics.mean_squared_error is supposed to be passed an array-like with rows being samples and columns being different output channels. If the number of atoms in each config differs, the list that's passed in is (a) not array-like (it's ragged), and (b) it doesn't make sense to treat the columns (which force component within the flattened array of a single config) as different outputs.
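
To illustrate that convention with toy numbers (not data from this issue): sklearn computes one error per column and then averages, and with squared=False the square root is taken per column before averaging, so the result is generally not the pooled component-wise RMSE. And if configs have different numbers of atoms, the ragged list can't be converted to a 2-D array, so the call should simply fail.

import numpy as np
from sklearn.metrics import mean_squared_error

# two "samples" (rows), three "output channels" (columns)
y_true = np.zeros((2, 3))
y_pred = np.array([[1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0]])

print(mean_squared_error(y_true, y_pred, multioutput="raw_values"))  # [1. 4. 9.], one MSE per column
print(mean_squared_error(y_true, y_pred, squared=False))  # mean of per-column RMSEs: (1 + 2 + 3) / 3 = 2.0
print(np.sqrt(np.mean((y_pred - y_true) ** 2)))  # pooled component-wise RMSE: sqrt(14/3) ~= 2.16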
