Fit reactive Neuroevolution Potentials (NEPs) for an elementary reaction within an explicit water box.

In this tutorial we will fit reactive Neuroevolution Potentials (NEPs) for the symmetric $CH_3Cl + Cl^- \rightarrow Cl^- + CH_3Cl$ reaction within an explicit water box, by means of the active learning methodology. By finishing this tutorial, you should get a sense of how to fit NEPs for real-world systems (especially aqueous solution systems). Before starting this tutorial, you are advised to finish our first tutorial and read this article.

Prerequisites

The pseudopotential files

Download the pseudopotential files of hydrogen, carbon, oxygen, sodium and chlorine from here, then copy the five psf files to the data directory (or simply use the provided pseudopotential files).
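
For instance, a minimal sketch of this copy step in Python (the download location and the ELEMENT.psf naming are assumptions; adjust them to your setup):

import shutil

# Copy the five pseudopotential files into the data directory.
# The source path is hypothetical: point it at your download folder.
for name in ('H.psf', 'C.psf', 'O.psf', 'Na.psf', 'Cl.psf'):
    shutil.copy(f'/path/to/downloads/{name}', f'data/{name}')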

Step 1. Set up the simulated system.

Since we are going to simulate the symmetric substitution reaction of a methyl chloride molecule in explicit solvent, we should set up a water box around at least one methyl chloride molecule and one chloride anion. However, simulating charged PBC systems is generally a very bad idea, because the excess charge is hard to screen with the solvent alone. As a result, we must add counter ions to our water box. Here I chose to add one sodium ion to the system (because the cursed pseudopotential file of potassium provided by the SIESTA team generates ghost states under my setups, although increasing the core radius seems to get rid of them). After taking the computational cost into account, our system was built with 45 water molecules, one methyl chloride molecule and one sodium chloride ion pair. The density of the system was set to about 1.0 g/mL, which leads to a cubic simulation box with a side length of about 11.5 Å. The final structure of our system is stored in the topo.pdb file under the data directory. This is a snapshot of the system:

[Figure: a snapshot of the simulated system]
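
As a sanity check, the box edge length can be recovered from the density; a minimal sketch in Python (molar masses hard-coded):

# Molar mass of the box contents: 45 H2O + CH3Cl + NaCl (g/mol).
M = 45 * 18.015 + 50.49 + 58.44
N_A = 6.02214076e23            # Avogadro constant (1/mol)
volume = M / N_A * 1e24        # box volume in Å^3 at a density of 1.0 g/cm^3
print(volume ** (1 / 3))       # edge length: ~11.5 Å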

Step 2. Generate initial training set.

Before starting the active learning process, an initial training set should be established. This training set will be used to train a set of initial NEPs during the first iteration of the active learning process. Thus, the conformations in this initial training set should at least cover the whole reaction path. But since the energy barrier of this reaction is very high compared with the thermal energy, the reaction is very unlikely to occur during unbiased MD runs. To overcome this difficulty, we can perform a steered MD simulation at a slightly elevated temperature (350 K). In this steered MD run, the introduced biasing potential (a moving restraint in the language of PLUMED) acts on the well-defined reaction coordinate of this simple reaction (the difference between the two $C-Cl$ bond distances) and drives the system across the energy barrier.
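
Concretely, the reaction coordinate and the (schematic) form of the moving restraint read:

$$\xi = d_{C-Cl_1} - d_{C-Cl_2}, \qquad V(\xi, t) = \frac{\kappa(t)}{2} \left( \xi - \xi_0(t) \right)^2$$

where the center $\xi_0(t)$ is dragged stepwise from 2.0 Å to -2.4 Å by the MOVINGRESTRAINT directive shown below, with a spring constant $\kappa(t)$ between 250 and 300 kJ/mol/Å², released ($\kappa = 0$) at the last step.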

Since the nep package reads extended XYZ (EXYZ) files as its training and testing sets, you should dump the simulation trajectory in this format. You should also tell SOMD to record atomic forces in the trajectory file, but note that you must avoid recording the biasing forces there as well. This can be done by defining the potential_list key in the [[trajectory]] table. Besides, we also define the energy_shift key here. This key means that, when saving potential energies to the trajectory, SOMD will subtract this value from the energies. Such a measure is applied because nep trains potentials in single precision, and potential energies with absolute values larger than 100.0 eV will damage the training accuracy. Thus, we set the value of this key to the approximate potential energy of our initial conformation (in units of kJ/mol).
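
A quick numerical illustration of why this shift matters (a sketch using NumPy): at the magnitude of the raw energies, adjacent single-precision values are about 0.25 kJ/mol apart, while shifted energies near zero retain a much finer resolution.

import numpy as np

raw = np.float32(2199303.769)       # magnitude of an unshifted energy (kJ/mol)
print(np.spacing(raw))              # gap between adjacent float32 values: 0.25
print(np.spacing(np.float32(1.0)))  # gap near the shifted values: ~1.2e-07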

Under the above settings, your input file should look like:

[system]
        structure = "../data/topo.pdb"
[[group]]
        atom_list = "all"
        initial_temperature = 350.0
[[potential]] # Potential No. 0
        type = "SIESTA"
        siesta_options = """
        xc.functional          GGA
        xc.authors             revPBE

        PAO.BasisSize          TZP
        Mesh.Cutoff            300 Ry
        PAO.EnergyShift        10 meV
        PAO.SoftDefault        T

        SCF.Mixer.Variant      Pulay
        SCF.Mixer.History      8
        DM.NumberPulay         8
        DM.Tolerance           1.d-6
        DM.UseSaveDM           T
        DM.History.Depth       5

        SolutionMethod         diagon
        ElectronicTemperature  1 meV
        # If you do not have ELPA installed, comment out this line:
        Diag.Algorithm         ELPA-1stage
        """
        siesta_command = "mpirun -np 140 /path/to/siesta"
        pseudopotential_dir = "../data"
[[potential]] # Potential No. 1
        type = "DFTD3"
        functional = "revpbe"
[[potential]] # Potential No. 2
        type = "PLUMED"
        file_name = "../data/plumed.1.inp"
[integrator]
        type = "OBABO"
        timestep = 0.001
        temperatures = 350.0
        relaxation_times = 0.1
[[trajectory]]
        format = "EXYZ"
        write_forces = true
        potential_list = [0, 1] # This is the *MOST* important key!
        energy_shift = -2199303.769
        wrap_positions = true
        interval = 1
[run]
        n_steps = 14000

plumed.1.inp:

# vim:ft=plumed
UNITS LENGTH=A TIME=fs
FLUSH STRIDE=1
DISTANCE ATOMS=136,141 LABEL=d1
DISTANCE ATOMS=136,140 LABEL=d2
CUSTOM ARG=d1,d2 FUNC=x-y PERIODIC=NO LABEL=dd
MOVINGRESTRAINT ARG=dd ...
    LABEL=m1
    STEP0=0   AT0=2.0 KAPPA0=250.0
    STEP1=400 AT1=1.8
    STEP2=800 AT2=1.6
    STEP3=1200 AT3=1.4
    STEP4=1600 AT4=1.2
    STEP5=2000 AT5=1.0
    STEP6=2400 AT6=0.8
    STEP7=2800 AT7=0.6
    STEP8=3200 AT8=0.4
    STEP9=3600 AT9=0.2 KAPPA9=300.0
    STEP10=4000 AT10=0.0
    STEP11=4400 AT11=-0.2
    STEP12=4800 AT12=-0.4 KAPPA12=250.0
    STEP13=5200 AT13=-0.6
    STEP14=5600 AT14=-0.8
    STEP15=6000 AT15=-1.0
    STEP16=6400 AT16=-1.2
    STEP17=6800 AT17=-1.4
    STEP18=7200 AT18=-1.4
    STEP19=7600 AT19=-1.6
    STEP20=8000 AT20=-1.8
    STEP21=8400 AT21=-2.0
    STEP22=8800 AT22=-2.2
    STEP23=9200 AT23=-2.4
    STEP24=9600 AT24=-2.4 KAPPA24=0.0
...
LOWER_WALLS ARG=dd KAPPA=200.0 AT=-8.0 LABEL=w1
UPPER_WALLS ARG=dd KAPPA=200.0 AT=8.0 LABEL=w2
PRINT FILE=colvar ARG=*

Remember to change the value of the siesta_command key to a working SIESTA command.

Now you could enter the init directory and run the initial simulation:

cd init
somd -i init.toml

After finishing this task, you will find the trajectory file (init.trajectory.xyz) under the directory. This is a large trajectory with 14000 frames, and using all of these conformations to train the initial NEPs would be a huge waste of computational resources (many frames in this trajectory are very similar, so there is no benefit in adding all of them to the training set). Thus, we select some frames from the trajectory and use them as the training set. But how should we perform the selection? First we plot the change of the reaction coordinate with respect to the simulation time:

[Figure: the reaction coordinate as a function of simulation time]
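
This curve can be reproduced from the colvar file written by PLUMED; a minimal sketch (the column index of dd is an assumption, so check it against the #! FIELDS header of your colvar file):

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('colvar')       # columns: time, d1, d2, dd, ... (check the header!)
plt.plot(data[:, 0], data[:, 3])
plt.xlabel('time (fs)')
plt.ylabel('dd (Å)')
plt.savefig('rc.png')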

By observing the curve, we may classify the whole trajectory into four stages:

  • 0 to 3000 fs: the pre-transition state (pre-TS) region;
  • 3000 to 5000 fs: the TS region;
  • 5000 to 10000 fs: the post-TS region;
  • 10000 to 14000 fs: the post-TS region without any biasing potential.

Bearing this information in mind, we can randomly select 75 frames from each stage and use them as the initial training set. This can be done by invoking the make_sets.sh script:

make_sets.sh:

#!/bin/sh

rm -f initial_training_set.xyz
python ../data/split.py \
       -i init.trajectory.xyz \
       --ranges "0:3500,3500:4500,4500:10000,10000:14000" \
       --nframes 75 \
       --perrange >> initial_training_set.xyz

After invoking this script, you will find the initial_training_set.xyz file under the init directory (containing 300 frames in total). This file will be used as the initial training set of the active learning process.
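
You can verify the frame count with a few lines of Python (a sketch that assumes a well-formed EXYZ file):

def count_exyz_frames(path):
    # Count frames by walking the per-frame atom counts.
    n = 0
    with open(path) as f:
        line = f.readline()
        while line:
            natoms = int(line)        # first line of a frame: the atom count
            f.readline()              # the comment/lattice line
            for _ in range(natoms):
                f.readline()          # the atom records
            n += 1
            line = f.readline()
    return n

print(count_exyz_frames('initial_training_set.xyz'))  # expect 300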

Step 3. Perform the training step 1.

Here we run three iterations of active learning with the same setups as above (including the biasing potentials). The input file for this task can be obtained by simply adding one [active_learning] table to the previous input file (init/init.toml):

[active_learning]
        n_iterations = 3
        n_potentials = 4
        msd_lower_limit = 200.0
        msd_upper_limit = 400.0
        max_md_runs_per_iter = 5
        max_md_steps_per_iter = 8000
        min_new_structures_per_iter = 20
        max_new_structures_per_iter = 100
        initial_training_set = "../init/initial_training_set.xyz"
        nep_options = """
        n_max      4 4
        cutoff     8 6
        neuron     10
        lambda_v   0.0
        batch      1000
        generation 50000
        """
        nep_command = "/path/to/nep"
        energy_shift = -2199303.769

The above settings mean:

  • We will perform three iterations of training in total.
  • In each training iteration:
    • Four different NEPs will be trained.
    • Up to five MD runs will be performed to collect conformations.
    • Each MD run is at most 8000 steps long.
    • At least 20 and at most 100 new structures will be accepted.
  • The lower limit of the force MSD of acceptable new structures is 200 kJ/mol/nm (about 0.2 eV/Å), and the upper limit is 400 kJ/mol/nm (about 0.4 eV/Å); see the conversion sketch after this list. You may have noticed that these boundary values are much higher than the ones we used in the first tutorial. As suggested in this article, these values are optimal for aqueous solution systems.
  • The number of NEP training steps is 50000, and the batch size is 1000.
  • The weight of the virial term is zero. Note that, since we set the energy_shift key when generating the initial training set, we must set this key in the [active_learning] table to the same value we used there.
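
The unit conversion behind the eV/Å values above is simple enough to check in one line (1 eV = 96.485 kJ/mol, 1 nm = 10 Å):

for limit in (200.0, 400.0):              # the MSD limits, in kJ/mol/nm
    print(limit / 96.485 / 10.0, 'eV/Å')  # ~0.21 and ~0.41 eV/Å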

Note that you should change the value of the nep_command key to a working nep binary. You could also use a job manager like SLURM to submit the training jobs, in case your CPU and GPU nodes are separated. For example (the --wait parameter is REQUIRED):

nep_command = "/path/to/sbatch --wait /absolute/path/to/submit_nep.sh"

submit_nep.sh:

#!/bin/bash
#SBATCH -J training
#SBATCH -o training.log
#SBATCH -e training.err
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH --cpus-per-task=2
CUDA_VISIBLE_DEVICES=0,1 /path/to/nep

Now you could enter the training_step_1 directory and run the training:

cd training_step_1
somd -i training.toml

After finishing this task, you will find the trained NEPs in the training.active_learning.dir/iteration_3 directory:

training.active_learning.dir/iteration_3/potential_0/nep.txt
training.active_learning.dir/iteration_3/potential_1/nep.txt
training.active_learning.dir/iteration_3/potential_2/nep.txt
training.active_learning.dir/iteration_3/potential_3/nep.txt

Besides the potentials, you will find two new items under your working directory: the training.active_learning.h5 file and the training.active_learning.dir directory. The HDF5 file records the training information, including the training progress, structure numbers, force MSDs, etc. Details about this file could be found here. You could navigate this file, for example, to view the numbers of failed, candidate and accurate structures in the last iteration of the training:

>>> import h5py as h5
>>> g = h5.File('./training.active_learning.h5')['/iteration_data']
>>> n_f = g['3']['n_failed_structures'][0]
>>> n_c = g['3']['n_candidate_structures'][0]
>>> n_a = g['3']['n_accurate_structures'][0]
>>> print('number of failed structures: ', n_f)
# number of failed structures:  0
>>> print('number of candidate structures: ', n_c)
# number of candidate structures:  392
>>> print('number of accurate structures: ', n_a)
# number of accurate structures:  39608

However, during these training iterations, the simulations are all driven by the strong moving restraints. As a result, the visited conformation space will be limited. Thus, we need to perform more active learning iterations to obtain robust NEPs.

Step 4. Perform the training step 2.

During this step, we will change the type of the biasing potential and perform further iterations of active learning. To this end, we first change the invoked PLUMED file to plumed.2.inp:

[[potential]]
        type = "PLUMED"
        file_name = "../data/plumed.2.inp"

plumed.2.inp:

# vim:ft=plumed
UNITS LENGTH=A TIME=fs
FLUSH STRIDE=200
DISTANCE ATOMS=136,141 LABEL=d1
DISTANCE ATOMS=136,140 LABEL=d2
CUSTOM ARG=d1,d2 FUNC=x-y PERIODIC=NO LABEL=dd
LOWER_WALLS ARG=dd KAPPA=200.0 AT=-5.0 LABEL=w1
UPPER_WALLS ARG=dd KAPPA=200.0 AT=5.0 LABEL=w2
METAD ARG=dd ...
    PACE=10
    HEIGHT=1.0
    BIASFACTOR=40
    SIGMA=0.1
    FILE=HILLS
    GRID_MIN=-8
    GRID_MAX=8
    GRID_BIN=800
    CALC_RCT
    LABEL=m1
...
PRINT FILE=colvar STRIDE=50 ARG=*
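
For reference, the well-tempered bias deposited by the METAD directive grows according to the standard update rule (here the initial Gaussian height $H$ is 1.0 kJ/mol, the width $\sigma$ is 0.1 Å and the bias factor $\gamma$ is 40):

$$V(\xi, t + \Delta t) = V(\xi, t) + H \exp\left( -\frac{V(\xi(t), t)}{(\gamma - 1) k_B T} \right) \exp\left( -\frac{(\xi - \xi(t))^2}{2\sigma^2} \right)$$

so the effective Gaussian height decays as the bias accumulates.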

This metadynamics bias is more "gentle" than the steered MD used before. In addition, we increase the maximum length of the MD runs to 50000 steps and reduce the maximum number of MD runs per iteration to four, to sample a broader conformation space:

[active_learning]
        n_iterations = 6
        n_potentials = 4
        msd_lower_limit = 200.0
        msd_upper_limit = 400.0
        max_md_runs_per_iter = 4
        max_md_steps_per_iter = 50000
        min_new_structures_per_iter = 20
        max_new_structures_per_iter = 50
        initial_training_set = "../training_step_1/training.active_learning.dir/iteration_3/train.xyz"
        initial_potential_files = [
                "../training_step_1/training.active_learning.dir/iteration_3/potential_0/nep.txt",
                "../training_step_1/training.active_learning.dir/iteration_3/potential_1/nep.txt",
                "../training_step_1/training.active_learning.dir/iteration_3/potential_2/nep.txt",
                "../training_step_1/training.active_learning.dir/iteration_3/potential_3/nep.txt",
        ]
        nep_options = """
        n_max      4 4
        cutoff     8 6
        neuron     10
        lambda_v   0.0
        batch      1000
        generation 60000
        """
        nep_command = "/path/to/nep"
        energy_shift = -2199303.769

Note that, since we are "restarting" the training based on the potentials trained in training_step_1, we should properly define the initial_potential_files key, which contains the paths of the initial NEPs. Besides, the initial_training_set key should be modified as well, since the training set has been accumulated during the previous step.

After modifying the value of the nep_command key, you could enter the training_step_2 directory and run the training:

cd training_step_2
somd -i training.toml

After finishing this task, you will find the trained NEPs in the training.active_learning.dir/iteration_6 directory as usual. Here we plot the number of failed, candidate and accurate structures with respect to the total training iteration number:

[Figure: the numbers of failed, candidate and accurate structures vs. the training iteration]

From the above figure, we can see that the numbers of failed and candidate structures may not decrease monotonically during the active learning. As a result, using more training iterations with a relatively small max_new_structures_per_iter parameter usually improves the convergence and stability of the training.
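
A figure like this can be regenerated from the HDF5 files of the training steps; a minimal sketch for one of them, reusing the /iteration_data layout shown earlier:

import h5py
import matplotlib.pyplot as plt

keys = ['n_failed_structures', 'n_candidate_structures', 'n_accurate_structures']
with h5py.File('./training.active_learning.h5', 'r') as f:
    g = f['/iteration_data']
    iterations = sorted(g.keys(), key=int)   # group names are iteration indices
    for k in keys:
        plt.plot([int(i) for i in iterations],
                 [g[i][k][0] for i in iterations], label=k)
plt.xlabel('iteration')
plt.ylabel('number of structures')
plt.legend()
plt.savefig('structure_numbers.png')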

Step 5. Perform the final training.

We have already obtained a much more robust training set with 900 structures (the training_step_2/training.active_learning.dir/iteration_6/train.xyz file) from step 4. Now we could use this final training set to train a better-converged NEP. The nep.in file in the final_training directory was prepared for this task:

type 5 C Cl H Na O

n_max      8 8
cutoff     8 6
neuron     40
lambda_v   0.0
batch      1000
generation 200000

Here we use a larger NN size and run for more training steps to improve the training convergence. To perform the final training, you should invoke the copy_set.sh script first to copy the training and testing sets (a rough sketch of this step is shown below). Then you could submit a regular NEP training task under this directory by invoking nep directly. For example:

cd final_training
CUDA_VISIBLE_DEVICES=0,1 /path/to/nep
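
The copy_set.sh script stages the accumulated sets where nep expects them; a rough Python equivalent is shown here (a sketch: the test-set path is an assumption, so defer to the actual script):

import shutil

src = '../training_step_2/training.active_learning.dir/iteration_6/'
shutil.copy(src + 'train.xyz', 'train.xyz')
shutil.copy(src + 'test.xyz', 'test.xyz')  # hypothetical path: adjust as needed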

After the training, the final NEP (the nep.txt file) will appear under the directory (or you may want to take a look at the NEP trained by me under the results directory). We could also check some training data to ensure the convergence (namely, the loss.out, force_train.out and energy_train.out files):

[Figure: the training convergence (loss, force and energy errors)]

Step 6. Perform the production run.

Finally, we could use the trained NEP to perform production tasks. Here we will calculate the free energy barrier of the reaction using PLUMED and GPUMD. To this end, we will run a 20 ns long metadynamics simulation (40000000 steps with a timestep of 0.5 fs) at 300 K. This is the run.in file:

potential     ../final_training/nep.txt

velocity      300
ensemble      nvt_bao 300 300 100
plumed        ../data/plumed.3.inp 1 0
time_step     0.5
dump_thermo   10000
dump_position 10000
run           40000000

To lower the errors introduced by the biasing potentials, we decrease the deposition frequency of the Gaussian potentials. Thus, another PLUMED file (plumed.3.inp) will be used here:

plumed.3.inp:

# vim:ft=plumed
UNITS LENGTH=A TIME=fs
FLUSH STRIDE=5000
DISTANCE ATOMS=136,141 LABEL=d1
DISTANCE ATOMS=136,140 LABEL=d2
CUSTOM ARG=d1,d2 FUNC=x-y PERIODIC=NO LABEL=dd
LOWER_WALLS ARG=dd KAPPA=250.0 AT=-5.0 LABEL=w1
UPPER_WALLS ARG=dd KAPPA=250.0 AT=5.0 LABEL=w2
METAD ARG=dd ...
    PACE=250
    HEIGHT=1.0
    BIASFACTOR=50
    SIGMA=0.15
    FILE=HILLS
    GRID_MIN=-8
    GRID_MAX=8
    GRID_BIN=800
    CALC_RCT
    LABEL=m1
...
PRINT FILE=colvar STRIDE=250 ARG=*

You could now enter the production directory and submit a regular GPUMD task. For example:

cd production
CUDA_VISIBLE_DEVICES=0 /path/to/gpumd

(Note that you should compile your GPUMD binary with the -DUSE_PLUMED option on.) After the calculation, invoke the barrier.sh script to get the barrier height. Using the NEP trained by me, I got a barrier of about 103 kJ/mol:

./barrier.sh
# BARRIER HEIGHT: 102.982626 +- 5.084623 (kJ/mol)
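
The barrier.sh script post-processes the metadynamics output; one way to obtain such an estimate is to sum the hills into a free energy profile and take the difference between the transition state region and the reactant basin. A minimal sketch (assuming plumed sum_hills --hills HILLS has produced a two-column fes.dat; the error estimation is omitted):

import numpy as np

cv, fes = np.loadtxt('fes.dat', usecols=(0, 1), unpack=True)
ts = fes[np.abs(cv).argmin()]       # free energy at dd ~ 0 (the symmetric TS)
reactant = fes[cv > 1.5].min()      # the reactant basin (one long, one short C-Cl)
print('barrier estimate (kJ/mol):', ts - reactant)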

Since the experimental value is about 111 kJ/mol, we could say the result is acceptable. The errors come from the following aspects:

  • The simulation box is small. Larger boxes should give better results.
  • We used a pure GGA functional (revPBE); hybrid functionals would likely improve the accuracy.
  • The pseudopotentials of SIESTA are not optimal.

Acknowledgment

Thanks to Dr. Fan for the meaningful guidance.