Repository of selected Python 3 scripts used to aid data analysis and input generation of Monte Carlo and Configurational Bias Monte Carlo simulations performed with Dice.
To use the scripts, a few Python 3 dependencies are needed, for example NumPy, OpenBabel, matplotlib and rmsd.
The number of dependencies may vary depending on which analysis script you are using; these are just the ones used in almost all of the scripts. You can install these dependencies in any way you want; however, we encourage the use of the Anaconda Python distribution. To install the libraries with Anaconda, do the following:
conda install numpy
conda install -c conda-forge openbabel
conda install matplotlib
pip install rmsd
If you want to use the graphical interface of DiceWin to perform some analyses, you will also need a few additional libraries, which can also be easily installed with conda:
conda install scipy
conda install sip
conda install pyqt=5
conda install -c anaconda pandas
If you have any problem with the scripts that plot data with matplotlib, you may need to install the package cm-super, which contains some of the LaTeX libraries needed for the correct rendering of LaTeX with matplotlib.
All the scripts can be run with the -h option to show a brief description of what the script does and its mandatory and optional parameters.
Receives the name of a file containing angle (or torsional angle) data as one number per line, and an integer (number of bins), and produces a file "pdf.dat" and a plot of the probability density function interpolated from the histogram.
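As an illustration of the histogram step described above, here is a minimal sketch with NumPy. The function name `angle_pdf` and the fixed range of -180 to 180 degrees are assumptions made for this example and are not taken from the script; the interpolation and the writing of "pdf.dat" are left out.

```python
import numpy as np

def angle_pdf(angles, nbins):
    """Return bin centers and the normalized histogram (probability density).

    Assumes the angles are given in degrees within [-180, 180].
    """
    density, edges = np.histogram(
        angles, bins=nbins, range=(-180.0, 180.0), density=True
    )
    # Use the middle of each bin as the x coordinate of the density value.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density
```

With `density=True` the histogram integrates to one, so the values can be read directly as a probability density per degree.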
Receives a file containing a molecular trajectory in any format supported by OpenBabel, and 3 integers (indexes of atoms) to compute the angle between the atoms and print it to screen.
Same as calculate_angles.py, but receives 4 integers defining a dihedral angle.
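The geometry behind these two scripts can be sketched with NumPy as follows. This is only an illustration under the assumption that the Cartesian coordinates of the selected atoms are already available; the function names `bond_angle` and `dihedral` are hypothetical, and reading the trajectory with OpenBabel is not shown.

```python
import numpy as np

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))
```

Using `arctan2` instead of `arccos` for the dihedral keeps the sign of the angle, which matters when distinguishing, for example, gauche+ from gauche- conformations.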
Given a .xyz file with a trajectory and a .txt DICE input, prints the dipole moment for each configuration.
Receives a .dfr DICE input that had some fragment types changed to "R" (rigid) and simplifies the .dfr by removing unnecessary information related to the rigid degrees of freedom.
Receives a .dfr and a .txt to convert the DICE inputs to GROMACS inputs .gro and .top (with a separate .itp for the molecular topology). When running the script, you need to specify the force field, either opls or amber, in the command line. The force field name is used to select the combination rules and fudges correctly.
Graphical user interface that can open files generated by DICE to plot the evolution of properties with the simulation steps, plot all the radial distribution functions, calculate statistical correlation and more. The interface is very intuitive, but for more information you can see the manual (unfortunately, just in Portuguese at the moment).
Receives a file containing several angles (normally dihedral angles), one per line, and an integer (usually the interval "isave" used in the DICE input) to plot how this angle changed during a simulation.
Receives a trajectory and two integers (atom indexes) to plot the evolution of a distance with the simulation steps.
Receives a file containing a series of numbers, one per line (usually the output of calculate_dihedrals.py), classifies each value as belonging to the group [min,max], to the group [min2,max2], or to neither, and then estimates the variance, that is, how much the populations deviate from one another inside each one of the "nwindows" (integer parameter).
Receives a Gaussian .log containing the calculations for the rotation around a rotatable bond (generated with plot_eff_tors), the .txt with the correct charges and LJ parameters, and an incomplete .dfr (with bad parameters for the description of the torsions around the rotatable bond) to fit the torsional energy and generate a new .dfr. The script uses some chemical knowledge to assign the same parameters to equivalent torsions. By default, the fit forces the parametrization to pass through the minima. There are a few options concerning the fit and the verbosity of the output.
fragGen is a script used to generate the input for CBMC simulations with DICE. It receives a file containing the geometry of a molecule in any format supported by OpenBabel and generates the .dfr and .txt files. fragGen always generates the maximum fragmentation of the molecule, breaking the molecule at the rotatable bonds. After running fragGen, the user still needs to specify the force field parameters in the .dfr and .txt files.
Given a DICE .xyz trajectory and a configuration number (integer), extracts the configuration labeled with this configuration number from the trajectory and prints it to STDOUT. Useful, for example, to extract the whole configuration (including the solvent) of a medoid of a cluster found with Clustering Trajectory.
Given a .xyz file and an integer representing the number of atoms, prints the first "natoms" atoms of each configuration as a .xyz. Usually used to extract the solute configurations from the simulation boxes, with "natoms" being the number of atoms of the solute.
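The extraction described above can be sketched in plain Python. This is an illustrative version, not the script itself: the function name `first_atoms` is hypothetical, and it assumes the standard .xyz layout (atom count, comment line, then one line per atom) repeated for every configuration.

```python
def first_atoms(xyz_text, natoms):
    """Keep only the first `natoms` atoms of every frame of an .xyz trajectory."""
    lines = xyz_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        # Skip stray blank lines between frames.
        if not lines[i].strip():
            i += 1
            continue
        total = int(lines[i].split()[0])
        if natoms > total:
            raise ValueError("natoms is larger than the number of atoms in a frame")
        comment = lines[i + 1]
        atoms = lines[i + 2 : i + 2 + total]
        # Write the reduced frame with an updated atom count.
        out.append(str(natoms))
        out.append(comment)
        out.extend(atoms[:natoms])
        i += 2 + total
    return "\n".join(out) + "\n"
```

The comment line of each frame is copied unchanged, so any step label stored there is preserved in the reduced trajectory.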
Receives a GROMACS topology file (.top or .itp) built using either OPLS-AA or an AMBER variation, and a file containing the geometry of the molecule (with the atoms in the same order) in .gro or any format supported by OpenBabel. The script automatically converts the input to the DICE format (.txt and .dfr), also generating the maximum fragmentation of the molecule. This script is particularly useful because one can use one of the several available topology generator tools, like MKTOP, Antechamber or LigParGen, and then convert the topology to the DICE format. Beware though that you MUST check your topology when using these tools, especially your dihedrals. Depending on the type of molecule, it is not unusual for these tools to get some dihedral energies VERY wrong, and it is your job to identify and correct them. To check the dihedrals, you can use, e.g., the plot_eff_tors.py script.
Receives a GROMACS-generated .pdb trajectory built with gmx trjconv, a DICE topology file (.txt) and the number of molecules of each type separated by spaces. The script converts the trajectory to a DICE trajectory file (.xyz) so that it can be used by the program order for analysis. The script can also receive optional arguments to convert only ranges of the input file.
plot_eff_tors is a script to find the rotational barrier between two fragments of a molecule. It divides the molecule into two parts around the chosen rotatable bond, keeping the parts as rigid bodies while rotating one of the sides around the rotatable bond. During this rotation, the energies are evaluated based on the given .dfr parameters, and the configuration at each step of the rotation can be stored. If the user wants, the script can also generate a Gaussian input based on a given .txt file containing the method, basis set, charge and multiplicity (as one usually has at the beginning of each Gaussian input). This input contains all the conformations of the rotation linked together, and can be used to perform single point calculations and get the energy profile of the rotation. By comparing the molecular mechanics and quantum energy profiles, the user may adjust the force field parameters before performing the simulation.
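The rigid rotation of one side of the molecule around the bond can be illustrated with Rodrigues' rotation formula. The sketch below is not the code used by plot_eff_tors: the function name `rotate_about_bond` and its arguments are hypothetical, and it assumes the indexes of the atoms on the moving side are already known. The energy evaluation and the Gaussian input generation are not shown.

```python
import numpy as np

def rotate_about_bond(coords, a, b, moving, angle_deg):
    """Rotate the atoms listed in `moving` around the a-b bond axis.

    coords: (N, 3) array-like of Cartesian coordinates.
    a, b: indexes of the two atoms defining the rotatable bond.
    moving: indexes of the atoms on the side that is rotated as a rigid body.
    Returns a new array; the input is left untouched.
    """
    coords = np.array(coords, dtype=float)  # np.array makes a copy
    axis = coords[b] - coords[a]
    axis /= np.linalg.norm(axis)
    theta = np.radians(angle_deg)
    # Positions relative to a point on the rotation axis (atom b).
    v = coords[moving] - coords[b]
    # Rodrigues' formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    rotated = (
        v * np.cos(theta)
        + np.cross(axis, v) * np.sin(theta)
        + np.outer(v @ axis, axis) * (1.0 - np.cos(theta))
    )
    coords[moving] = rotated + coords[b]
    return coords
```

Calling this function in a loop over, say, 10 degree increments gives the series of conformations whose energies make up the torsional profile.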
This script receives a .log of the Gaussian calculation performed with the input of plot_eff_tors and then extracts the curve of dihedral angle vs energy.
Given a file containing one value per line, this script gives the probability of a value falling in a given interval. This is especially useful for computing, e.g., the number of cis configurations of a trajectory based on a list of dihedral angles.
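The probability in question is just the fraction of values that fall inside the interval. A minimal sketch with NumPy follows; the function name `interval_probability` and the choice of treating both interval bounds as inclusive are assumptions of this example.

```python
import numpy as np

def interval_probability(values, low, high):
    """Fraction of values inside the closed interval [low, high]."""
    values = np.asarray(values, dtype=float)
    inside = (values >= low) & (values <= high)
    # The mean of a boolean mask is the fraction of True entries.
    return inside.mean()
```

For a list of dihedral angles, `interval_probability(angles, -30.0, 30.0)` would estimate the population of configurations close to cis, with the 30 degree tolerance being an arbitrary choice for illustration.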
Sometimes the LigParGen web server scrambles the atoms after running the parametrization. This script receives the original .pdb uploaded to LigParGen and the LigParGen outputs (.gro and .itp) and reorders these outputs so that the atoms are in the same order as in the uploaded .pdb.
Given a trajectory in .xyz, this script selects a few configurations separated by an interval of steps and outputs them to STDOUT. This is useful if you saved configurations too often during the simulation and want to keep just a few of them.
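A possible way to thin a trajectory like this is sketched below. It is an assumption of this example that the interval counts stored frames (keep one frame out of every `interval`), which may differ from how the script interprets steps; the names `iter_frames` and `select_frames` are hypothetical.

```python
from itertools import islice

def iter_frames(lines):
    """Yield each .xyz frame as a list of lines (count, comment, atoms)."""
    i = 0
    while i < len(lines):
        # Skip stray blank lines between frames.
        if not lines[i].strip():
            i += 1
            continue
        natoms = int(lines[i].split()[0])
        yield lines[i : i + natoms + 2]
        i += natoms + 2

def select_frames(xyz_text, interval):
    """Keep the first frame and then every `interval`-th frame after it."""
    frames = iter_frames(xyz_text.splitlines())
    kept = islice(frames, 0, None, interval)
    return "\n".join(line for frame in kept for line in frame) + "\n"
```

Because the frames are produced by a generator and sliced lazily, only the frames that are kept are joined into the output string.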
Receives a text file containing one dihedral angle per line (generated by calculate_dihedrals.py) and the .ien and .e12 files from DICE. Two plots are generated: one that associates each dihedral angle with an intramolecular energy (U_{intra}) and a solute-solvent energy (U_{xs}), showing the spread of the values as a scatter plot; and a second plot where U_{intra} and U_{xs} are binned and then averaged (the energies of all configurations whose dihedral angle falls in a given range are averaged), with the standard deviation of each average shown as error bars. A fourth, optional argument of the script is the number of bins used for the second plot (default = 36, meaning each bin is 10 degrees wide).
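The binning and averaging behind the second plot can be sketched with NumPy as shown here. This is an illustration only: the function name `binned_mean_std` is hypothetical, the angles are assumed to lie in [-180, 180] degrees, and reading the .ien and .e12 files and plotting are not shown.

```python
import numpy as np

def binned_mean_std(angles, energies, nbins=36):
    """Average the energies of configurations whose angle falls in each bin.

    Returns bin centers, means and standard deviations; empty bins are NaN.
    """
    angles = np.asarray(angles, dtype=float)
    energies = np.asarray(energies, dtype=float)
    edges = np.linspace(-180.0, 180.0, nbins + 1)
    # digitize returns 1-based bin numbers; clip so that 180 goes to the last bin.
    idx = np.clip(np.digitize(angles, edges) - 1, 0, nbins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.full(nbins, np.nan)
    stds = np.full(nbins, np.nan)
    for b in range(nbins):
        selected = energies[idx == b]
        if selected.size:
            means[b] = selected.mean()
            stds[b] = selected.std()
    return centers, means, stds
```

The returned arrays can be passed to matplotlib's `errorbar(centers, means, yerr=stds)` to draw a plot of the kind described, once for each energy term.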
Most of the scripts here were written by Henrique Musseli Cezar, with the exception of DiceWin, which was written by Thiago de Souza Duarte and Emanuel Fernandes Dias Mancio, and pdb2xyz, also written by Emanuel Mancio. These tools were written with the important contribution of Prof. Kaline Coutinho, who supervised the work and gave suggestions for improving the tools.
We thank the Brazilian funding agencies CNPq, CAPES and FAPESP for the fellowships and the approved research projects. Most of this work was done under a CNPq PhD fellowship for Henrique Musseli Cezar (grant number 140489/2015-0), undergraduate research fellowships for Thiago de Souza Duarte, and a FAPESP undergraduate fellowship for Emanuel Fernandes Dias Mancio.