add info about missing values
gcroci2 committed Mar 18, 2024
1 parent 158cf87 commit 13b9f0a
Showing 1 changed file with 4 additions and 2 deletions.
docs/features.md: 6 changes (4 additions, 2 deletions)
@@ -173,7 +173,9 @@ For residue graphs, the pairwise sum of potentials for all atoms from each resid

Nonbond energies are set to 0 for any atom pairs (on the same chain) that are within a cutoff radius of 3.6 Å, as these are assumed to be covalent neighbors or linked by no more than 2 covalent bonds (i.e. 1-3 pairs).
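
A minimal Python sketch of this exclusion rule, assuming a hypothetical `Atom` record that holds a chain identifier and Cartesian coordinates in Å (deeprank2's own atom classes are not reproduced here). Treating a pair at exactly 3.6 Å as outside the cutoff (a strict `<`) is also an assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

COVALENT_CUTOFF = 3.6  # Å: same-chain pairs closer than this get zero nonbond energy


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record (not deeprank2's Atom class)."""
    name: str
    chain: str
    position: tuple[float, float, float]  # Cartesian coordinates in Å


def is_excluded_pair(a: Atom, b: Atom) -> bool:
    """Return True if nonbond energies for this atom pair should be set to 0.

    Same-chain pairs closer than the cutoff are assumed to be covalent
    neighbours or 1-3 pairs (at most 2 covalent bonds apart).
    """
    return a.chain == b.chain and math.dist(a.position, b.position) < COVALENT_CUTOFF


n = Atom("N", "A", (0.0, 0.0, 0.0))
ca = Atom("CA", "A", (1.46, 0.0, 0.0))
ca_other_chain = Atom("CA", "B", (1.46, 0.0, 0.0))
print(is_excluded_pair(n, ca))              # True: same chain, 1.46 Å apart
print(is_excluded_pair(n, ca_other_chain))  # False: different chains
```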

Charge or vanderwaals parameters are set to 0 for those atoms that are unknown to the OPLS forcefield.

- `electrostatic`: Electrostatic potential (also known as Coulomb potential) between two nodes, calculated using interatomic distances and charges of each atom (float).
- `vanderwaals`: Van der Waals potential (also known as Lennard-Jones potential) between two nodes, calculated using interatomic distances and a list of atoms with vanderwaals parameters (`deeprank2.domain.forcefield.protein-allhdg5-4_new`, float). Atom pairs within a cutoff radius of 4.2 Å (but above 3.6 Å) are assumed to be separated by exactly 3 covalent bonds (i.e. 1-4 pairs) and use a set of lower-energy parameters.
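
The two potentials can be sketched per atom pair as follows. This is an illustration under stated assumptions, not deeprank2's implementation: the Coulomb constant (332.0636 kcal·Å/(mol·e²)), the Lorentz-Berthelot combining rules, the hypothetical `VdwParams` record with a separate 1-4 parameter set, and applying the 4.2 Å rule only to same-chain pairs are all assumptions. For residue graphs, node-level values would then be sums of these pair energies over the atoms of each residue.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Assumed constants and units; deeprank2's actual values may differ.
COULOMB_CONSTANT = 332.0636  # kcal·Å/(mol·e²)
EXCLUSION_CUTOFF = 3.6       # Å: same-chain pairs closer than this get 0
ONE_FOUR_CUTOFF = 4.2        # Å: same-chain pairs in [3.6, 4.2) are treated as 1-4 pairs


@dataclass(frozen=True)
class VdwParams:
    """Hypothetical per-atom Lennard-Jones parameters (regular and 1-4 sets)."""
    sigma: float
    epsilon: float
    sigma_14: float
    epsilon_14: float


def coulomb(q1: float, q2: float, r: float) -> float:
    """Coulomb potential between two point charges at distance r (Å)."""
    return COULOMB_CONSTANT * q1 * q2 / r


def lennard_jones(sigma1: float, eps1: float, sigma2: float, eps2: float, r: float) -> float:
    """12-6 Lennard-Jones potential with Lorentz-Berthelot combining rules (assumed)."""
    sigma = 0.5 * (sigma1 + sigma2)
    eps = math.sqrt(eps1 * eps2)  # 0 if either atom is unknown to the forcefield
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def pair_energies(q1: float, q2: float, p1: VdwParams, p2: VdwParams,
                  r: float, same_chain: bool) -> tuple[float, float]:
    """Return (electrostatic, vanderwaals) energies for one atom pair."""
    if same_chain and r < EXCLUSION_CUTOFF:
        return 0.0, 0.0  # covalent neighbours or 1-3 pairs
    elec = coulomb(q1, q2, r)
    if same_chain and r < ONE_FOUR_CUTOFF:
        # 1-4 pair: use the lower-energy parameter set
        vdw = lennard_jones(p1.sigma_14, p1.epsilon_14, p2.sigma_14, p2.epsilon_14, r)
    else:
        vdw = lennard_jones(p1.sigma, p1.epsilon, p2.sigma, p2.epsilon, r)
    return elec, vdw
```

Because unknown atoms get a charge and epsilon of 0, both terms vanish for any pair involving such an atom, which is why they behave like missing values.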

Charge and vanderwaals parameters are set to 0 for atoms that are unknown to the OPLS forcefield, so such cases are effectively treated as missing values. If this happens for many of the atoms in the provided PDB file(s), it may be worth dropping the affected features (`electrostatic`, `vanderwaals`, and `atom_charge`), depending on the specific dataset.
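
One way to gauge how many atoms are affected is to count the atoms whose residue and atom names are absent from the forcefield's parameter table. The sketch below assumes that table is available as a set of hypothetical `(residue_name, atom_name)` keys; the helper names and the 20% threshold are illustrative, dataset-dependent choices, not deeprank2 defaults.

```python
AFFECTED_FEATURES = ("electrostatic", "vanderwaals", "atom_charge")


def missing_fraction(atoms, known):
    """Fraction of (residue_name, atom_name) keys that have no forcefield entry."""
    if not atoms:
        return 0.0
    missing = sum(1 for key in atoms if key not in known)
    return missing / len(atoms)


def features_to_drop(atoms, known, threshold=0.2):
    """Return the affected features if too many atoms lack parameters.

    `threshold` is a hypothetical, dataset-dependent cut-off.
    """
    if missing_fraction(atoms, known) > threshold:
        return list(AFFECTED_FEATURES)
    return []


# Toy parameter table and structure; real tables are much larger.
known = {("ALA", "N"), ("ALA", "CA"), ("ALA", "C"), ("ALA", "O"), ("ALA", "CB")}
atoms = [("ALA", "N"), ("ALA", "CA"), ("HOH", "O"), ("LIG", "C1")]
print(missing_fraction(atoms, known))  # 0.5
print(features_to_drop(atoms, known))  # ['electrostatic', 'vanderwaals', 'atom_charge']
```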

- It may be useful to generate histograms of the processed data to further investigate the distribution of these features' values before deciding whether to drop them. Refer to the `data_generation_xxx.ipynb` tutorial files for comprehensive instructions on transforming the data into a Pandas dataframe and generating histograms of the features.
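
As a dependency-free illustration of the same idea, the sketch below bins a feature's values into equal-width bins and reports how many are exactly 0. A pronounced spike at 0 for `electrostatic`, `vanderwaals`, or `atom_charge` can hint at missing forcefield parameters, although zeros also arise legitimately (for example, from the 3.6 Å same-chain exclusion). The `histogram` and `print_histogram` helpers are hypothetical; with a Pandas dataframe `df`, `df["electrostatic"].hist(bins=10)` produces a plot instead (this requires matplotlib).

```python
def histogram(values, n_bins=10):
    """Return (bin_edges, counts) for equal-width bins over [min, max].

    `values` must be non-empty.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


def print_histogram(name, values, n_bins=10):
    """Print a text histogram plus the number of values that are exactly 0."""
    edges, counts = histogram(values, n_bins)
    zeros = sum(1 for v in values if v == 0.0)
    print(f"{name}: {zeros}/{len(values)} values are exactly 0")
    for k, c in enumerate(counts):
        close = "]" if k == len(counts) - 1 else ")"
        print(f"[{edges[k]:8.3f}, {edges[k + 1]:8.3f}{close} {'#' * c}")


print_histogram("vanderwaals", [0.0, 0.0, 0.0, -0.12, -0.3, 0.05], n_bins=4)
```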
