Add documentation for the command-line scripts #228

Closed
wants to merge 13 commits into from
97 changes: 86 additions & 11 deletions docs/source/cli_export_result.rst
@@ -1,24 +1,57 @@
mk_export.py
============

A command-line script that exports docked ligand poses (and, optionally, flexible residues) to an SD file (.sdf), and exports the full receptor structure with updated flexible-residue conformations to a PDB file. It currently supports PDBQT files from AutoDock-Vina and DLG files from AutoDock-GPU whose REMARK lines contain the SMILES string and the atom index mapping.

Basic usage
-----------

.. code-block:: bash

    mk_export.py vina_results.pdbqt -s vina_results.sdf
    mk_export.py autodock-gpu_results.dlg -s autodock-gpu_results.sdf


Example: Write all poses from DLG file to output SDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    dock_dlg="Meeko/example/cli_export_result/3kgd_AMP_adgpu_out.dlg"
    mk_export.py "$dock_dlg" -s 3kgd_AMP_adgpu_out.sdf --all_dlg_poses

Example: Include flexible sidechains in the output SDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    dock_pdbqt="Meeko/example/cli_export_result/1fpu_PRC_vina_out.pdbqt"
    mk_export.py "$dock_pdbqt" -s vina_results.sdf -k

Example: Export full receptor with updated sidechains to output PDB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    dock_pdbqt="Meeko/example/cli_export_result/1fpu_PRC_vina_out.pdbqt"
    rec_json="Meeko/example/cli_export_result/1fpu_receptorFH.json"
    mk_export.py "$dock_pdbqt" -j "$rec_json" -p vina_results.pdb

About
-----

Convert docking results to SDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

AutoDock-GPU and Vina write docking results in the PDBQT format. The DLG output
from AutoDock-GPU contains docked poses in PDBQT blocks, plus additional information.
Meeko generates RDKit molecules from PDBQT using the SMILES
string in the REMARK lines. The REMARK lines also have the mapping of atom indices
between SMILES and PDBQT. SD files with docked coordinates are written
from RDKit molecules.

.. code-block:: bash

    mk_export.py molecule.pdbqt -s molecule.sdf
    mk_export.py vina_results.pdbqt -s vina_results.sdf
    mk_export.py autodock-gpu_results.dlg -s autodock-gpu_results.sdf

Why this matters
~~~~~~~~~~~~~~~~

Making RDKit molecules from SMILES is safer than guessing bond orders
from the coordinates, especially because the PDBQT lacks hydrogens bonded
@@ -31,13 +64,55 @@ but because this is a nearly impossible task.
obabel -:"C1C=CCO1" -o pdbqt --gen3d | obabel -i pdbqt -o smi
[C]1=[C][C]=[C]O1


Caveats
~~~~~~~

If docking does not use explicit hydrogens, which is often the case, the
exported hydrogen positions are calculated by RDKit. This can be a
drawback if a careful force-field minimization was performed before
docking, because the presumably more rigorous hydrogen positions are
replaced by RDKit's geometry rules, which are empirical and much simpler
than most force fields.

Options
-------

Positional Argument (Input)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. option:: docking_results_filename

    One or more docking output files, in either PDBQT format (from Vina) or DLG format (from AutoDock-GPU).

Output Options
~~~~~~~~~~~~~~

.. option:: --suffix <suffix>

    Set a suffix for output filenames that are not explicitly specified. The default suffix is ``_docked``.

.. option:: -s, --write_sdf <output_SDF_filename>

    Specify the output SDF filename. Defaults to the input filename with a suffix defined by ``--suffix``.

.. option:: -j, --read_json <input_JSON_filename>

    Provide a receptor file generated by ``mk_prepare_receptor`` with the ``-j/--write_json`` option. Currently only effective when used with ``-p, --write_pdb``.

.. option:: -p, --write_pdb <output_PDB_filename>

    Specify the output PDB filename. Defaults to the input filename with a suffix defined by ``--suffix``. Must be used together with ``-j, --read_json``.

.. option:: --all_dlg_poses

    (Flag) Write all poses from AutoDock-GPU DLG output files, instead of only the lead of each cluster. Currently only effective for ``-s, --write_sdf``.

.. option:: -k, --keep_flexres_sdf

    (Flag) Include flexible residues, if any, in the SDF output.

.. option:: -, --redirect_stdout

    (Flag) Instead of writing an SDF file, print it directly to the standard output (STDOUT).
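
The default naming rule described for ``--suffix`` can be sketched in plain shell. This is only an illustration: ``default_output_name`` is a hypothetical helper, not part of Meeko, and it assumes the default name is formed by stripping the input extension and appending the suffix plus the new extension. The script itself may differ in details.

.. code-block:: bash

    # Hypothetical helper (not part of Meeko): build an output filename from
    # an input filename, a suffix (default "_docked"), and an extension
    # (default "sdf"), following the rule described for --suffix.
    default_output_name() {
        input="$1"
        suffix="${2:-_docked}"
        ext="${3:-sdf}"
        # "${input%.*}" drops the last extension, e.g. ".pdbqt" or ".dlg"
        printf '%s%s.%s\n' "${input%.*}" "$suffix" "$ext"
    }

    default_output_name vina_results.pdbqt
    default_output_name autodock-gpu_results.dlg _best pdb

Passing ``-s`` or ``-p`` with an explicit filename sidesteps any guessing about the default name.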


185 changes: 179 additions & 6 deletions docs/source/cli_lig_prep.rst
@@ -1,16 +1,189 @@
mk_prepare_ligand.py
====================

A command-line script that prepares small organic molecules for docking by generating ligand PDBQT file(s). It currently supports SD files (.sdf), Mol2 files (.mol2), and Mol files (.mol) as input, but SDF is strongly preferred.

Write PDBQT files
-----------------

AutoDock-GPU and Vina read molecules in the PDBQT format. These can be prepared
by Meeko from SD files, or from Mol2 files, but SDF is strongly preferred.

Basic usage
-----------

.. code-block:: bash

    # write a single ligand PDBQT file from a single-molecule input file
    mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt

    # prepare ligand PDBQT files in batch from a multiple-molecule input file
    mk_prepare_ligand.py -i multi_mol.sdf --multimol_outdir folder_for_pdbqt_files

Example: Prepare ligands from a multi-molecule SDF into tar.gz archives of a fixed size
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    lig_sdf="Meeko/example/tutorial1/mols.sdf"
    mk_prepare_ligand.py -i "$lig_sdf" --multimol_prefix mol_batch -z --multimol_targz_size 100

Example: Prepare ligand with the macrocycle-typing option turned off
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    lig_sdf="Meeko/example/cli_lig_prep/Rifamycin_PubChem.sdf"
    mk_prepare_ligand.py -i "$lig_sdf" -o Rifamycin_rigidmacro.pdbqt --rigid_macrocycles

    # current default allows flexible macrocycles
    mk_prepare_ligand.py -i "$lig_sdf" -o Rifamycin_flexmacro.pdbqt

Example: Prepare ligand with the espaloma charge model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The espaloma Python module and its dependencies are required. See the `official espaloma documentation <https://espaloma.wangyq.net/install.html>`_ for an installation guide.

.. code-block:: bash

    mk_prepare_ligand.py -i "$lig_sdf" -o AMP_espaloma.pdbqt --charge_model espaloma

    # current default charge model is gasteiger
    mk_prepare_ligand.py -i "$lig_sdf" -o AMP_gasteiger.pdbqt


Options
-------

Input/Output Options
~~~~~~~~~~~~~~~~~~~~

.. option:: -i, --mol <input_molecule_filename>

    The input molecule file, in SDF, Mol2, or Mol format. This option is required.

.. option:: --name_from_prop <property_name>

    Set the molecule name using a specified RDKit or SDF property.

.. option:: -o, --out <output_pdbqt_filename>

    Specify the output PDBQT filename. Only compatible with single-molecule input.

.. option:: --multimol_outdir <output_directory>

    Specify the directory to write PDBQT output files for multi-molecule inputs. Incompatible with ``-o/--out`` and ``-``/``--``.

.. option:: --multimol_prefix <prefix>

    Replace the internal molecule name in multi-molecule input with the specified prefix. Incompatible with ``-o/--out`` and ``-``/``--``.

.. option:: -z, --multimol_targz

    (Flag) Compress output files into a ``.tar.gz`` archive.

.. option:: --multimol_targz_size <size>

    Define the number of PDBQT files per ``.tar.gz`` archive. Default is 10000. Only effective when used with ``-z, --multimol_targz``.

.. option:: -, --

    (Flag) Redirect output to standard output (STDOUT) instead of writing a file. Ignored if ``-o/--out`` is specified. Only compatible with single-molecule input.
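
To estimate how many archives a batch run with ``-z`` produces, divide the number of PDBQT files by ``--multimol_targz_size`` and round up. A minimal sketch, assuming one PDBQT file per input molecule and that every archive is filled to the given size except possibly the last one; the molecule count is a made-up value:

.. code-block:: bash

    # Hypothetical batch: the molecule count is made up for illustration.
    n_molecules=25000
    # Documented default for --multimol_targz_size.
    targz_size=10000

    # Integer ceiling division: full archives plus one partial archive, if any.
    n_archives=$(( (n_molecules + targz_size - 1) / targz_size ))
    echo "$n_archives archive(s) expected"
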

Molecule Preparation Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. option:: -c, --config_file <config_file>

    Configure ``MoleculePreparation`` from a JSON file. Command-line arguments will override settings in the file.

.. option:: --rigid_macrocycles

    (Flag) Keep macrocycles rigid in their input conformation.

.. option:: --macrocycle_allow_A

    (Flag) Allow bond break with atom type A, which will be retyped as carbon (C).

.. option:: --keep_chorded_rings

    (Flag) Retain all rings from exhaustive ring perception.

.. option:: --keep_equivalent_rings

    (Flag) Retain rings with equivalent sizes and neighboring atoms.

.. option:: --min_ring_size <size>

    Define the minimum number of atoms required in a ring for it to be considered for opening.

.. option:: -w, --hydrate

    (Flag) Add water molecules to the structure for hydrated docking.

.. option:: --merge_these_atom_types <types> [*]

    Specify a list of atom types to merge. The default is ``"H"``.

.. option:: -r, --rigidify_bonds_smarts <SMARTS>

    Provide SMARTS patterns to rigidify specific bonds in the molecule.

.. option:: -b, --rigidify_bonds_indices <i j>

    Specify the 1-based indices of the two atoms in the SMARTS pattern that define the bond to rigidify.

.. option:: -a, --flexible_amides

    (Flag) Allow amide bonds to rotate, making them non-planar (not recommended).

.. option:: -p, --atom_type_smarts <JSON_FILENAME>

    Specify SMARTS-based atom typing in JSON format.

.. option:: -aa, --add_atom_types <JSON>

    Specify additional atom types to assign in JSON format, with SMARTS patterns and atom type names.

.. option:: --double_bond_penalty <penalty>

    Set a penalty value; values greater than 100 prevent breaking double bonds.

.. option:: --charge_model <model>

    Choose the charge model: ``gasteiger``, ``espaloma``, or ``zero``. Default is ``gasteiger``; ``zero`` sets all charges to zero.

.. option:: --bad_charge_ok

    (Flag) Allow NaN and Inf charges in the PDBQT output.

.. option:: --add_index_map

    (Flag) Include a map of atom indices from the input to the PDBQT file.

.. option:: --remove_smiles

    (Flag) Exclude SMILES from being written as a remark in the PDBQT output.

Reactive Docking Options
~~~~~~~~~~~~~~~~~~~~~~~~

.. option:: --reactive_smarts <SMARTS>

    Provide a SMARTS pattern for defining the reactive group.

.. option:: --reactive_smarts_idx <index>

    Specify the 1-based index of the reactive atom within the SMARTS pattern provided by ``--reactive_smarts``.

Covalent Docking (Tethered) Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. option:: --receptor <filename>

    Specify the receptor file. The supported formats, such as ``.pdb`` and ``.mmcif``, depend on whether ProDy is available.

.. option:: --rec_residue <residue>

    Specify the residue in the receptor for attachment, e.g., ``A:LYS:204``.

.. option:: --tether_smarts <SMARTS>

    Provide a SMARTS pattern defining the ligand atoms used for attachment to the receptor.

.. option:: --tether_smarts_indices <IDX IDX>

    Specify the 1-based indices of the two atoms in the SMARTS pattern that will be attached (default: ``1 2``).
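
The ``--rec_residue`` example ``A:LYS:204`` appears to consist of three colon-separated fields. The sketch below splits such a string in plain shell, for instance to check it before passing it on. Reading the fields as chain ID, residue name, and residue number is an assumption based on the example above, and the variable names are illustrative only.

.. code-block:: bash

    rec_residue="A:LYS:204"

    # Split on ":" with POSIX parameter expansion (no external tools needed).
    chain="${rec_residue%%:*}"     # text before the first colon
    rest="${rec_residue#*:}"       # text after the first colon
    resname="${rest%%:*}"          # second field
    resnum="${rest#*:}"            # third field

    echo "chain=$chain resname=$resname resnum=$resnum"
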