Rigid and Flexible Ligand DOCKing

Author: P. Therese Lang
Last updated August 16, 2024 by Scott Brozell

This tutorial describes rigid- and flexible-ligand DOCKing to a rigid receptor with grid-based scoring. We study the complex L-Arabinose-Binding Protein bound to L-Arabinose (PDB ID 1ABE ) as an example system. However, these techniques should be transferable to any protein-ligand system. The perspective is docking to a known binding site as opposed to blind docking.

To start this tutorial, obtain the lig_charged.mol2 text file from the "Structure Preparation Tutorial," the selected_spheres.sph text file from the "Sphere Generation and Selection Tutorial," and the grid.nrg and grid.bmp binary files from the "Grid Generation Tutorial." The program dock6 that is distributed and installed with DOCK is required. This ligand input file contains a single ligand, but multiple ligands would be treated simply by including more ligands in the ligand input file.

OPTION 1: Rigid Ligand DOCKing.

In this first case, the ligand will be kept completely rigid during the orientation step. The purpose is to explore the matching and minimization algorithms. However, this type of docking could be applied in a scientific setting if you have a library of ligands that have already been conformationally expanded outside of the DOCK suite of programs.

The rigid body orienting code is written as a direct implementation of the isomorphous subgraph matching method of Crippen and Kuhl (Kuhl, et al. J. Comput. Chem. 1984). Conceptually, the algorithm matchings the centers of the ligand heavy atom to the centers of the receptor site spheres. The algorithm follows the steps below:

1) Generate node
2) Label as match if atom and sphere edges are equivalent
3) Extend match by adding more nodes
4) Exhaustively generate set of non-degenerate matches
5) Use matches to create transformation matrices to move the entire molecule

node = pairing of one heavy atom and one sphere center

edge length = Euclidean distance between atom or sphere centers

 

Once an orientation has been generated, the interaction between the ligand and the receptor can be energetically optimized, in this case using a simplex minimizer (Nelder, et al. Computer Journal 1965). During minimization, the ligand is allowed to be flexible, but the receptor remains rigid. The final score in the output file is the best pose generated from the orienting and minimization procedure.

To actually run the docking calculation, you need to use the program dock6 that is distributed with DOCK in the /bin directory. You need to generate an input file--rigid.in in this case--either interactively by answering questions or by hand in a text file.

USAGE: dock6 -i input_file [-o output_file] [-v]

OPTIONS:
-i input_file #Input parameters extracted from input_file, or dock.in if not specified
-o output_file #Output written to output_file, or dock.out if not specified
-v #Increased output verbosity

DOCK may be executed in either interactive or batch mode, depending on whether output is written to a file. In interactive mode, the user is requested only for parameters relevant to the particular run and default values are provided. This mode is recommended for the initial construction of the input file and for short calculations. In batch mode, input parameters are read in from the input file and all output is written to the output file. This mode is recommended for long calculations once an input file has been generated interactively. dock.in and dock.out are the default input/output names, but you may specify others. For more information on various ways to run the dock6 executable, see the DOCK 6 manual.

Below are some typical parameters that may be declared for this type of grid scoring calculation.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

NOTE: You should specify the full paths for the ligand file, sphere file, the grid file, and vdw and conformational search definition files. A more detailed explanation of the scoring functions and the input parameters can be found in the DOCK 6 Manual.

The program will generate an output file, rigid.out. It lists the parameters utilized in the run, any warning or error messages, and summary information about the best scoring pose. DOCK will also produce a structure file, rigid_scored.mol2, containing the geometric coordinates of the best pose as well as a summary of the interaction energy of that pose. These files are human readable and should be examined carefully. UCSF's Chimera has a ViewDock facility for visualizing DOCK output. See also the ViewDock Tutorial.

OPTION 2: Flexible Ligand DOCKing.

In this second case, the ligand will be allowed to be flexible. This type of docking allows the ligand to structurally rearrange in response to the receptor.

First, the largest rigid substructure of the ligand (the anchor) is identified (see the first bold arrow in the figure below). All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some amount of flexibility can exist in non-aromatic rings. To treat such phenomena as sugar puckering and chair-boat hexane conformations, the user needs to supply each ring conformation as a separate input molecule. If the molecule does not have a ring, the largest rigid segment is specified as the anchor. Additional bonds may be specified as rigid by the user.

Next the flexible layers of the ligand are identified (see the first bold arrow in the figure below). Each flexible bond is associated with a label defined in an editable file. The parameter file is identified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types and chemical environment of the bonded atoms. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.

Cartoon of Anchor-and-Grow Algorithm

In the second stage of the Anchor-and-Grow Algorithm the anchor is rigidly oriented in the active site (see the second bold arrow in the figure above) using the same method described in Option 1. The anchor orientations are evaluated and optimized using the scoring function and the simplex minimizer, also described in Option 1. The orientations are then ranked according to their score, spatially clustered by heavy atom root mean squared deviation (RMSD), and prioritized (pruning).

Finally, in the growth stage (see the third through the sixth bold arrows in the figure above) the flexible layers of the ligand are built onto the best anchor orientations within the context of the receptor. It is assumed that the shape of the binding site will help restrict the sampling of ligand conformations to those that are most relevant for the receptor geometry (see the red X in the figure above).

To actually run the docking calculation, you need to use the program dock6 that is distributed with DOCK in the /bin directory. You need to generate an input file--anchor_and_grow.in in this case--either interactively by answering questions or by hand in a text file.

dock6 -i dock.in [-o dock.out] [-v]

As described in Option 1, DOCK may be executed in either interactive or batch mode, depending on whether output is written to a file.

Below are some typical parameters that may be declared for this type of grid scoring calculation.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):
#description

In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.

NOTE: You should specify the full paths for the ligand file, sphere file, the grid file, and vdw and conformational search definition files. A more detailed explanation of the scoring functions and the input parameters can be found in the DOCK 6 Manual.

The program will generate an output file, anchor_and_grow.out. It lists the parameters utilized in the run, any warning or error messages, and summary information about the best scoring pose. DOCK will also produce a structure file, anchor_and_grow_scored.mol2, containing the geometric coordinates of the best pose as well as a summary of the interaction energy of that pose. These files are human readable and should be examined carefully. See also the previous comments on Chimera ViewDock.