bonded atoms. Given four consecutive atoms A i − 2 , A i − 1 , A i , and A i + 1 , the dihedral angle is defined as the smallest angle between the planes π 1 and π 2 , as shown in the figure. Variation of
the dihedral angle is a consequence of rotation of the two outer bonds about the central bond.
Figure 24. A Dihedral Angle
π 1 is the plane uniquely defined by the first three atoms A i − 2 , A i − 1 , and A i . Similarly, π 2 is the plane uniquely defined by the last three atoms A i − 1 , and A i , and A i + 1 . The dihedral angle, θ, is defined as the smallest angle between these two planes. You can read more about the angle between two intersecting planes here.
In this module, because bond lengths and bond angles are being ignored as underlying degrees of
freedom of a protein, the only remaining degrees of freedom are the dihedral rotations.
Representing protein conformations with the dihedral angles as the only underlying degrees of
freedom is known as the idealized or rigid geometry model. Ignoring bond lengths and bond
angles greatly reduces the number of degrees of freedom and therefore the computational
complexity of representing and manipulating protein structures. Even more efficient
representations which reduce the number of degrees of freedom even further exist [link], but these are beyond the scope of this introduction.
Dihedral Representation of Protein Conformations
All amino acids share the same core of one nitrogen, two carbon, and one oxygen atoms. This
shared core makes up the backbone of the protein. There are two freely rotatable backbone
dihedral angles per amino acid residue in the protein chain: the first, designated φ, is a
consequence of the rotation about the bond between N and Cα , and the other, ψ, which is a
consequence of the rotation about the bond between Cα and C . The peptide bond between C of one
residue and N of the adjacent residue is not rotatable.
The number of backbone dihedrals per amino acid is 2, but the number of side chain dihedrals
varies with the length of the side chain. Its value ranges from 0, in the case of glycine, which has
no sidechain, to 5 in the case of arginine.
Figure 25. Dihedral Angles in Arginine
The backbone atoms appear at the bottom of the illustration (the peptide bond is not rotatable). The sidechain dihedrals are conventionally designated by χ and a subscript.
One can generate different three dimensional structures of the same protein by varying the
dihedral angles. There are 2N backbone dihedral DOFs for a protein with N amino acids, and up to
4N side chain dihedrals that one can vary to generate new protein conformations. Changes in
backbone dihedral angles generally have a greater effect on the overall shape of the protein than
changes in side chain dihedral angles. Think about why.
Protein Forward Kinematics
Kinematics is a branch of mechanics concerned with how objects move in the absence of mass
(inertia) and forces. You can imagine that varying the dihedral angles will move a protein's atoms
relative to each other in space. The problem of computing the new spatial locations of the atoms
given a set of dihedral rotations is known as the forward kinematics problem.
The importance of this problem to protein modeling and simulation should be clear: as stated
earlier, the only internal degrees of freedom usually considered for a protein are its dihedral
angles. Thus, moving a protein will be achieved by setting some of its dihedral angles to new
values. For some applications, such as the rendering of an image of the protein and the
computation of its Energy, however, the Cartesian (x,y,z) coordinates for each atom are needed.
These are obtained by forward kinematics.
Mathematical Background: Matrices and Transformations
The math involved in solving forward kinematics requires some background in linear algebra,
specifically in the anatomy and application of transformation matrices. The links provided in this
section should provide enough mathematical background to understand the rest of this module and
eventually write a simple protein manipulation program.
Background on Transformations
Transformation Matrices: The main transformations you will apply to polypeptide chains will
be a combination of translations and rotations. Please see introduction to translations and
introduction to two- and three-dimensional rotations. One special rotation matrix is the
Euler matrix . Please pay particular attention to the different conventions used for defining the Euler matrix. The one adopted for this module is the XYZ convention (there is also the ZXZ
convention). Now that you know what an Euler matrix looks like, you need to get familiar with
rotations about an arbitrary vector or line. Please read more on rotations around an arbitrary
Homogeneous Transformations: The use of homogenous coordinates and transformations can
simplify some of the calculations involved in using three-dimensional transformations. In
particular, they allow translation, which is not a linear operator in 3D, to become a linear
operator in the 3D subspace (x,y,z,1) of a 4D space. The advantage of this representation is that
translation becomes achievable by multiplying a vector by a matrix, and so becomes
composable. A direct benefit from this is the ability to express, as a matrix, a rotation around
an arbitrary point, not just the origin as in the pure 3D case. See homogenous
Quaternions Quaternions are an efficient, robust method of representing three-dimensional
rotations. In particular, they are not subject to the undesirable singularities and numerical
instability of rotations represented by orthonormal matrices and Euler angles. Please visit this
introduction to quaternions to see how they relate to homogenous transformations. In this class quaternions will be used for the optimal structural alignment of two proteins and it is
recommended that the reader familiarizes him/herself with the concept of quaternions as soon
as possible.
A more detailed discussion of spatial descriptions and transformations can be found in chapter 2
of [link]. The most widely used transformations to manipulate protein chains are rotations.
Several representations are possible for rotations:
Euler angles: The orientation of an object is given as three rotations about set axes. For
example, in the ZXZ convention, the angles specify a rotation about the global z-axis, followed
by one about the global x-axis, and finally, one more about the global z-axis. The use of Euler
angles is subject to an undesirable phenomenon called gimbal lock, in which two of the
rotational axes become aligned in such a way that a degree of freedom is lost.
Cardan angles: The orientation is specified as a set of three rotations about axes defined by the
object. The typical example is the pitch-roll-yaw set of rotations for an aircraft. Pitch
corresponds to a rotation about the axis from wingtip to wingtip. Roll corresponds to a rotation
about an axis from the nose to the tail, and yaw corresponds to rotation about a third "vertical"
axis through the center of the plane, and roughly corresponds to a notion of horizontal heading.
This method is also subject to gimbal lock.
Axis-angle representation: It can be proven that any three-dimensional rotation can be
represented as a single rotation about an axis, represented by a unit vector.
Rotation matrices: A rotation matrix is an orthonormal matrix that represents a rotation.
Rotation matrices are discussed later in the module. Applying the matrix to a vector yields the
rotated vector. Given two rotations represented by matrices A and B, the result of applying both
rotations in sequences is given by the matrix product AB.
Unit quaternions: A rotation of angle theta about the axis represented by the unit vector v =
[x, y, z] is represented by a unit quaternion. Quaternions are described in this module.
Forward Kinematics
As stated earlier, a common operation when manipulating proteins in silico is to retrieve the
Cartesian coordinates of each atom in the protein from our knowledge of its dihedral angles and
rotations applied to them. For simplicity, assume we have an anchor atom and we are modeling
the protein backbone only, that is, the protein consists of a serial linkage composed of consecutive
backbone atoms, as shown in Figure 5.
A Simple Approach
The simplest way to represent a protein chain is to store the Cartesian (x,y,z) coordinates of each
atom at all times. These coordinates are relative to some global coordinate frame which is
unimportant, for example that in which the atomic positions were obtained by X-Ray
crystallography and which are typically read from the PDB files. These coordinates can be
changed if so desired. Common changes are to remove the center of mass (thus centering the
protein at the global origin), subtract the position of the anchor atom (to center the protein at this
atom), etc.
But it was discussed earlier that the "natural" degrees of freedom for kinematic manipulations are
usually the dihedral angles alone. This means that algorithms that operate on dihedral angles to
achieve their goals will normally require a way to modify the Cartesian coordinates when dihedral
rotations are performed, to reflect the new atomic positions. This can be easily done with rotation
matrices as follows.
Figure 26.
A protein backbone as a serial linkage.
When a rotation of θ degrees around bond i is performed, one can think of all atom positions
starting at i+2 rotating around the axis defined by bond i, and all other atoms (from anchor to
atom i+1 inclusive) remaining stationary. Thus, upon such a rotation, the Cartesian coordinates of
the atoms after the bond need to be updated, and their new values are given by:
Where [x,y,z,1] is the position of a generic atom in homogeneous form, [x',y',z',1] is its position
after the rotation (T is the transpose operator), and R(i,θ) is a 4x4 matrix that encodes a rotation of
θ degrees around an axis coinciding with bond i that passes through atom ai, and is given in
homogeneous form as:
In the above formula, T(x) is a translation by the vector x and R0(axis,θ) is a rotation around an
axis that goes through the origin of the specified coordinate system. As can be seen, this rotation
around an arbitrary point is realized by translating the point to the origin, rotating the target atom
around the axis through the origin, and then translating it back (the composition of these 3
transformations yields a unique 4x4 homogeneous matrix that achieves the same effect). The axis
of rotation can easily be computed from the positions of atoms i and i+1 and must have unit norm.
To perform successive rotations about different bonds, this procedure can be repeated, updating
the Cartesian coordinates for each rotation. Note that the convention used for matrix-vector
multiplication is to multiply column vectors by matrices on the left, so the rightmost
transformation gets applied first, and so on. This is the convention used in most of the literature,
but the alternate convention is possible (multiplying row vectors with matrices on the right; these
matrices are the transpose of the column-vector convention).
Alternatively, if many rotations need to be performed at the same time (and the intermediate
Cartesian coordinates are not needed), these rotations could be sorted by bond number and applied
simultaneously, by noting that rotations can be performed in a cumulative way as the backbone is
traversed from anchor to end atom. The ability to chain rotations around arbitrary vectors in space
(i.e. not through the origin) is one of the main benefits of homogeneous transformations. For
example, if two rotations need to be applied at the same time, one around bond 3 by 30 degrees
and another around bond 7 by 15 degrees, the atoms between bonds 3 and 7 get updated by:
But the atoms after bond 7 are updated by:
In the above, bond n is the unit vector defined along bond n, easily computed by subtracting the
coordinates of atoms n+1 and n, and then dividing by its norm. The chaining of transformations as
explained above is very useful to achieve arbitrary rotations of bonds within a protein. Sections of
the protein (i.e. atoms belonging to certain residues) can be updated when a dihedral rotation is
performed simply by constructing the overall matrix that should affect them.
Denavit-Hartenberg Local Frames
The previous approach, while simple and intuitive, has some shortcomings:
The accumulation of math operations in the rotation matrices is prone to numerical instability.
After only a couple of hundred rotations of a point, each accumulating on the other, the final
position of the point may start differing significantly from its actual, intended position. As a
consequence, the relative position and orientation of atoms in the protein chain will no longer
be in agreement with the protein structure. In particular, bond lengths and angles will begin
stretching and deviating from their physically acceptable values.
The actual values of the Cartesian coordinates are always stored in a particular, arbitrarily
chosen frame of reference. For example, if we wanted to translate the protein, we would need to
modify the Cartesian coordinates stored.
Once a rotation is applied, the method "forgets" the current values of the dihedral angles, which
would need to be re-computed if needed. What is stored is a snapshot of the current Cartesian
coordinates of each atom.
The original definition of Forward Kinematics, however, is a method to obtain the Cartesian
coordinates of each atom from the current values of the internal degrees of freedom (dihedral
angles in our case) at any time. In such an approach, the Cartesian coordinates need not be
recomputed after every change in the dihedral angles; rather, the idea is to store the current values
of the dihedral angles, and to have a procedure to reconstruct the atomic positions when needed.
The advantages of this approach are:
A more compact representation of the variables of the problem, since the dihedral angles
require less space than the (x,y,z) coordinates of each atom (the protein topology requires the
values of the bond lengths and angles anyway, so the total amount of numbers to store is
comparable).
It is not prone to numerical instability since the number of rotations performed to position an
atom is always its sequence number in the chain. (Actually if the chain is thousands of residues
long, some uncertainty could arise in the position of atoms far along the chain, but the relative
position of consecutive atoms can still be kept under control, avoiding bond stretching).
Performing a dihedral rotation consists simply of adding/subtracting the rotation angle from the
stored value for each angle. In particular, simultaneous rotations (i.e. rotating more than one
dihedral angle at a time) which consists of multiplying many 4x4 matrices in the global
method, reduces to modifying the angle values.
There is no explicit global coordinate frame for the protein. It can be positioned arbitrarily by
prepending a position/orientation matrix to the forward kinematics computation.
The only preprocessing step that is necessary to start working with this method, however, is to
perform an initial pass on the protein to extract the initial values of the dihedral angles and the
constant bond lengths and angles, from the Cartesian coordinates available from PDB files, if the
intention is to start from the protein's native state. This is easily done; bond lengths can be
obtained by computing the distance between the bonded atoms, and bond angles by computing the
angle between the vectors formed by two consecutive bonds (recall that the dot product of two
vectors yields the product of their lengths times the cosine of the angle between them). Next, we
present the transformations required for the Denavit-Hartenberg method.
Consider three consecutive bonds as in the figure below. Suppose that a local coordinate frame is
attached at the beginning of each bond. For example, local coordinate system x i − 1 , y i − 1 , z i − 1 is centered at atom A i − 1 . Therefore, imagine that the position of each atom in three-dimensional
space is specified in terms of a frame that is centered at the previous atom. Given the frames at
atom A i − 2 , and atom A i − 1 , one can determine how the frames at atoms A i and atom A i − 1 will change in space as a consequence of a rotation around the bond that connects atoms A i − 1 and
A i with the dihedral angle [link]. The correct transformation can be computed in terms of three primitive operations: two rotations and one translation. The two rotations are a rotation around the
dihedral bond by the dihedral angle and a rotation around an axis perpendicular to the bond angle,
by the bond angle. The translation refers to the fact that the origins of the frames are on the
respective centers of the atoms connected by the bond, thus separated by bond lengths.
Figure 27. The Denavit-Hartenberg convention
To describe the position of atom i in terms of the coordinate frame centered at atom i-1, two rotations and a translation are composed.
The order in which to compose these 3 transformations, to obtain the total transformation that
expresses the position of atom i in terms of frame i-1, is the following:
where the rotation axes are the usual x (1,0,0) and z (0,0,1),
not to be confused with the DH Local Frames. The resulting homogenous transformation is shown
below.
Figure 28. Transformation
Homogeneous transformation to express the coordinates of atom i in terms of the frame centered at i-1
Note that θ i is the dihedral angle on bond b i and α i − 1 is the bond angle between bonds b i − 1 and b i . d i is the length of bond b i . For a more detailed derivation of this transformation, please read the included material in required readings. The position of any atom in the molecule can be
determined by chaining matrices of the form given above. For example, suppose that b i , b i − 1 ,
..., b 1 , represents the sequence of bonds on the path from a particular atom a to the anchor atom
a anch . Then, for atom a, its Cartesian coordinates with respect to the frame attached to the anchor
atom is given by:
Figure 29. Equation 1
The coordinates of atom a with respect to the local frame attached to it are 0, 0, 0.
To complete the description, one can allow for rotations or translations of the local frame attached
to the anchor atom with respect to some global frame. Rotations of the anchor atom with respect
to a global frame cause a rigid rotation of the entire polypeptide chain. To do so, one can define
the rotation frame as the Euler matrix defined by the Euler angles of the local frame of the anchor
atom to the global frame. As discussed before, there are many conventions to define the Euler
matrix. One of them, the X-Y-Z convention, defines the Euler matrix as the product of three
rotation matrices: rotation around the z axis by angle α; rotation around the y axis by angle β;
rotation around the x axis by the angle γ. The order of performing these three rotations in the X-Y-
Z conventions is: rotation around x axis first, then around y axis second, and around z axis last.
The resulting Euler matrix according to this convention is given below:
Figure 30. Euler Matrix
α, β, γ are the so-called Euler angles - the angles with respect to each of the Cartesian axes. The convention used here is the XYZ
convention. cα and sα denote cos( α) and sin( α) respectively.
The Euler matrix can be applied last to the accumulating dihedral rotations in order to allow the
anchor atom to move with respect to a global frame. For a more detailed explanation, please read
the included material in required readings.
Required Reading
Zhang-Kavraki 2002 [PDF] Zhang, M. and L. E. Kavraki, "A New Method for Fast and Accurate Derivation of Molecular Conformations". Journal of Chemical Information and
Computer Sciences, 42:64-70, 2002.
Stamati-Shehu-Kavraki. Computing Forward Kinematics for Protein-like linear systems using
Denavit-Hartenberg Local Frames [PDF]
Resources
1. VMD Visual Molecular Dynamics, is an excellent tool for visualization and scripted
manipulation of protein structures that uses Tcl scripting - Humphrey, W., Dalke, A. and
Schulten, K., "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-
38.
2. RasMol is mostly a viewer, but has some built-in tools. - Roger Sayle and E. James Milner-
White. "RasMol: Biomolecular graphics for all", Trends in Biochemical Sciences (TIBS),
September 1995, Vol. 20, No. 9, p. 374.
3. Chimera is a very powerful visualizer that handles huge structures easily. - Pettersen, E.F.,
Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E.
"UCSF Chimera - A Visualization System for Exploratory Research and Analysis." J. Comput.
Chem. 25(13):1605-1612 (2004).
4. InsightII, Cerius2 and Catalyst are products for simulation, discovery and analysis that
recently became commercial, and can be found here.
5. CHARMM is a simulation package based on the CHARMM force field. Brooks BR,
Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). "CHARMM: A
program for macromolecular energy, minimization, and dynamics calculations". J Comp Chem
4: 187217.
6. NAMD is another popular simulation package and can be obtained here. James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe
Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics
with NAMD. Journal of Computational Chemistry, 26:1781-1802, 2005.
7. Amber is one of the most widel