Geometric Methods in Structural Computational Biology by Lydia Kavraki - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

where the last column contains 1. You can do so with the command backbone_chain =

zeros (N, 4); which creates a matrix with N rows and 4 columns initialized to all zeros.

You can set the fourth column to 1 with the command backbone_chain(:, 4) = 1; Now

you need to place the cartesian coordinates from the array to this matrix. You can do so with the

for loop below:

for i = 1:N

for j = 1:3

backbone_chain(i,j) = cartesian_coordinates((i-1) * 3 + j);

end;

end;

The cartesian coordinates you have just read will serve as a basis for performing rotations,

according to what method you choose to represent and manipulate the protein, as discussed in

class. The following guidelines should help you get started with either method:

If using the global coordinate (simple) approach, you can use the cartesian coordinates just read

as a basis to perform rotations on them. For the most interesting task of transforming atomic

positions, you simply need to iterate over the atoms affected by the transformation and

multiply their position with the transformation matrix. Say the position of atom i needs to be

transformed by your transformation matrix. You can do so by multiplying trans_mat with

backbone_chain(i, :) as in trans_mat * backbone_chain(i, :)'. Note that

the colon is very convenient as it gives an entire row or an entire column and that

backbone_chain(i, :)' gives you the transpose that is necessary for the multiplication

to be carried out.

If using the Denavit-Hartenberg local frames, you first need to extract the bond lengths and

angles, and the initial dihedral angles, from the cartesian coordinates. To do so, you can loop

over the atoms computing these values and storing them (for example in arrays named bonds,

angles, dihedrals). Once this is done, you have extracted a representation of the protein

in its internal coordinates and can discard the cartesian coordinates, or keep the coordinates

array to store the reconstructed protein after performing manipulations. The absolute

position/orientation of the protein is not important and can be ignored by assuming the anchor

atom rests at the origin and that the first bond angle and dihedral angle are both zero.

Remember you can perform rotations now by simply adding/subtracting from the dihedral

angles, but to recover the cartesian coordinates once the rotations are done you need to build a

chain of homogeneous transformations as discussed in class.

In any case, your transformation matrices need to have dimension 4x4 to work with homogeneous

coordinates. You can initialize an empty 4x4 matrix by writing trans_mat = zeros(4,4);.

Then you can set the elements of this matrix to what they should be for the particular method you

use. You can evaluate all the cosines and sines that you need in Matlab. Matlab offers you built-in

operations such as dot product or vector norm. Make sure that you understand these operations

before you apply them. Recall that you can rotate any bond except the peptide bond, which is

rigid.

Conduct the following rotations:

Rotate the dihedral bond between the N and CA of the last aminoacid by 30 degrees. Does the

structure change much? What atoms move in space as the result of this rotation? Please answer

Q1 .

Now choose any dihedral bond roughly in the middle of the protein and experiment with its

rotation until you reach a conformation that has steric clashes (atoms collide with one another).

Please answer Q3 and Q4 .

If you wish to visualize the resulting conformations in VMD, you simply have to produce a file

with the transformed coordinates. This file should be in ASCII (text) format and must have an

empty/dummy first line, and all atom coordinates following (for example all in a single line,

index-170_1.jpg

separated by spaces). You can save this file with ".crd" extension. Then, you can use the same

"backbone_native.pdb" file, and use VMD's option "Load data into molecule" to load your new

coordinates from the .crd file.

Ranking by Energy

Protein conformations are not only geometric objects but are characterized by energetic stability.

You have seen that there are many empirical ways to compute the potential energy of a

conformation. Functions such as CHARMM are very involved for the purpose of this assignment.

Therefore, let us not worry about all the energetic terms to be considered from the atomic

interactions. Consider only unfavorable interactions due to collisions between atoms, also referred

to as steric clashes. Think about a simple energy function that checks for steric clashes. Your

function should report high energies for conformations with collisions and low energies for

collision-free conformations. You can model each atom as a sphere with a certain radius known as

the Van der Waals (VDW) radius. Even though different atoms have different VDW radii, you can

assume that all atoms have the same radius of 1.7 A. To assist you in devising a function that

checks for collisions over all pairs of atoms, consider the following:

Figure 1.6.

For all different atoms i and j, make sure that the distance between their centers C is more than twice their radiuses. In this simple example we are using the same radius R.

Evaluating Rotations by Energy

Now think about the effect of the dihedral rotation of the last bond the energetic stability of the

new conformation. Please answer Q2 .

When you rotated a dihedral bond in the middle of the backbone until you reached a

conformation that had collisions, you verified the presence of steric clashes by visualizing the

conformation. You can quantify the presence of steric clashes now through your energy

function. Please answer Q5 .

Again, as in dihedral rotations, you are encouraged to use Matlab. After you have the cartesian

coordinates of the manipulated conformation, you can iterate over the atoms and check whether

they are in collision with one another. You can do this with a double loop. Assume that the radius

of every atom is the same, 1.7 A. In order to avoid reporting collisions for bonded atoms, you can

consider only pairs of atoms that are 4 positions apart. That is, check atom i with atom i+4 for a

possible steric clash.

For Submission

NOTE: remember that in order to open a .crd file in VMD (to attach it to a loaded structure) it has

to contain a dummy first line. If your implementation writes out ASCII coordinates, remember to

add an empty/dummy line at the beginning before opening with VMD. Please follow this list of

deliverables closely:

Q1 : Superimpose the conformation resulting from rotating the dihedral bond between the N and

CA of the last aminoacid by 30 degrees over the native backbone conformation. Prepare an image

that clearly shows which atoms' locations have changed. Submit this image.

Q2 : Now you need to quantify the energetic cost of rotating this dihedral bond by 30 degrees by

reporting whether this rotation causes any steric clashes. Please test your energy function on the

conformation you obtained from rotating the last dihedral bond by 30 degrees. Please report the

answer of your energy function. It should be compatible with what you can visualize from the

image you prepared above.

Q3 : Provide the index of the dihedral bond you need to rotate to generate a large scale motion

that results on collisions between atoms. Report on the amount of rotation that you need to

perform in order to obtain a conformation that contains steric clashes. Plot this conformation

superimposed on the native backbone conformation so as to show that it is very different from the

native. Render this image and submit it.

Q4 : Zoom in on that part of your conformation that contains collisions. Make sure that this

image clearly shows steric clashes and submit this image.

Q5 : Compute the energy of the above conformation with your energy function. Report the

answer that your energy function returns. Make sure that this answer reflects the fact that there

are steric clashes. Please provide the atomic distances for the atom pairs that your function

reports in collision.

Bonus : Upon a situation when the energy of the conformation resulting from a dihedral rotation

is high, how would you minimize the energy of this conformation? Provide a discussion and

pseudo-code on how you would perform the minimization.

Note: While you are welcome to use any programming language, please typeset your deliverables

and make sure that the quality of your images is good. You can use any visualization software to

produce your images. You are welcome to perform the asked rotations either through the Denavit-

Hartenberg derivation provided in our [PDF] document , or through simple rotations around arbitrary vectors. If you choose the latter, please note that there is a typo in the formula for the

rotation matrix Ri in Zhang-Kavraki, 2002 [PDF] on page 2: The second column, first row entry of this matrix is given as uxvy(1-cos(theta))+ vz sin(theta) , but it should be

vxvy(1-cos(theta))− vz sin(theta) .

1.3. Assignment 3: Inverse Kinematics*

Motivation for Inverse Kinematics in Proteins

One important application of inverse kinematics is in determining missing portions of protein

structure. We traditionally depend on experimental techniques to provide us with the picture of an

average structure for a protein. X-ray crystallography for example, relies on crystallizing proteins

and reporting the structure of the protein crystal within a certain resolution. One of the inherent

problems with X-ray crystallography is that mobile protein regions such as loops cause disorder in

the crystal and as a consequence, coordinates for the atoms of these mobile regions cannot be

reported. Often, in the PDB, crystallographically determined proteins are partially resolved, i.e. a

portion of the structure may be missing due to its intrinsic mobility. Even when experimental

techniques such as NMR and cryo-EM can report an average picture of the fully resolved protein,

the average structure reported is not indicative of the different conformations mobile regions can

assume inside our cells at room temperature.

The specific problem of completing a partially resolved protein structure by finding

conformations for its missing loop is known as the fragment completion or the loop closure

problem. Note that the loop closure problem is actually an inverse kinematics problem. Using

sequence information alone, i.e. knowing the aminoacid sequence of the missing loop, one can

generate starting loop conformations. The loop closure problem requires these loop conformations

to be geometrically constrained by attaching them to the portion of the protein structure that is

experimentally determined. Note that, as the picture below indicates, one can generate many loop

conformations in space through forward kinematics. One end of the loop can be attached to its

counterpart in the protein through translation alone. The other end however, needs to be attached

without breaking bonds or stretching bond angles. One way to do this is through inverse

kinematics; that is, knowing the goal position in space for the end of the loop, can we solve for the

dihedral DOFs of the loop conformation? This question can be answered by Inverse Kinematics

techniques.

index-173_1.jpg

Figure 1.7.

One "sticky" end of the loop can be attached to its stationary counterpart in the protein through translation. The other end needs to move towards its goal location by solving an Inverse Kinematics problem.

Inverse Kinematics for a Polypeptide Chain

Cyclic Coordinate Descent (CCD)

In this assignment you will complete a loop portion in the 1COA structure of the CI2 protein. X-

ray crystallography completely resolves the 1COA structure. However, even though a long loop

region from residue 34 to residue 46 is present in the native conformation of 1COA, an interesting

exercise is to pretend this loop region cannot be determined. Using an inverse kinematics

approach, you will sample many potential loops that can all complete the 1COA structure and

compare them to the native conformation obtained through X-ray crystallography. An important

question to answer is whether your loop closure algorithm can recover/predict the conformational

state of this loop region in CI2; that is, how different are your loops from the one in the native

conformation of CI2? You will implement a simple inverse kinematics technique, Cyclic

Coordinate Descent (CCD) as presented in [link]. For simplicity, you will work with the native

backbone of CI2, whose respective native coordinates you can also obtain from the

backbone_native.crd file.

Your task is to generate 10 different loops that all complete the protein structure of CI2 and

compare them to the native loop conformation. To do so, even though the native loop

conformation is given to you in the backbone.pdb file, you will pretend that the loop region from residue 34 to residue 46 is not resolved. You will generate different conformations for this loop

region by randomly sampling values for the dihedral of the loop, uniformly in the range [-PI, PI].

Then you will use the CCD algorithm described in [link] to steer these loop conformations to their goal/target location as specified by the coordinates of residue 46 in the native conformation of

CI2.

The Cyclic Coordinate Descent, as defined in [link], allows you to find the optimal dihedral angle for a dihedral bond by whose rotation a desired atom will get as close as possible to a target

position. To simplify this problem, you will not have target positions for the three atoms N, CA, C

as [link] does, but only on one atom, the C atom. This means that the equations you need to solve for the optimal dihedral rotation are going to be much simpler, with your distance measure S

defined as the Euclidean distance between current position M of atom C and its target position F.

Please address Q1 .

CCD for a Chain

You have noticed by now that CCD reports the angle by which to rotate a particular dihedral bond

so that the feature atom gets as close as possible to its target position. Consider a chain with the

feature atom at its end. As in real polypeptides, this chain does not contain only one dihedral

angle, but many. How would you use the CCD routine to solve for all the dihedrals of this chain so

that the final end of the chain reaches a target position? Please address Q2 . (Hint: You might consider solving for each dihedral one at a time, after you have imposed an order on the dihedrals

you are working with.)

For convenience, we will refer to the conformation of the chain where the atom reaches its target

position as the closure conformation. The problem you will address in this assignment is how to

obtain different closure conformations by solving for the dihedral DOFs of a chain in order to go

from a starting/initial/current conformation to a closure conformation. In order to generate closure

conformations, before solving for the dihedral DOFs through CCD for a chain, you need to

generate starting/initial conformations for the loop region of CI2. One way to do so is to generate

random angles for the dihedrals of the chain and then accumulate the respective transformation

matrices down the chain through Forward Kinematics, as you have done in Assignment 2. You are

not allowed to rotate only one dihedral bond to generate a random initial transformation. You

have to rotate all dihedral bonds of a chain by random angles in the interval [-PI, PI]. Please

answer Q3 .

Now it is time for you to put your pseudocode to the test. We first list the particular specifications

for the inverse kinematics problem you will solve in this assignment.

For this problem, you will steer the C atom of residue 46 of the loop region towards its target

position. We will refer to this atom as the feature atom.

We need to define the DOFs. As you will only generate closure conformations for the loop

region from residue 34 to residue 46, the only DOFs you can sample in the intial loop

conformations and solve for in the closure conformations are those of the loop region. Every

other dihedral you cannot change.

The CCD algorithm needs to have a measure of distance between the current position of the

feature atom and its target position. After you have generated initial conformations for the loop

region, the current position of the feature atom is position in space of the C atom of residue 46

in this initial conformation.

The target position of the feature atom will be its position as specified in the original (native)

conformation contained in the backbone_native.crd file.

Now we can put the pieces together: Generate an initial conformation by rotating the dihedrals

of the 34-46 loop region of CI2 with random angles sampled uniformly in the [-PI, PI] interval.

Solve for the dihedrals of the 34-46 region so that the C atom of aminoacid 46 in the initial

conformation you have just generated reaches its target position as specified by its position in

the native conformation in the backbone_native.crd file. Note that depending on the random

conformation you start with, it might be hard to steer atom C to reach its target position. After

applying the dihedrals you have solved for, you might need to do additional iterations to solve

for the dihedrals of the updated conformation, apply those and so on until the spatial constraint

for atom C is satisfied. To have a quantitative criterion for when atom C satisfies its spatial

constraint, you need to define a threshold parameter for S, as in [link]. Set this threshold criterion to 0.01 for this assignment. Please answer Q4 .

In this way, you will generate 10 closure conformations where the C atom of aminoacid 46 will be

in its target position, its position in the native conformation. It is interesting to note that the native

conformation itself is a closure conformation since it satisfies the spatial constraint on this atom.

Now you can quantify the similarity/difference between these 10 closure conformations and the

native conformation. As in assignment 1, you can compute the least RMSD of these 10 closure

conformations from the native. You can also compute the energy of each of these 10

conformations and the native. Please address Q5 and Q6 .

Setup with Matlab

Even though you are welcome to use any programming language to complete this assignment, we

suggest that you use Matlab due to its convenient matrix operations. For this assignment you will

need some basic trigonometry functions such as sine , cosine , tangent , and inverse

tangent/arctangent . You can find all these functions already implemented in Matlab. Please get familiar with their syntax by using the Help pages. Figure 1 captures the Help page for the

category of Functions By Category , subcategory Elementary Math . You need to read this

category to learn more on how to use these functions.

index-176_1.jpg

Figure 1.8.

Matlab Trigonometry

It is very important that you use the right arctangent implementation for this assignment. The

authors in [link] use the atan2 function as implemented in C . There is a similar function in Matlab . Please be careful to understand in what quadrant this function returns its output. You can

learn more about this function under the Help pages. Figure 2 captures the help page for this

function under the category of Functions By Category , subcategory Elementary Math , as part

of the Trigonometric functions.

index-177_1.jpg

Figure 1.9.

Matlab atan2 function

For Submission

Please follow this list closely.

Q1 : Please repeat the derivation in [link] with the simplification we made that only one atom is the one we want to to move to its target position. Typeset these derivations for submission.

Q2 : Please typeset your pseudocode for how you would iterate over a chain until the end of the

chain reaches its target position.

Q3 : Please provide pseudocode for how you would generate a random conformation for the 34-

46 region of the protein. Please generate 10 conformations in this way and visualize these

conformations through VMD, all superimposed over the native. Submit this image.

Q4 : Please plot all the final 10 conformations where the C atom of aminoacid 46 reaches its

target position. Plot these conformations over the native. Indicate where the C atom lies with a

VDW representation in red. Submit this image. (Note: remember that VMD requires a dummy

first line in .crd files, so add one if your code does not).

Q5 : Please compute the RMSD of the 10 closure conformations from the native. Please plot

these values and submit this plot. Please check the 10 closure conformations and the native

conformation for collisions with the energy function you implemented in assignment 2. Please

report the energetic scores of each conformation. Does the native conformation contain any

collisions? How do the other closure conformations compare to the native in terms of energy?

What can you conclude about your implementation of CCD to get a closure conformation with

regard to the energy of this closure conformation?

Q6 : The CCD algorithm does not take into consideration energetic constraints due to unfavorable

steric clashes. One way to impose energetic considerations is to discriminate against closure

conformations that contain steric clashes. You have already implemented a simplistic energy

function that can report the presence of collisions. Please provide pseudo-code on how you would

couple CCD with your energy function to report only closure conformations that do not contain

collisions.

Note: While you are welcome to use any programming language, please typeset your deliverables.

We encourage you to use Matlab. You may use any visualization software to produce your images.

References

1. A. A. Canutescu, and R. L. Dunbrack. (2003). Cyclic Coordinate Descent: A Robotics

Algorithm for Protein Loop Closure. [http://www.proteinscience.org/cgi/reprint/12/5/963].

Protein Science, 12, 963-972.

Solutions

Index

Symbols

α-complex, Computing the Alpha-Shape: Delaunay Triangulation

α-helices, Protein Structure

α-hull, Alpha-Shapes

α-shape, Computing the Alpha-Shape: Delaunay Triangulation

α-shapes, Alpha-Shapes

β-sheets,