Protein Functional Recognition Using a Spin-Image Representation

Claudio Garutti¹

Keywords: protein function, molecular recognition, spin-images, molecular interactions, binding site identification, protein cavities
1 Introduction
The study of the functional aspects of proteins is one of the main problems in bioinformatics. Molecular recognition [4] and binding site identification [3] are both of interest for determining the function of a protein. We propose a purely geometrical approach that addresses both problems. We represent the surface of a protein as a collection of two-dimensional (2D) spin-images associated with Connolly’s points [2]. We enrich this representation by associating with each point P a label, blocked or unblocked, that depends on the shape of its spin-image, and a profile, which is a geometrical descriptor of the empty space as seen from P.
2 Methods
Given a surface point P with normal n, its associated spin-image is a 2D histogram of the positions of all the surface points, expressed in a reference frame formed by P and its tangent plane [6]. A surface point P with normal n is labeled unblocked if the normal n does not intersect the surface at any other point lying above the tangent plane T at P perpendicular to n; otherwise it is labeled blocked. Next, we define the spin-image profile, for both blocked and unblocked points, as a 1D array whose i-th element counts the contiguous 0-pixels on row i of the spin-image. Since a 0-pixel of the image represents space not occupied by protein points, the profile is a descriptor of the empty space as seen from P.

For molecular recognition, we aim at establishing individual correspondences between points of the two surfaces. We look for pairs of points on the two proteins that have the same label, either unblocked or blocked, and highly correlated profiles. The 2D correlation of the spin-images is computed only for such pairs, which yields a significant computational advantage. In the matching procedure, we first map the points of each molecule onto a 3D grid, and then find correspondences among the subsets of points belonging to pairs of grid cells [1]. For all pairs of selected cells we find correspondences between points whose spin-images are highly correlated. We then group the resulting correspondences into sets of geometrically consistent correspondences. Finally, we score each group by the RMSD of the rigid transformation that best superimposes the sets of corresponding points.

For binding site identification, we have developed a method that identifies external surface cavities and clefts in proteins using a clustering procedure that groups blocked surface points.
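The spin-image and profile definitions above can be illustrated with a minimal Python sketch. It follows the standard (alpha, beta) construction of [6]; the bin size, the image dimensions, the row/column layout, and the choice to count 0-pixels starting from column 0 are assumptions made for illustration, as are the function names `spin_image`, `profile`, and `pearson`.

```python
import math

def spin_image(p, n, points, bin_size=1.0, n_rows=8, n_cols=8):
    """2D histogram of surface points in the frame of the oriented point (p, n).

    For each point x: beta is the signed height above the tangent plane at p,
    alpha is the distance from the line through p along n (n must be a unit vector).
    Assumed layout: rows index beta (row 0 = largest beta), columns index alpha.
    """
    half_height = n_rows * bin_size / 2.0
    image = [[0] * n_cols for _ in range(n_rows)]
    for x in points:
        d = [x[k] - p[k] for k in range(3)]
        beta = sum(d[k] * n[k] for k in range(3))
        alpha = math.sqrt(max(sum(c * c for c in d) - beta * beta, 0.0))
        row = math.floor((half_height - beta) / bin_size)
        col = math.floor(alpha / bin_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            image[row][col] += 1
    return image

def profile(image):
    """For each row, count the contiguous 0-pixels starting at column 0 (assumption)."""
    result = []
    for row in image:
        count = 0
        for pixel in row:
            if pixel != 0:
                break
            count += 1
        result.append(count)
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

# Toy example: three surface points, spin-image taken at the first one.
pts = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 0.0, 1.5)]
img = spin_image(pts[0], (0.0, 0.0, 1.0), pts, bin_size=1.0, n_rows=4, n_cols=4)
print(profile(img))  # [0, 4, 0, 4]
```

In a matching pipeline of the kind described above, `pearson` would be applied first to the profiles of two points with the same label, and the more expensive 2D correlation of the spin-images would be computed only when the profile correlation exceeds a chosen threshold.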
Given the spin-image of a blocked point P, we determine the largest sphere that contains only empty cells, touches P, and has its center on the normal n of P. The sphere is obtained from the profile of the blocked point with a simple and efficient procedure. We then cluster the spheres using a single-linkage algorithm, as in DOCK [5]. The binding site of the protein is identified as a relevant cavity of the molecular surface, represented by the cluster containing the largest number of blocked points.

¹ Dept. of Information Engineering, University of Padova, via Gradenigo 6/A, 35131 Padova, Italy.
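The sphere clustering step described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes that two spheres are linked when they overlap (center distance at most the sum of the radii), and it implements single linkage at that fixed threshold as connected components computed with a union-find structure. The function name `single_linkage_clusters` and the input layout are hypothetical.

```python
import math

def single_linkage_clusters(spheres):
    """Cluster spheres by single linkage with an overlap criterion (assumption).

    spheres: list of ((x, y, z), radius), one sphere per blocked point.
    Returns clusters as lists of sphere indices, largest cluster first.
    """
    parent = list(range(len(spheres)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spheres)):
        ci, ri = spheres[i]
        for j in range(i + 1, len(spheres)):
            cj, rj = spheres[j]
            if math.dist(ci, cj) <= ri + rj:  # spheres overlap or touch
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spheres)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)

# Toy example: three chained spheres and one isolated sphere.
spheres = [((0.0, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0),
           ((3.0, 0.0, 0.0), 1.0), ((10.0, 0.0, 0.0), 1.0)]
print(single_linkage_clusters(spheres))  # [[0, 1, 2], [3]]
```

Since each sphere corresponds to one blocked point, the first cluster returned is the one with the largest number of blocked points, which is the candidate binding site under the criterion stated above. Spheres 0 and 2 do not overlap directly but end up in the same cluster through sphere 1, which is the chaining behavior characteristic of single linkage.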
3 Discussion
For molecular recognition, we performed computational experiments on several pairs of protein surfaces, among them the catalytic subunit of the cAMP-dependent protein kinase (1atp) and casein kinase-1 (1csn). In this case, the matching algorithm correctly aligned the two binding sites with an RMSD of 1.7, and this alignment was the top-ranked solution. Our findings show that the common binding site of the two proteins is usually retrieved among the top-ranked solutions of the matching procedure. For binding site identification, we conducted preliminary tests on a set of proteins belonging to the protein kinase family, in which we often found the binding site to be the largest cavity on the surface, as shown in Figure 1.
Figure 1: Largest cavities in proteins 1csn and 1atp. (green: atoms found by our method that belong to the binding site; yellow: atoms of the binding site not detected; blue: atoms detected outside the binding site.)
References

[1] Bock, M. E., Cortelazzo, G. M., Ferrari, C. and Guerra, C. 2005. Identifying similar surface patches on proteins using a spin-image surface representation. CPM, Seoul.
[2] Connolly, M. L. 1983. Solvent-accessible Surfaces of Proteins and Nucleic Acids. Science 221:709–713.
[3] Glaser, F., Morris, R. J., Najmanovich, R. J., Laskowski, R. A. and Thornton, J. M. 2006. A Method for Localizing Ligand Binding Pockets in Protein Structures. PROTEINS: Structure, Function, and Bioinformatics 62:479–488.
[4] Shulman-Peleg, A., Nussinov, R. and Wolfson, H. J. 2004. Recognition of Functional Sites in Protein Structures. J. Mol. Biol. 339:607–633.
[5] Shoichet, B. K. and Kuntz, I. D. 1991. Protein Docking and Complementarity. J. Mol. Biol. 221:327–346.
[6] Johnson, A. E. and Hebert, M. 1999. Using spin-images for efficient multiple model recognition in cluttered 3-d scenes. IEEE PAMI 21(5):433–449.