The Biocomputing Research Lab


Residue Contexts

Flexible and non-sequential protein structure alignment


Proteins play an integral role in virtually all cell processes. The myriad functions performed by protein molecules are determined, to a large extent, by the three-dimensional structure a protein adopts when folded into its functional globular form. The ability to measure and quantify the degree of structural similarity between arbitrary proteins of interest is therefore essential for gaining a deeper understanding of protein structure-function relationships and for elucidating a map of the protein universe. Protein structure alignment methods attempt to capture structural similarities between proteins, guided by a set of biological, physical, and mathematical principles. Various methods have served as valuable comparison tools, both for defining protein classification schemes and for predicting the unknown function(s) of newly determined protein structures. Despite many successful formulations, however, the state of the art still faces significant challenges when dealing with the following types of protein structures and fold families:

(1) Significant inconsistencies between methods arise in the alignment of distantly related proteins, due to structural variability resulting from the evolutionary accumulation of mutations. Mutations such as extensive insertions and/or deletions, as well as repetitions, may force current methods to converge erroneously on local optima.

(2) Protein flexibility results from the alternate ways in which a molecule can be packed in crystal form. Aligning protein structures in different conformations may require flexibility in the form of translations and rotations about arbitrary points on the C-alpha backbone.

(3) Non-sequential relationships arise from evolutionary events such as circular permutations, domain swaps, and beta-hairpin flips, and require a different connectivity of the aligned fragments in the proteins being compared.

In light of the issues described above, we approach the problem of protein structural alignment in two steps:

(1) We first identify similar sub-regions, or local structural similarities, between protein structures. We do so not just by matching features from a single residue or by considering well-defined regions in the structure such as alpha-helices and beta-strands, but through rich and robust descriptors that capture structural similarities of the local 3D environments around arbitrary residues of interest. The distribution of C-alpha backbone atoms around an arbitrary residue is captured in a spherical histogram, and histograms are subsequently matched using a cross-bin distance measure to quantify the degree of similarity. Note that the local structure matches captured by our histograms are not restricted to single protein chain fragments that follow the natural sequential ordering of a protein chain; rather, we capture non-sequential relationships by ignoring residue-to-residue connections based on the primary sequence. Given arbitrary points on two protein chains, we also employ an automatic scale selection step to find an ideal histogram radius r, which determines the size of the local 3D environment.

(2) The local matches found during the previous step are assembled into longer alignments while allowing flexibility. Our algorithm is designed to identify two types of flexible regions: (a) “Inter-domain hinge regions” enable global-level movements between large domains, as related to conformational changes and protein function. (b) “Intra-domain hinges” are smaller rotations and translations that allow for a closer alignment of residues within a single domain, where domain boundaries are defined by “inter-domain hinge regions”.
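The local-environment matching in step (1) can be illustrated with a minimal Python sketch. It is not the lab's implementation, and it rests on several assumptions: only the radial part of a spherical histogram is kept (concentric shells, no angular bins), a one-dimensional earth mover's distance stands in for the cross-bin measure, which the text above does not name, and the function names and toy coordinates are hypothetical. Automatic selection of the radius r is not covered.

```python
import math


def radial_histogram(coords, center_idx, radius, n_shells=4):
    """Count C-alpha atoms in concentric shells around one residue.

    Assumption: only the radial part of a spherical histogram is kept here.
    Chain order is ignored, so any atom within `radius` is counted.
    """
    center = coords[center_idx]
    shell_width = radius / n_shells
    hist = [0] * n_shells
    for i, atom in enumerate(coords):
        if i == center_idx:
            continue
        d = math.dist(center, atom)
        if d < radius:
            # Clamp guards against floating-point round-up at the outer edge.
            hist[min(int(d / shell_width), n_shells - 1)] += 1
    return hist


def cross_bin_distance(h1, h2):
    """1D earth mover's distance between two equal-length histograms.

    Assumption: a stand-in for the unspecified cross-bin measure. Each
    histogram is normalized to unit mass, so only the shape is compared.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    t1, t2 = sum(h1), sum(h2)
    if t1 == 0 or t2 == 0:
        return 0.0 if t1 == t2 else float("inf")
    carried = 0.0  # mass that still has to be moved into later bins
    cost = 0.0
    for a, b in zip(h1, h2):
        carried += a / t1 - b / t2
        cost += abs(carried)
    return cost


# Toy coordinates (hypothetical, not real protein data).
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (0.0, 3.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5)]

h_ext = radial_histogram(extended, 0, radius=4.0)
h_rot = radial_histogram(rotated, 0, radius=4.0)
h_cmp = radial_histogram(compact, 0, radius=4.0)

print(cross_bin_distance(h_ext, h_rot))
print(cross_bin_distance(h_ext, h_cmp))
```

In this toy setup the rotated copy yields the same shell counts as the extended chain, because shell membership depends only on distance from the central residue, while the compact arrangement concentrates its atoms in the innermost shell. A cross-bin measure penalizes mass that sits in neighboring shells less than mass that sits in distant shells, which a bin-by-bin comparison cannot do.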


Residue Contexts


  • J. Kim and R. Singh, "Residue Contexts: Non-Sequential Protein Structure Alignment Using Structural and Biochemical Features", International Symposium on Bioinformatics Research and Applications (ISBRA), 2010, Lecture Notes in Bioinformatics, Springer, pp. 77-88.