Abstract
Predicting the three-dimensional (3-D) structure of a protein that has no templates in the Protein Data Bank (PDB) is a very hard, and so far still impossible, task. Computational prediction methods have been developed in recent years, but the problem remains challenging. In this paper we present a new strategy based on Interval Arithmetic to store structural information obtained from experimental protein templates and to predict native-like approximate 3-D structures of proteins. Our objective is to perform the prediction very quickly and to produce native-like structures that can be used as starting structures for ab initio methods. We illustrate the efficacy of our method in five case studies of polypeptides.


