
Predicting the secondary structure of proteins using Machine Learning algorithms

pp. 571-584. https://doi.org/10.1504/IJDMB.2012.050265

The functions of proteins in living organisms are related to their 3-D structure, which is ultimately determined by the linear sequence of amino acids that forms these macromolecules. It is therefore important to understand and predict how a protein's 3-D structure arises from a particular amino acid sequence. In this paper, we report the application of Machine Learning methods to predict, with high accuracy, the secondary structure of proteins, namely α-helices and β-sheets, which constitute an intermediate level of local structure.
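The abstract does not specify how residues are encoded or which learner performed best. The block below is a minimal sketch of one common setup, which is not necessarily the authors' own. Each residue is represented by a fixed-size window of neighbouring amino acids and labelled H (α-helix), E (β-strand, part of a β-sheet) or C (anything else). A k-nearest-neighbour rule, one of the instance-based learners named in the keywords, then classifies it. The three-state labelling, the window half-width, the padding symbol `-`, the Hamming distance, and every function name are hypothetical choices made for illustration.

```python
from collections import Counter

PAD = "-"  # hypothetical padding symbol for window positions beyond the sequence ends


def windows(sequence, half_width=2):
    """Yield, for each residue, the window of residues centred on it (padded at both ends)."""
    padded = PAD * half_width + sequence + PAD * half_width
    for i in range(len(sequence)):
        yield padded[i : i + 2 * half_width + 1]


def mismatch_distance(a, b):
    """Hamming distance: number of positions where two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def train(examples, half_width=2):
    """Store every (window, label) pair; an instance-based learner simply keeps its training data."""
    memory = []
    for sequence, labels in examples:
        if len(labels) != len(sequence):
            raise ValueError("need exactly one structure label per residue")
        for window, label in zip(windows(sequence, half_width), labels):
            memory.append((window, label))
    return memory


def predict(memory, sequence, half_width=2, k=3):
    """Label each residue by majority vote among the k stored windows closest to its own window."""
    predicted = []
    for window in windows(sequence, half_width):
        nearest = sorted(memory, key=lambda item: mismatch_distance(window, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return "".join(predicted)
```

As a usage example on invented toy data, `predict(train([("AAAAGGGG", "HHHHCCCC")]), "AAAAGGGG", k=1)` returns `"HHHHCCCC"`, because with `k=1` and no repeated windows each residue's nearest stored window is its own. The keywords mention WEKA, whose lazy learner `IBk` would be the natural off-the-shelf equivalent. This standard-library sketch does not reproduce WEKA's distance weighting or attribute handling.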

Keywords

data mining, machine learning, classification, decision trees, rule induction, instance-based learning, Bayesian algorithms, WEKA, bioinformatics, protein folding, predicting secondary structure conformations
