Bloat free Genetic Programming: application to human oral bioavailability prediction

Pages: 585-601. DOI: https://doi.org/10.1504/IJDMB.2012.050266

Being able to predict the human oral bioavailability for a potential new drug is extremely important for the drug discovery process. This problem has been addressed by several prediction tools, with Genetic Programming providing some of the best results ever achieved. In this paper we use the newest developments of Genetic Programming, in particular the latest bloat control method, Operator Equalisation, to find out how much improvement we can achieve on this problem. We show examples of some actual solutions and discuss their quality, comparing them with previously published results. We identify some unexpected behaviours related to overfitting, and discuss ways to further improve the practical usage of the Genetic Programming approach.

Keywords

genetic programming, bloat, code growth, operator equalisation, data mining, drug discovery, human oral bioavailability, prediction, symbolic regression, overfitting, solution length, feature selection
