Data mining with a parallel rule induction system based on gene expression programming

Published Online: pp 136-143, https://doi.org/10.1504/IJICA.2011.041914

This paper reports a parallel rule induction system based on gene expression programming (GEP), developed for data classification. The parallel processing environment was implemented on a cluster using the Message Passing Interface (MPI). A master-slave GEP was implemented following the Michigan approach to representing a solution to a classification problem. A multiple master-slave system (islands) was also implemented in order to observe the effect of co-evolution. Experiments were run on ten datasets, and the algorithms were systematically compared with C4.5. Results were analysed as a multi-objective problem, taking into account both the predictive accuracy and the comprehensibility of the induced rules. Overall, the results indicate that the proposed system achieves better predictive accuracy with shorter rules than C4.5.
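
As an illustration of the multi-objective view mentioned in the abstract, the following minimal Python sketch compares classifiers on two objectives at once: predictive accuracy (higher is better) and rule size as a proxy for comprehensibility (lower is better). It is not the paper's evaluation procedure. The names `Result`, `dominates` and `pareto_front` are hypothetical, rule size is an assumed stand-in for comprehensibility, and the numbers are invented purely for demonstration.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One classifier's outcome on a dataset (hypothetical structure)."""
    name: str
    accuracy: float  # predictive accuracy, higher is better
    rule_size: int   # assumed proxy for comprehensibility, lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on both objectives
    and strictly better on at least one (Pareto dominance)."""
    no_worse = a.accuracy >= b.accuracy and a.rule_size <= b.rule_size
    strictly_better = a.accuracy > b.accuracy or a.rule_size < b.rule_size
    return no_worse and strictly_better


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep only the results that no other result dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Invented values for demonstration only; these are NOT results from the paper.
candidates = [
    Result("A", 0.90, 12),
    Result("B", 0.85, 20),
    Result("C", 0.92, 30),
]
front = pareto_front(candidates)
print([r.name for r in front])
```

In this toy data, A dominates B (more accurate and shorter), while A and C are incomparable: C is more accurate but uses longer rules. A claim such as "better accuracy with shorter rules" corresponds to one system dominating the other in this sense.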

Keywords

evolutionary computation, EC, gene expression programming, GEP, data mining
