On the use of estimated tumour marker classifications in tumour diagnosis prediction – a case study for breast cancer
Abstract
In this article, we describe the use of tumour marker estimation models in the prediction of tumour diagnoses. In previous work, we identified classification models that estimate tumour marker values on the basis of standard blood parameters. These virtual tumour markers are now used in combination with standard blood parameters to train classifiers that predict tumour diagnoses. Several data-based modelling approaches implemented in HeuristicLab have been applied to identify estimators for selected tumour markers and cancer diagnoses: linear regression, k-nearest neighbour (k-NN) learning, artificial neural networks (ANNs) and support vector machines (SVMs) (all optimised using evolutionary algorithms), as well as genetic programming (GP). We have applied these modelling approaches to identify models for breast cancer diagnoses; in the results section, we summarise the classification accuracies achieved for breast cancer and compare models that use measured marker values with models that use virtual tumour markers.
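The two-stage idea described above can be illustrated with a minimal sketch. It assumes synthetic data and a plain k-NN classifier written from scratch; it is not the HeuristicLab implementation used in the study, and all function names, the data generator and the made-up relationship between blood parameters, marker class and diagnosis are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def make_patients(n, rng):
    """Hypothetical synthetic cohort: blood parameters, marker class, diagnosis."""
    blood, marker, diagnosis = [], [], []
    for _ in range(n):
        b = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        m = 1 if b[0] + b[1] > 0 else 0       # 1 = marker above its (made-up) threshold
        d = 1 if m == 1 and b[0] > 0 else 0   # made-up link between marker and diagnosis
        blood.append(b)
        marker.append(m)
        diagnosis.append(d)
    return blood, marker, diagnosis


def add_virtual_marker(blood_rows, ref_blood, ref_marker, k=3):
    """Stage 1: append an estimated marker class to each row of blood parameters."""
    return [row + [knn_predict(ref_blood, ref_marker, row, k)] for row in blood_rows]


def run_demo(seed=0):
    rng = random.Random(seed)
    # Patients with measured markers, used only to fit the marker estimator.
    ref_blood, ref_marker, _ = make_patients(200, rng)
    # Patients used to train and test the diagnosis classifier (measured markers unused).
    train_blood, _, train_diag = make_patients(200, rng)
    test_blood, _, test_diag = make_patients(100, rng)

    train_x = add_virtual_marker(train_blood, ref_blood, ref_marker)
    test_x = add_virtual_marker(test_blood, ref_blood, ref_marker)

    # Stage 2: classify the diagnosis from blood parameters plus the virtual marker.
    predictions = [knn_predict(train_x, train_diag, row) for row in test_x]
    hits = sum(p == t for p, t in zip(predictions, test_diag))
    return hits / len(test_diag)
```

Calling `run_demo()` returns the fraction of correctly classified test patients on the synthetic data. The value only shows that the pipeline runs end to end and says nothing about the accuracies reported in the article.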