A novel strategy for molecular signature discovery based on independent component analysis

Published Online: pp 277-304

Microarray analysis often yields either too many or too few candidate genes to allow meaningful identification of functional signatures. We aimed to overcome this hurdle by combining two algorithms:

  • Independent Component Analysis (ICA) to extract statistically based potential signatures.
  • Gene Set Enrichment Analysis (GSEA) to produce an enrichment score, with its statistical significance, for each potential signature.
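
As an illustration of the second step, the sketch below scores one potential signature against a reference gene set. It is a simplified stand-in, not the authors' implementation: it assumes the ICA step has already produced one loading per gene for a component, it uses an unweighted Kolmogorov-Smirnov-like running sum rather than the weighted GSEA statistic, and it estimates significance by drawing random gene sets of the same size. The gene names, the loadings and the functions `enrichment_score` and `permutation_p_value` are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified, KS-like).

    Walks down the ranked list, stepping up at each gene in the set and
    down at each gene outside it, and returns the largest deviation from 0.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Return the observed score and a two-sided empirical p-value.

    Assumption: the null distribution comes from random gene sets of the
    same size as the observed one (gene-set permutation).
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(gene in gene_set for gene in ranked_genes)
    exceed = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical loadings of 20 genes on one independent component.
loadings = {f"gene_{i:02d}": 10.0 - i for i in range(20)}

# Rank genes from the strongest positive loading to the weakest.
ranked = sorted(loadings, key=loadings.get, reverse=True)

# Hypothetical reference gene set sitting at the top of the ranking.
reference_set = {"gene_00", "gene_01", "gene_02", "gene_03"}

es, p = permutation_p_value(ranked, reference_set)
print(f"enrichment score = {es:.3f}, empirical p = {p:.4f}")
```

A gene set concentrated at the top of the ranking gives a positive score, and one concentrated at the bottom gives a negative score. When many components are tested, the resulting p-values would still need a multiple-testing correction.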

We applied this strategy to identify regulatory T cell (Treg) molecular signatures from two mouse experiments, with cross-validation. These signatures can detect the ~1% of Treg cells present in whole spleen. These findings demonstrate the relevance of our approach as a signature discovery tool.

Keywords

data mining, bioinformatics, statistical modelling, transcriptome, gene expression, microarray data analysis, GSEA, gene set enrichment analysis, ICA, independent component analysis, T lymphocyte, regulatory T cell, Treg, molecular signature, signature discovery
