Abstract
We present NetLoc, a novel diffusion Kernel-based Logistic Regression (KLR) algorithm for predicting protein subcellular localisation from four types of protein networks: physical Protein–Protein Interaction (PPI) networks, genetic PPI networks, mixed PPI networks and co-expression networks. NetLoc was applied to yeast protein localisation prediction. The results showed that protein networks provide rich information for localisation prediction, with NetLoc achieving an Area Under the Curve (AUC) score of 0.93. We also showed that networks with high connectivity and a high percentage of co-localised PPIs lead to better prediction performance. Further investigation showed that NetLoc is robust: it still performed well (AUC = 0.75) using only 30% of the original interactions, and it achieved an overall accuracy greater than 0.5 with only 20% annotation coverage. Compared with the previous network-feature-based prediction algorithm, which achieved AUC scores of 0.49 and 0.52 on the yeast PPI network, NetLoc achieved significantly better overall performance, with an AUC of 0.74.
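To make the idea of diffusion kernel-based logistic regression concrete, the following is a minimal sketch and not the NetLoc implementation. It assumes the standard graph diffusion kernel K = exp(-tau * L), where L is the graph Laplacian, and it assumes a simple feature construction in which each protein receives kernel-weighted evidence from the other proteins annotated with, and without, a given location. The toy network, the value of `tau`, the function names and the plain gradient-descent fit are all illustrative assumptions.

```python
import numpy as np

def diffusion_kernel(adjacency, tau=1.0):
    """Return K = exp(-tau * L), with L = D - A the graph Laplacian."""
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # L is symmetric, so the matrix exponential follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-tau * eigvals)) @ eigvecs.T

def kernel_features(kernel, labels):
    """Kernel-weighted evidence for and against a location (assumed feature design)."""
    k = kernel.copy()
    np.fill_diagonal(k, 0.0)  # a protein does not contribute evidence for itself
    y = np.asarray(labels, dtype=float)
    return np.column_stack([k @ y, k @ (1.0 - y)])

def fit_logistic(features, labels, lr=0.5, steps=2000):
    """Logistic regression fitted by plain gradient descent (illustrative only)."""
    x = np.column_stack([np.ones(len(features)), features])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (p - y) / len(y)
    return w

def predict_proba(weights, features):
    x = np.column_stack([np.ones(len(features)), features])
    return 1.0 / (1.0 + np.exp(-x @ weights))

# Hypothetical toy network: two triangles of proteins joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

# 1 = annotated with the location of interest, 0 = annotated elsewhere.
labels = np.array([1, 1, 1, 0, 0, 0])

kernel = diffusion_kernel(adjacency, tau=1.0)
features = kernel_features(kernel, labels)
weights = fit_logistic(features, labels)
probs = predict_proba(weights, features)
print(np.round(probs, 3))
```

The sketch scores the same proteins it was fitted on, purely to show the data flow. A realistic evaluation would hold out proteins, for example by cross-validation, and compute the features only from the annotations that remain visible.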