Abstract
Many clustering techniques for chemical structures have been used in the literature, but it is known that no single method gives the best results for all types of applications. Recent work on consensus clustering is motivated by the success of combining multiple classifiers in many areas and by the ability of consensus clustering to improve the robustness, novelty, consistency and stability of individual clusterings. In this paper, the Cluster-based Similarity Partitioning Algorithm (CSPA) was examined as a means of improving the quality of chemical structure clustering. The effectiveness of the clustering was evaluated by its ability to separate active from inactive molecules in each cluster, and the results were compared with those of Ward's clustering method. The MDL Drug Data Report (MDDR) database was used for the experiments. The results, obtained by combining multiple clusterings, showed that the consensus clustering method can improve the robustness, novelty and stability of chemical structure clustering.
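As an illustration of the idea behind CSPA, the sketch below builds the pairwise similarity (co-association) matrix on which the algorithm operates: two molecules are considered similar in proportion to the number of individual clusterings that place them in the same cluster. This is a minimal sketch and not the implementation used in the paper. The function name `co_association` and the toy label vectors are assumptions made for illustration only. The final step of CSPA, partitioning the similarity graph induced by this matrix into consensus clusters, is not shown.

```python
from itertools import combinations


def co_association(partitions):
    """Return the n x n matrix whose (i, j) entry is the fraction of
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same objects")
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # every object always co-occurs with itself
    for i, j in combinations(range(n), 2):
        agree = sum(1 for p in partitions if p[i] == p[j])
        sim[i][j] = sim[j][i] = agree / len(partitions)
    return sim


# Three hypothetical clusterings of five molecules.
# Cluster labels are arbitrary ids and need not match across clusterings.
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
similarity = co_association(partitions)
```

Here molecules 0 and 1 are grouped together by all three clusterings, so their similarity is 1.0, whereas molecules 0 and 4 are never grouped together and have similarity 0.0. A consensus partition would then be obtained by clustering this matrix.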