Multiple sequence alignment based on dynamic weighted guidance tree
Abstract
Aligning multiple DNA/RNA/protein sequences to identify common functionalities, structures, or relationships between species is a fundamental task in bioinformatics. In this study, we propose a new multiple sequence alignment strategy that extracts sequence information together with global and local sequence similarities to assign a different weight to each input sequence. A weighted pairwise distance matrix is calculated from these sequences to build a dynamic alignment guiding tree. The tree can reorder its higher-level branches based on the corresponding alignment results from lower tree levels to guarantee the highest alignment score at each level of the tree. On many of the benchmarks tested, this technique improves alignment accuracy by up to 10% over multiple sequence alignment tools such as CLUSTALW (Thompson et al., 1994), DIALIGN (Morgenstern, 1999), T-COFFEE (Notredame et al., 2000), MUSCLE (Edgar, 2004), and PROBCONS (Do et al., 2005).
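The first two steps named above (a weighted pairwise distance matrix, then a guiding tree built from it) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the k-mer Jaccard distance stands in for an alignment-based distance, the per-sequence weights are supplied by the caller and combined by a simple average, and the tree is built with plain average linkage (UPGMA-style). The dynamic reordering of higher-level branches is not implemented here.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pair_distance(a, b, k=2):
    """Jaccard distance between k-mer sets (a stand-in, not the paper's measure)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def weighted_distance_matrix(seqs, weights, k=2):
    """Scale each pairwise distance by the mean of the two sequence weights.

    The averaging rule is a hypothetical choice; the abstract does not
    say how the weights enter the matrix.
    """
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        scale = (weights[i] + weights[j]) / 2
        dist[i][j] = dist[j][i] = scale * pair_distance(seqs[i], seqs[j], k)
    return dist


def build_guide_tree(dist):
    """Average-linkage clustering; returns nested tuples of sequence indices."""
    # cluster id -> (subtree, list of member leaf indices)
    clusters = {i: (i, [i]) for i in range(len(dist))}
    next_id = len(dist)

    def average_linkage(a, b):
        members_a, members_b = clusters[a][1], clusters[b][1]
        total = sum(dist[x][y] for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average leaf-to-leaf distance.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: average_linkage(*pair))
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

In a progressive aligner, the nested tuples would determine the order in which sequences and sub-alignments are merged. A dynamic variant, as the abstract describes, would revisit the higher-level merges after the lower-level alignments have been scored.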
References
- 1. G. Blackshields, I. Wallace, M. Larkin, D. Higgins, 'Analysis and comparison of benchmarks for multiple sequence alignment', In Silico Biol. (2006)
- 2. H. Carrillo, D. Lipman, 'The multiple sequence alignment problem in biology', SIAM J. Appl. Math. (1988)
- 3. M. Dayhoff, R. Schwartz, B. Orcutt, 'A model of evolutionary change in proteins. Matrices for detecting distant relationships', Atlas of Protein Sequence and Structure (1978)
- 4. C. Do, M. Mahabhashyam, M. Brudno, S. Batzoglou, 'ProbCons: Probabilistic consistency-based multiple sequence alignment', Genome Res. (2005)
- 5. R. Edgar, 'MUSCLE: multiple sequence alignment with high accuracy and high throughput', Nucleic Acids Research (2004)
- 6. D. Feng, R. Doolittle, 'Progressive sequence alignment as a prerequisite to correct phylogenetic trees', J. Mol. Evol. (1987)
- 7. S. Henikoff, J. Henikoff, 'Amino acid substitution matrices from protein blocks', Proceedings of the National Academy of Sciences (1992)
- 8. J. Hudak, M. McClure, 'A comparative analysis of computational motif-detection methods' (1999)
- 9. K. Katoh, K. Misawa, K. Kuma, T. Miyata, 'MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform', Nucleic Acids Research (2002)
- 10. D. Lipman, S. Altschul, J. Kececioglu, 'A tool for multiple sequence alignment', Proceedings of the National Academy of Sciences (1989)
- 11. B. Morgenstern, 'DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment', Bioinformatics (1999)
- 12. S. Needleman, C. Wunsch, 'A general method applicable to the search for similarities in the amino acid sequence of two proteins', J. Mol. Biol. (1970)
- 13. C. Notredame, D. Higgins, J. Heringa, 'T-Coffee: A novel method for fast and accurate multiple sequence alignment', J. Mol. Biol. (2000)
- 14. C. Notredame, D. Higgins, 'SAGA: sequence alignment by genetic algorithm', Nucleic Acids Research (1996)
- 15. N. Saitou, M. Nei, 'The neighbor-joining method: a new method for reconstructing phylogenetic trees', Mol. Biol. Evol. (1987)
- 16. R. Smith, T. Smith, 'Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling', Protein Engineering Design and Selection (1992)
- 17. T.F. Smith, M.S. Waterman, 'Identification of common molecular subsequences', Journal of Molecular Biology (1981)
- 18. P.H.A. Sneath, R.R. Sokal, Numerical Taxonomy (1973)
- 19. J. Stoye, 'Multiple sequence alignment with the Divide-and-Conquer method', Gene (1998)
- 20. J.D. Thompson, P. Koehl, R. Ripp, O. Poch, 'BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark', Proteins (2005)
- 21. J.D. Thompson, F. Plewniak, O. Poch, 'A comprehensive comparison of multiple sequence alignment programs', Nucleic Acids Research (1999)
- 22. J. Thompson, D. Higgins, T. Gibson, 'CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice', Nucleic Acids Res. (1994)
- 23. L. Wang, T. Jiang, 'On the complexity of multiple sequence alignment', J. Comput. Biol. (1994)
- 24. T.J. Wheeler, J.D. Kececioglu, 'Multiple alignment by aligning alignments', Bioinformatics (2007)


