More Accurate Multiple Sequence Alignment
The 20 June 2008 issue of Science contains a paper by Ari Löytynoja and Nick Goldman titled “Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis.” The authors tackled the difficult problem of correctly placing insertions and deletions in a multiple sequence alignment. A multiple sequence alignment is typically the input to phylogeny tools that attempt to determine the evolutionary relationships among the sequences, so misplaced insertions and deletions can lead to misinterpretations of those relationships.
Typical multiple sequence alignment tools, such as CLUSTAL W, MUSCLE, MAFFT, and T-COFFEE, do not handle insertions and deletions (indels) accurately. The authors developed new tools, PRANK and PRANK+F, that take the inferred phylogeny of the sequences into account when placing insertions and deletions in the multiple alignment.
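The key idea behind phylogeny-aware gap placement is that a length difference between two closely related sequences is ambiguous on its own: it could be an insertion in one or a deletion in the other. Looking at a more distant relative (an outgroup) can resolve that. The toy sketch below illustrates this with a simple parsimony rule for one alignment column and three sequences. It is an illustration of the concept only, not PRANK's algorithm; PRANK works progressively along a guide tree and is considerably more involved. The function names `classify_gap` and `classify_columns` are hypothetical.

```python
from typing import List, Optional

GAP = "-"


def classify_gap(sibling_a: str, sibling_b: str, outgroup: str) -> Optional[str]:
    """Toy parsimony rule (hypothetical, not PRANK's method) for one column.

    Each argument is a single character from the column; '-' marks a gap.
    Returns a description of the inferred event, or None when the two
    siblings agree on presence/absence of a residue at this site.
    """
    a_gap = sibling_a == GAP
    b_gap = sibling_b == GAP
    if a_gap == b_gap:
        return None  # no length difference between the siblings here

    has_residue = "B" if a_gap else "A"    # sibling that carries a residue
    lacks_residue = "A" if a_gap else "B"  # sibling with the gap

    if outgroup == GAP:
        # Outgroup also lacks the residue: one insertion is the simplest story.
        return f"insertion in {has_residue}"
    # Outgroup carries the residue: one deletion is the simplest story.
    return f"deletion in {lacks_residue}"


def classify_columns(a: str, b: str, out: str) -> List[Optional[str]]:
    """Apply classify_gap to every column of three aligned sequences."""
    return [classify_gap(x, y, z) for x, y, z in zip(a, b, out)]


if __name__ == "__main__":
    print(classify_columns("ACGT-A", "AC-TTA", "ACGT-A"))
```

For the three aligned sequences in the usage example, the column where B has a gap but A and the outgroup share a residue is read as a deletion in B, while the column where only B carries a residue is read as an insertion in B. An aligner that ignores the outgroup would see both columns simply as a gap in one sibling and could not tell the two events apart.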
In this paper, the authors describe their refinements of the multiple alignment algorithm, and they provide theory and results demonstrating that their algorithm improves the quality of multiple sequence alignments in a biologically meaningful way. The implications of their results are strongest for nucleotide sequence alignments, but the authors contend that their results are also important for peptide sequence alignments.
Source:
Löytynoja A, Goldman N. 2008. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320:1632-1635. DOI: 10.1126/science.1158395.
Other bloggers who have already commented on this paper include:
The Goldman group maintains a web page listing its publications; there are many other interesting papers given there.
June 29 2008 02:01 pm | Bioinformatics
