Adjusting scoring matrices to correct overextended alignments

Lauren J. Mills, William R. Pearson

Research output: Contribution to journal / Article / Peer-review

9 Scopus citations

Abstract

Motivation: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low-identity (<33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions.

Results: We measured the accuracy of local alignment start/stop boundaries using a set of queries whose correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequence; they occur most frequently between sequences that are more closely related (>33% identity). Adjusting the scoring matrix to reflect the identity of the homologous sequences can correct overextended boundaries in higher-identity alignments. In addition, the scoring matrix that produced a correct alignment could be reliably predicted from the sequence identity of the original BLOSUM62 alignment. Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, yielding more correct alignments than BLOSUM62 alone.
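The two measurable steps described above (checking whether an alignment boundary runs past the known homologous region, and choosing a realignment matrix from the identity of the initial BLOSUM62 alignment) can be sketched in Python. This is a minimal illustrative sketch, not the authors' code. It assumes 1-based inclusive coordinates, computes identity over ungapped aligned columns only (one of several conventions), and maps identity to a matrix through cut-offs invented for illustration. The names `Boundary`, `overextension`, `is_overextended`, `predict_matrix` and `MATRIX_BRACKETS` are hypothetical, and the VTML matrix names are placeholders rather than the paper's actual matrix set or thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """Hypothetical helper: a 1-based, inclusive residue range."""
    start: int
    end: int


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    columns = [(q, s) for q, s in zip(aligned_query, aligned_subject)
               if q != "-" and s != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for q, s in columns if q.upper() == s.upper())
    return 100.0 * identical / len(columns)


def overextension(aligned: Boundary, homologous: Boundary) -> int:
    """Number of residues the alignment extends past the homologous region, summed over both ends."""
    left = max(0, homologous.start - aligned.start)
    right = max(0, aligned.end - homologous.end)
    return left + right


def is_overextended(aligned: Boundary, homologous: Boundary, tolerance: int = 0) -> bool:
    """True if the alignment runs more than `tolerance` residues into non-homologous sequence."""
    return overextension(aligned, homologous) > tolerance


# ASSUMPTION: invented identity cut-offs (checked highest first); not taken from the paper.
MATRIX_BRACKETS = [
    (70.0, "VTML20"),
    (50.0, "VTML40"),
    (33.0, "VTML80"),
]
DEFAULT_MATRIX = "BLOSUM62"


def predict_matrix(identity: float) -> str:
    """Pick a shallower matrix for higher-identity alignments; keep BLOSUM62 otherwise."""
    for threshold, name in MATRIX_BRACKETS:
        if identity >= threshold:
            return name
    return DEFAULT_MATRIX


if __name__ == "__main__":
    pid = percent_identity("ACDEFG-HIK", "ACDEYGWHIR")  # 7 identities in 9 ungapped columns
    print(round(pid, 1), predict_matrix(pid))
    print(is_overextended(Boundary(10, 260), Boundary(40, 240)))  # 30 + 20 residues over
```

In a realignment workflow of the kind the abstract describes, the predicted matrix name would be passed to the search program (for example SSEARCH's matrix option) and the new boundaries re-scored with `is_overextended`; the tolerance parameter lets a small boundary slop be ignored.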

Original language: English (US)
Pages (from-to): 3007-3013
Number of pages: 7
Journal: Bioinformatics
Volume: 29
Issue number: 23
State: Published - Dec 1 2013
