Protein Alignment Algorithms

Here are some notes and links to protein alignment algorithms I've found. Here's the Wikipedia page about sequence alignment.

Terms

Here are some terms that you might run across:

Sequence Consensus
a sequence obtained from a multiple sequence alignment that summarizes the alignment, typically by taking the most frequent residue in each column.
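
As a rough illustration of the idea, here is a minimal sketch that takes the most frequent non-gap residue in each column of an alignment. The function name `consensus`, the `-` gap character, and the toy alignment are my own assumptions for illustration; real tools use more careful rules (thresholds, ambiguity codes, and tie-breaking).

```python
from collections import Counter

def consensus(alignment, gap="-"):
    """Return the most frequent non-gap residue in each alignment column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    result = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        # A column made up entirely of gaps contributes a gap to the consensus.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)

# Hypothetical toy alignment (not real protein data).
alignment = [
    "MKV-LA",
    "MKVQLA",
    "MRV-LG",
]
print(consensus(alignment))  # prints MKVQLA
```

Note that ignoring gaps when counting is only one possible choice; counting the gap as a symbol would drop the fourth column from the consensus in this example.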

Multiple Alignment

Global Multiple Alignment

These algorithms take several proteins and align them all globally.

CLUSTALW

MUSCLE

MAFFT
  • multiple "modes" of operation
  • can incorporate information from local pairwise alignments into the global alignment

DIALIGN
  • seems to be a hybrid of sorts; pieces together many local multiple alignments
  • could possibly be used to extract a single motif...

T-COFFEE

POA

PileUp

And many others...

Local Multiple Alignment (Motif Discovery)

There are two kinds of motifs: gapped and ungapped. There appear to be far fewer algorithms for gapped motifs than for ungapped ones.
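
To make the distinction concrete: an ungapped motif occupies the same number of positions in every sequence, while a gapped motif may contain insertions or deletions between its conserved positions. The sketch below shows this with regular expressions on a hypothetical two-cysteine pattern and made-up sequences. It is only an illustration of what the two kinds of motif look like, not how any of the tools listed here discover them.

```python
import re

# Hypothetical toy motif: two conserved cysteines separated by a spacer.
UNGAPPED = re.compile(r"C.{2}C")    # spacer is always exactly 2 residues
GAPPED = re.compile(r"C.{2,4}C")    # spacer may contain up to 2 inserted residues

# Made-up sequences: the second has two extra residues inside the motif.
sequences = ["MKCAYCLL", "MKCAYGTCLL", "MKAAYGTALL"]

def find_motif(pattern, seq):
    """Return (start, matched text) for the first occurrence, or None."""
    match = pattern.search(seq)
    return (match.start(), match.group()) if match else None

for seq in sequences:
    print(seq, find_motif(UNGAPPED, seq), find_motif(GAPPED, seq))
```

The fixed-width pattern misses the occurrence in the second sequence because of the inserted residues, while the variable-width pattern still finds it.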

Ungapped

MEME

BlockMaker

T-COFFEE (MOCCA mode)
  • to use in local alignment (motif discovery) mode, you have to already know the position of a motif

Gibbs

Gemoda

ASSET

BLOCKS

Gapped

Still looking...

Papers

Here are some papers that may shed some light on the gapped motif problem.

Emily Rocke's gapped motif research
  • one of the more promising papers I've found

Generic motif discovery algorithm
  • mentions modifying Gemoda to discover gapped motifs

Combinatorial Approaches to Finding Subtle Signals in DNA Sequences
  • describes the SP-STAR algorithm by Pevzner and Sze used to find gapped motifs

Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model
  • Andrew F. Neuwald and Jun S. Liu

BALSA: Bayesian algorithm for local sequence alignment
  • Appears to be only a pair-wise alignment algorithm

A Bayesian insertion/deletion algorithm for distant protein motif searching via entropy filtering by Jun Xie, et al.
  • A later paper by the same author (listed directly below) mentions a limitation of this algorithm: it can't find gapped motifs with more than one consecutive gap.

Identification of motifs with insertions and deletions in protein sequences using self-organizing...
  • Uses a self-organizing neural network to discover motifs with at most 2 gaps
  • Looks promising

Bayesian Methods in Biological Sequence Analysis

Pairwise Alignment

There are many tools for pairwise alignment. I'm only interested in finding motif discovery algorithms at present, so none are listed here.

Profiles

Profiles (or position-specific scoring matrices) are constructed from multiple sequence alignments. These alignments can have gaps. Below are some of the programs that can build profiles and align sequences to profiles.
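
For a sense of what a profile contains, here is a minimal sketch that builds a log-odds position-specific scoring matrix from an alignment and slides it along a sequence. The function names, the uniform background frequency, the pseudocount of 1, and the toy data are all assumptions of mine; the programs below use more sophisticated weighting and gap handling, and this sketch simply ignores gap characters when counting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        # Gap characters and unknown symbols are skipped when counting.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, segment):
    """Sum per-column scores; unknown residues score 0."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, segment))

def best_window(pssm, sequence):
    """Return (score, start) of the best ungapped match of the profile."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical toy alignment and query sequence.
pssm = build_pssm(["ACDE", "ACDE", "ACDF"])
best_score, start = best_window(pssm, "GGGACDEGGG")
print(start, round(best_score, 2))
```

This scores only ungapped placements of the profile against the sequence; aligning a sequence to a profile with gaps needs dynamic programming or an HMM.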

Builders

HmmerBuild
  • builds hidden Markov model profiles from a multiple sequence alignment consensus

ProfileMake

Aligners

ProfileGap

Application Suites

There are a few packaged suites that include several tools related to protein sequence searching and alignment.

GCG
  • contains over 140 programs for various sequence analysis tasks

HMMER
  • contains several programs that deal with HMM profiles

Unevaluated

Here are the names of other algorithms that I haven't had a chance to look at yet.

SAM

eMOTIF
  • appears not to be working
Topic revision: r10 - 04 May 2006, JeremyStephens
 
