(04 May 2006, JeremyStephens)
---+ Protein Alignment Algorithms

%TOC%

Here are some notes and links on protein alignment algorithms I've found. Here's the [[http://en.wikipedia.org/wiki/Sequence_alignment][Wikipedia page]] about sequence alignment.

---++ Terms

Here are some terms that you might run across:

Sequence Consensus: a sequence derived from a multiple sequence alignment that summarizes the alignment, typically by taking the most common residue at each position.

---++ Multiple Alignment

---+++ Global Multiple Alignment

These algorithms take several proteins and align them all globally.

*[[http://www.ebi.ac.uk/clustalw/][CLUSTALW]]*

*[[http://www.drive5.com/muscle/][MUSCLE]]*

*[[http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/][MAFFT]]*
   * multiple "modes" of operation
   * can incorporate information from local pairwise alignments into the global alignment

*[[http://bibiserv.techfak.uni-bielefeld.de/dialign/welcome.html][DIALIGN]]*
   * seems to be a hybrid of sorts; pieces together many local multiple alignments
   * could possibly be used to extract a single motif...

*[[http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html][T-COFFEE]]*

*[[http://www.bioinformatics.ucla.edu/poa/POA_Online/Align.html][POA]]*

*[[http://www.hku.hk/bruhk/gcgdoc/pileup.html][PileUp]]*

And many others...

<hr>

---+++ Local Multiple Alignment (Motif Discovery)

There are two different kinds of motifs, gapped and ungapped. Apparently there are not nearly as many algorithms for gapped motifs as there are for ungapped.
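To make the gapped/ungapped distinction concrete, here is a minimal Python sketch. It is not taken from any of the tools listed on this page; the toy sequences, the motif =C..H=, and the helper names =find_ungapped= and =find_gapped= are all made up for illustration. The idea is that an ungapped motif occupies a window of the same width in every sequence, while a gapped motif allows a variable number of residues between its conserved parts.

```python
import re

def find_ungapped(seq, motif):
    """Return start positions where a fixed-width motif matches.

    '.' in the motif matches any residue, but every occurrence
    has exactly the same width (no insertions or deletions).
    """
    width = len(motif)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if all(m == "." or m == r for m, r in zip(motif, window)):
            hits.append(i)
    return hits

def find_gapped(seq, left, right, min_gap, max_gap):
    """Return (start, gap_length) pairs where `left` and `right`
    are separated by between min_gap and max_gap residues."""
    # The lookahead lets overlapping occurrences be reported; the lazy
    # quantifier picks the shortest spacer that works at each start.
    pattern = re.compile(
        "(?=(%s(.{%d,%d}?)%s))"
        % (re.escape(left), min_gap, max_gap, re.escape(right))
    )
    return [(m.start(), len(m.group(2))) for m in pattern.finditer(seq)]

# Hypothetical toy sequences, invented for illustration only.
seqs = ["MKCAAHGT", "TTCGHLLP", "CVVVHAAA"]
for s in seqs:
    print(s, find_ungapped(s, "C..H"), find_gapped(s, "C", "H", 1, 3))
```

In this toy data only the first sequence contains the fixed-width pattern, while all three contain the C...H pair once the spacer length is allowed to vary. That is roughly why ungapped methods can miss motifs that a gapped method would find.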
---++++ Ungapped

<div style="padding-left: 35px">

*[[http://meme.sdsc.edu/meme/][MEME]]*

*[[http://blocks.fhcrc.org/][BlockMaker]]*

*[[http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/t_coffee_home_page.html][T-COFFEE]] (MOCCA mode)*
   * to use in local alignment (motif discovery) mode, you have to already know the position of a motif

*[[http://bayesweb.wadsworth.org/gibbs/gibbs.html][Gibbs]]*
   * [[http://www.proteinscience.org/cgi/reprint/4/8/1618.pdf][Gibbs motif sampling]] by Neuwald, Liu, and Lawrence (1995)
   * source code found [[ftp://ftp.ncbi.nlm.nih.gov/pub/neuwald/gibbs9_95/][here]]

*[[http://web.mit.edu/bamel/gemoda/][Gemoda]]*

*[[ftp://ftp.ncbi.nlm.nih.gov/pub/neuwald/asset/][ASSET]]*

*[[http://blocks.fhcrc.org/blocks/][BLOCKS]]*

</div>

---++++ Gapped

Still looking...

---++++ Papers

<div style="padding-left: 35px">

Here are some papers that may shed some light on the gapped motif problem.

*[[http://www.springerlink.com/index/WJHQ8PY3R4NUYL2P.pdf][Emily Rocke's gapped motif research]]*
   * one of the more promising papers I've found

*[[http://bioinformatics.oxfordjournals.org/cgi/content/full/22/1/21][Generic motif discovery algorithm]]*
   * mentions modifying [[http://web.mit.edu/bamel/gemoda/][Gemoda]] to discover gapped motifs

*[[ftp://ftp.sdsc.edu/pub/sdsc/biology/ISMB00/058.pdf][Combinatorial Approaches to Finding Subtle Signals in DNA Sequences]]*
   * describes the SP-STAR algorithm by Pevzner and Sze used to find gapped motifs

*[[http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=538276][Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model]]*
   * Andrew F. Neuwald and Jun S. Liu

*[[http://nar.oxfordjournals.org/cgi/content/full/30/5/1268][BALSA: Bayesian algorithm for local sequence alignment]]*
   * Appears to be only a pairwise alignment algorithm

*A Bayesian insertion/deletion algorithm for distant protein motif searching via entropy filtering* by Jun Xie, et al.
   * Found a later paper by the same author (the one right below this one) that mentions a limitation of this algorithm: it can't find gapped motifs with more than 1 consecutive gap.

*[[http://scholar.google.com/url?sa=U&q=http://www.cs.uic.edu/~dasgupta/resume/publ/papers/liuxx.pdf][Identification of motifs with insertions and deletions in protein sequences using self-organizing...]]*
   * Uses a self-organizing neural network to discover motifs with at most 2 gaps
   * Looks promising

*[[http://media.wiley.com/product_data/excerpt/94/04708482/0470848294.pdf][Bayesian Methods in Biological Sequence Analysis]]*

</div>

---++ Pairwise Alignment

There are lots of tools out there for this. I'm only interested in finding motif discovery algorithms at present.

---++ Profiles

Profiles (or position-specific scoring matrices) are constructed from multiple sequence alignments. These alignments can have gaps. Below are some of the programs that can build profiles and align sequences to profiles.

---+++ Builders

<div style="padding-left: 35px">

*[[http://www.hku.hk/bruhk/gcgdoc/hmmerbuild.html][HmmerBuild]]*
   * builds hidden Markov model profiles from a multiple sequence alignment consensus

*ProfileMake*

</div>

---+++ Aligners

<div style="padding-left: 35px">

*ProfileGap*

</div>

---++ Application Suites

There are a few packaged suites that include several tools related to protein sequence searching and alignment.

<div style="padding-left: 35px">

*[[http://www.gcg.com][GCG]]*
   * contains over 140 programs for various sequence analysis tasks

*[[http://hmmer.wustl.edu/][HMMER]]*
   * contains several programs that deal with HMM profiles

</div>

---++ Unevaluated

Here are the names of other algorithms that I haven't had a chance to look at yet.

<div style="padding-left: 35px">

*[[http://www.cse.ucsc.edu/research/compbio/sam.html][SAM]]*

*[[http://dna.stanford.edu/emotif][eMOTIF]]*
   * appears not to be working

</div>
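As a side note on the Terms and Profiles sections above, here is a rough Python sketch of how a consensus sequence and a simple position-specific scoring matrix could be derived from a gapped multiple alignment. It does not reproduce what HmmerBuild, ProfileMake, or any other program on this page actually does. The toy alignment, the reduced five-letter alphabet, the uniform background, the pseudocount of 1, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment ('-' marks a gap); not real protein data.
ALIGNMENT = ["ACD-", "ACDE", "GCDE", "ACEE"]
# Reduced alphabet to keep the example small; real proteins use 20 residues.
ALPHABET = "ACDEG"

def column_counts(alignment):
    """Count residues in each alignment column, ignoring gap characters."""
    return [Counter(r for r in col if r != "-") for col in zip(*alignment)]

def consensus(alignment):
    """Most common residue per column.

    Assumes every column contains at least one non-gap residue.
    """
    return "".join(c.most_common(1)[0][0] for c in column_counts(alignment))

def build_pssm(alignment, alphabet, pseudocount=1.0):
    """Log-odds scores (base 2) against a uniform background.

    Pseudocounts keep unseen residues from getting a score of -infinity.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for counts in column_counts(alignment):
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in alphabet
        })
    return pssm

def score(pssm, seq):
    """Sum the per-column scores of an ungapped sequence of profile length."""
    return sum(col[r] for col, r in zip(pssm, seq))

pssm = build_pssm(ALIGNMENT, ALPHABET)
print(consensus(ALIGNMENT))
print(score(pssm, "ACDE"), score(pssm, "GEEA"))
```

Note that =score= only handles ungapped sequences of the same length as the profile. Aligning a sequence to a profile with gaps requires a dynamic programming step that is left out of this sketch.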
Topic revision: r10 - 04 May 2006,
JeremyStephens
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.