Document: Protein domains and motifs | Last modified: December 28, 2005
Protein Domains And Motifs
S Patnaik, Mar 2005

Sections

Consensus sites, domains, folds, motifs, patterns, profiles and repeats
Databases
Detecting known motifs and domains in your sequence of interest
Identifying unknown or specific repeats, motifs and domains in your sequence of interest

Consensus sites, domains, folds, motifs, patterns, profiles and repeats

A consensus site usually refers to a position (usually conserved among homologous and orthologous sequences) that can theoretically be modified, for example, by phosphorylation or glycosylation. For instance, an asparagine followed by any amino acid (except proline) followed by a serine or threonine is a consensus site for N-linked glycosylation.
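As an illustration, the N-linked glycosylation consensus site can be scanned for with a regular expression. This is only a minimal sketch: the function name and the example sequence are made up, and excluding proline at the middle position is a commonly used refinement that can be switched off. A match only marks a potential site; it does not show that the site is actually modified.

```python
import re

def find_n_glyc_sites(seq, exclude_proline=True):
    """Return 1-based positions of asparagines in N-X-[ST] consensus sites.

    exclude_proline is an assumption of this sketch: when True, the middle
    residue X may be anything except proline.
    """
    middle = "[^P]" if exclude_proline else "."
    # The lookahead reports overlapping sites such as NNSS as well.
    sequon = re.compile("(?=N" + middle + "[ST])")
    return [m.start() + 1 for m in sequon.finditer(seq.upper())]

example = "MKNASLLNPTQWNGTV"  # made-up sequence, for illustration only
print(find_n_glyc_sites(example))                         # [3, 13]
print(find_n_glyc_sites(example, exclude_proline=False))  # [3, 8, 13]
```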

A domain is a discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function. It can range from about 20 amino acid residues to several hundred. Domains are made up of multiple secondary structure units (alpha helices, beta sheets, etc.). Most proteins are multi-domain. Folds are the core 3-D structures of domains. It is believed that only a few thousand folds exist. A beta-barrel is an example of a fold.

Motifs are short, conserved regions, and are frequently the most conserved regions of domains. Motifs are critical for the domain to function; in enzymes, for example, they may contain the active sites. Nuclear localization sequences are another example of motifs.

A pattern describes a short, contiguous stretch of protein using regular expressions. E.g., DX[DE]X is a pattern composed of amino acid D, followed by any residue, followed by either D or E, followed by any residue.
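A pattern written this way maps almost directly onto a regular expression in most programming languages. The sketch below assumes the simple notation used above (single letters, X for any residue, square brackets for alternatives, and no X inside brackets); the function name and test sequence are hypothetical.

```python
import re

def find_pattern(pattern, seq):
    """Return (1-based start, matched segment) for every hit of a simple pattern.

    Assumes X never appears inside square brackets, so it can be swapped
    for the regex wildcard "." directly.
    """
    body = pattern.upper().replace("X", ".")
    # A capturing group inside a lookahead also reports overlapping hits.
    regex = re.compile("(?=(" + body + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]

print(find_pattern("DX[DE]X", "MADAEKDDDLG"))  # [(3, 'DAEK'), (7, 'DDDL')]
```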

A profile is built from a multiple sequence alignment, and is a matrix or table that describes the probability of finding a particular amino acid at a certain position. Mathematical means such as hidden Markov models are used to generate profiles.
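To make the idea of a profile concrete, the sketch below builds the simplest possible one: a table of per-column residue frequencies from a toy, gap-free alignment. It is not a hidden Markov model and uses no pseudocounts or sequence weighting, which real profile databases rely on; the function name and alignment are invented for illustration.

```python
from collections import Counter

def build_profile(aligned_seqs):
    """Return one dict per alignment column mapping residue -> frequency."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile

alignment = ["DADA", "DGEK", "DAEL", "DSDA"]  # toy alignment, not real data
profile = build_profile(alignment)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

In this toy case position 1 is always D, while position 3 is split evenly between D and E, which is the kind of information a pattern such as DX[DE]X captures only in an all-or-nothing way.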

A repeat is a stretch of amino acid sequence that gets repeated a number of times along the length of the sequence. There usually is some sequence variation between the repeated segments. Many domains are constituted from repeats.
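A very rough way to spot candidate repeats in a single sequence is to list every short segment that occurs more than once. The sketch below only finds exact copies of a fixed length, so it misses the sequence variation mentioned above; dedicated repeat-finding tools handle that case. The function name and sequence are hypothetical.

```python
from collections import defaultdict

def repeated_segments(seq, k):
    """Map each k-residue segment seen more than once to its 1-based starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {segment: starts for segment, starts in positions.items() if len(starts) > 1}

print(repeated_segments("GAPGTPGAPGSPGAPG", 4))
# {'GAPG': [1, 7, 13], 'PGAP': [6, 12]}
```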

Databases

Hundreds of thousands of protein sequences have been manually or automatically analyzed to generate databases of patterns, profiles, domains, etc. The PROSITE database contains patterns as well as profiles. The Fingerprints or PRINTS-S database contains clusters of patterns that define protein families. BLOCKS and ProDom databases are made of sequence fragments (like patterns) that are generated from sequence alignment and clustering. The Pfam and SMART databases contain hidden Markov model profiles.

Integrated databases such as InterPro and CDD seek to integrate some or all of the above databases into a single resource.

Detecting KNOWN motifs and domains in your sequence of interest

To search (by keywords or sequence similarity) a particular database mentioned above, use these links -

BLOCKS - http://blocks.fhcrc.org/
Pfam - http://www.sanger.ac.uk/Software/Pfam/
ProDom - http://prodes.toulouse.inra.fr/prodom/doc/prodom.html
PRINTS-S - http://bioinf.man.ac.uk/dbbrowser/sprint/printss_lis.html
PROSITE - http://www.expasy.ch/prosite/
SMART - http://smart.embl-heidelberg.de/

To search an integrated database, use one of these links -

CDD - http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
InterPro - http://www.ebi.ac.uk/interpro/
UniProt - http://www.pir.uniprot.org/search/SearchTools.shtml - best, as it provides almost all the known information on a protein on one page

To search for consensus sites (phosphorylation, glycosylation, etc.), use -

ELM server - http://elm.eu.org/ - multiple types of sites
CBS server - http://www.cbs.dtu.dk/services/ - kinase-specific phosphorylation site prediction, glycosylation prediction, sorting motifs, etc.

Identifying UNKNOWN or specific repeats, motifs and domains in your sequence of interest

This involves, first, collecting a group of sequences that are similar to yours. The set of sequences (or their alignment) is then analyzed for patterns. Some online servers are -

PRATT - for patterns - http://www.ebi.ac.uk/pratt/
RADAR - for repeats (using single sequence) - http://www.ebi.ac.uk/Radar/
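The second step, analyzing an alignment for patterns, can be sketched in a few lines. The example below is a deliberately naive stand-in, not the algorithm used by PRATT: it assumes the sequences are already aligned to equal length, writes fully conserved columns as a letter, columns with a few alternatives as a bracketed set, and everything else (including gapped columns) as X. The function name, the max_set_size parameter and the alignment are all hypothetical.

```python
def alignment_to_pattern(aligned_seqs, max_set_size=2):
    """Derive a simple pattern string from equal-length aligned sequences."""
    pattern = []
    for column in zip(*aligned_seqs):
        residues = sorted(set(column))
        if "-" in residues:
            pattern.append("X")          # gapped column: treat as unconstrained
        elif len(residues) == 1:
            pattern.append(residues[0])  # fully conserved position
        elif len(residues) <= max_set_size:
            pattern.append("[" + "".join(residues) + "]")
        else:
            pattern.append("X")          # too variable to list alternatives
    return "".join(pattern)

print(alignment_to_pattern(["DADA", "DGEK", "DAEL", "DSDA"]))  # DX[DE]X
```

A pattern derived this way should be checked against unrelated sequences, since short patterns match by chance quite often.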

To look for a pattern designated by you in a sequence, use the protein pattern find tool at http://bioinformatics.org/sms2/protein_pattern.html