DNA Motif and Protein Domain Discovery
Presented by: Deeter Neumann, Peter St. Andre
Images: PDB, human enhancer binding protein; PDB, zinc finger 224
Outline
• What are DNA motifs and protein domains?
• Their importance and function
• Motif algorithms
• Locating domains/motifs experimentally
• Available programs: Pfam and SMART
Image taken from wikimedia.org
What are DNA sequence motifs?
“Sequence motifs are short recurring patterns in DNA that are presumed to have biological function.” D’haeseleer, P. Nature Biotechnology 24, 423-425 (2006).
Image taken from bio.miami.edu
Why are DNA sequence motifs important to know?
• They indicate common structural protein domains
• They identify similar function
• Other possible biological functions, e.g. transcription factors, mRNA processing
What is the function of DNA domains?
• Specific and non-specific interactions
• Permit binding of a transcription factor to its target gene
• Sequence-specific recognition
Source: Human Molecular Genetics 3; Strachan & Read
What are protein domains?
• Protein sequences and structures that evolve, function, and exist independently of the rest of the protein
• They often form functional units, such as metal-binding domains
Image of human zinc finger domain, taken from .ionchannels.org
Why are protein domains important?
• Bind to other molecules in the cell
• Signal transduction pathways
• Genetic engineering of novel proteins
• Pharmaceutical importance
Algorithmic approaches for both DNA motif and protein domain searches
Three general approaches are used:
• Enumeration
• Deterministic optimization
• Probabilistic optimization
Enumeration
• Employs the broadest approach
• Looks at all possible motifs
• Few constraints are imposed on the search
Enumeration, cont.
• Key point: covers all possible sequence motifs with few limitations
• Pros: does not get stuck in a local optimum
• Cons: may overlook subtle patterns
• Programs like WeederWeb and YMF use this type of algorithm
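The enumeration idea can be sketched in a few lines. This is a minimal illustration, not how WeederWeb or YMF are actually implemented: it scores every possible DNA k-mer by how many input sequences contain it, optionally allowing mismatches. The function names (`enumerate_motifs`, `occurs`, `hamming`) and the toy sequences are made up for this example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(candidate, seq, max_mismatches):
    """True if candidate matches some window of seq within max_mismatches."""
    k = len(candidate)
    return any(
        hamming(candidate, seq[i:i + k]) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def enumerate_motifs(sequences, k, max_mismatches=0):
    """Score every possible DNA k-mer by the number of sequences containing it."""
    support = {}
    for letters in product("ACGT", repeat=k):  # all 4**k candidates
        candidate = "".join(letters)
        support[candidate] = sum(
            occurs(candidate, s, max_mismatches) for s in sequences
        )
    return support


# Toy input (hypothetical): TGACGT is planted in every sequence.
seqs = ["TTGACGTCAT", "CCTGACGTAA", "GGGTGACGTC"]
support = enumerate_motifs(seqs, k=6)
best = max(support, key=support.get)
print(best, support[best])
```

Because every candidate is visited, the search cannot get trapped in a local optimum, but the number of candidates grows as 4^k, which is why enumerative tools restrict motif length or the number of mismatches.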
Deterministic optimization
• Fits a position weight matrix (PWM) using an expectation maximization (EM) model
• MEME is one program that uses this approach
• What does this mean?
Deterministic optimization, cont.
Image taken from ws.nbcr.net/app1234127263839/meme.html
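To make the EM idea concrete, here is a simplified sketch that assumes exactly one motif occurrence per sequence and a uniform background. It is not MEME's implementation; the function name `em_motif`, the seed word, and the toy sequences are assumptions for illustration only. The E-step computes, for each sequence, the posterior probability of every possible start position under the current PWM; the M-step rebuilds the PWM from those weighted sites.

```python
def em_motif(sequences, width, seed, iterations=50, pseudocount=0.1):
    """Simplified EM for one motif occurrence per sequence (illustrative only)."""
    alphabet = "ACGT"
    background = 0.25  # assumed uniform background
    # Start from a PWM that favours the seed word at every position.
    pwm = [{b: (0.7 if b == seed[j] else 0.1) for b in alphabet}
           for j in range(width)]
    posteriors = []
    for _ in range(iterations):
        counts = [{b: pseudocount for b in alphabet} for _ in range(width)]
        posteriors = []
        for seq in sequences:
            # E-step: likelihood ratio of each window under PWM vs background.
            weights = []
            for i in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= pwm[j][seq[i + j]] / background
                weights.append(w)
            total = sum(weights)
            post = [w / total for w in weights]
            posteriors.append(post)
            # Accumulate expected base counts, weighted by the posteriors.
            for i, p in enumerate(post):
                for j in range(width):
                    counts[j][seq[i + j]] += p
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [{b: col[b] / sum(col.values()) for b in alphabet}
               for col in counts]
    return pwm, posteriors


# Toy input (hypothetical); the seed deliberately has one wrong base.
seqs = ["TTGACGTCAT", "CCTGACGTAA", "GGGTGACGTC"]
pwm, posteriors = em_motif(seqs, width=6, seed="TGTCGT")
consensus = "".join(max(col, key=col.get) for col in pwm)
starts = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
print(consensus, starts)
```

The procedure is deterministic: the same starting PWM always gives the same answer, so the result depends on the starting point and can end in a local optimum.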
Probabilistic optimization
• Uses a Gibbs sampling approach
• Randomized implementation of the expectation maximization model
• How is this applied?
Probabilistic optimization, cont.
• Selects random sites, and each is weighted against the current motif model
• Allows the program to add or remove sequences and continuously update the motif
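A minimal Gibbs sampler along these lines might look as follows. It is a sketch under stated assumptions (one site per sequence, uniform background, fixed motif width), not the algorithm of any particular program; `gibbs_sample` and `build_pwm` are hypothetical names. Each step leaves one sequence out, builds a PWM from the sites currently chosen in the other sequences, and then draws a new site for the left-out sequence at random, in proportion to how well each window fits that PWM.

```python
import random


def build_pwm(sites, width, pseudocount=0.5):
    """Column-wise base frequencies of the given sites, with pseudocounts."""
    pwm = []
    for j in range(width):
        col = {b: pseudocount for b in "ACGT"}
        for site in sites:
            col[site[j]] += 1
        total = sum(col.values())
        pwm.append({b: c / total for b, c in col.items()})
    return pwm


def gibbs_sample(sequences, width, sweeps=200, seed=0):
    """Illustrative Gibbs sampler: one motif site per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            # Motif model built from all sequences except the current one.
            others = [s[p:p + width]
                      for k, (s, p) in enumerate(zip(sequences, starts))
                      if k != i]
            pwm = build_pwm(others, width)
            # Weight every window of the left-out sequence against the model.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= pwm[j][seq[p + j]] / 0.25  # assumed uniform background
                weights.append(w)
            # Sample (rather than maximise) the new site position.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    final_sites = [s[p:p + width] for s, p in zip(sequences, starts)]
    return starts, build_pwm(final_sites, width)


seqs = ["TTGACGTCAT", "CCTGACGTAA", "GGGTGACGTC"]  # toy input (hypothetical)
starts, pwm = gibbs_sample(seqs, width=6)
print(starts)
```

Because sites are sampled rather than chosen greedily, different random seeds can give different answers, which is one reason to rerun such programs several times.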
Which one to use?
• Recent research showed that enumeration approaches worked very well
• It is generally accepted that no single approach is the best
• Programs that incorporate several approaches work best
• It is important to rerun programs
Examples of programs
• WeederWeb is a web-based interface with an enumerative approach
• YMF is another enumerative program
• MEME is an online program that uses a deterministic optimization approach
• MotifSampler is a program that combines Gibbs sampling and a third-order Markov model
Measurements used to score sequence motifs
Three main statistics are used:
• Information content
• Log likelihood
• MAP score
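As an illustration of the first two statistics, the sketch below computes the information content (in bits, relative to a uniform background) of a set of aligned sites, and the log-likelihood ratio of those sites under the motif model versus the background. The function names and the toy sites are assumptions for this example; the MAP score is not shown.

```python
import math


def column_frequencies(sites):
    """Per-column base frequencies of equal-length aligned sites."""
    width = len(sites[0])
    freqs = []
    for j in range(width):
        col = [site[j] for site in sites]
        freqs.append({b: col.count(b) / len(col) for b in "ACGT"})
    return freqs


def information_content(sites, background=0.25):
    """Sum over columns of sum_b p_b * log2(p_b / q), in bits."""
    total = 0.0
    for col in column_frequencies(sites):
        total += sum(p * math.log2(p / background)
                     for p in col.values() if p > 0)
    return total


def log_likelihood_ratio(sites, background=0.25):
    """log2 of P(sites | motif frequencies) / P(sites | background)."""
    freqs = column_frequencies(sites)
    return sum(math.log2(freqs[j][site[j]] / background)
               for site in sites for j in range(len(site)))


# Toy alignment (hypothetical): three conserved columns, one uniform column.
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
ic = information_content(sites)
llr = log_likelihood_ratio(sites)
print(ic, llr)
```

A fully conserved DNA column contributes 2 bits and a uniform column contributes 0 bits. When the motif frequencies are estimated from the sites themselves, as here, the log-likelihood ratio equals the number of sites times the information content.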
Other measures of motif quality
• Group specificity, or site specificity: probability of having a certain number of target sequences with the site in question
• Sequence specificity: accounts for both the number of sequences with the sites in question and the number of sites per sequence
• Positional bias, or uniformity: looks at how uniformly the sites in question are distributed with respect to the transcription start sites of the genes
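Group specificity can be sketched as a tail probability. The slide does not say which distribution is used; the example below assumes a hypergeometric model, a common choice for this kind of overlap question. It asks how likely it is that at least the observed number of sequences in the target group carry the site, given how many sequences in the whole genome carry it. The function name and all numbers are hypothetical.

```python
from math import comb


def group_specificity(genome_total, genome_with_site,
                      group_total, group_with_site):
    """P(at least group_with_site hits in the group) under a hypergeometric model.

    Assumption: the group is treated as a random draw of group_total
    sequences from genome_total, of which genome_with_site carry the site.
    """
    upper = min(group_total, genome_with_site)
    denominator = comb(genome_total, group_total)
    numerator = sum(
        comb(genome_with_site, k)
        * comb(genome_total - genome_with_site, group_total - k)
        for k in range(group_with_site, upper + 1)
    )
    return numerator / denominator


# Hypothetical numbers: 6000 genes, 50 carry the site, 20 of the 30
# genes in the target group carry it.
p = group_specificity(6000, 50, 30, 20)
print(p)
```

A small value means the site is concentrated in the target group far more than expected by chance.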
Identification and preliminary characterization of a protein motif related to the zinc finger
Lovering et al. (1993)
What is a zinc finger?
• Autonomously folding domain
• Structural motif
• Zinc is required for folding and DNA interactions
• Part of a protein that is used to regulate DNA
Image: PDB, single zinc finger in solution
Classic zinc finger
• Conserved cysteines and histidines bind zinc
• Tetrahedral structure
• Antiparallel two-stranded β-sheet and an α-helix
Image from Wikipedia
Figure 1A Lovering et al.
Actual RING1 sequence MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYPSREEYEAHQDRVLIRLSRLHNQQALSSSIEEGLRMQAMHRAQRVRRPIPGSDQTTTMSGGEGEPGEGEGDGEDVSSDSAPDSAPGPAPKRPRGGGAGGSSVGTGGGGTGGVGGGAGSEDSGDRGGTLGGGTLGPPSPPGAPSPPEPGGEIELVFRPHPLLVEKGEYCQTRYVKTTGNATVDHLSKYLALRIALERRQQQEAGEPGGPGGGASDTGGPDGCGGEGGGAGGGDGPEEPALPSLEGVSEKQYTIYIAPGGGAFTTLNGSLTLELVNEKFWKVSRPLELCYAPTKDPK
RING finger
Cys1-Xaa-hydrophobic aa-Cys2-Xaa(9-27)-Cys3-Xaa(1-3)-His-Xaa-hydrophobic aa-Cys4-Xaa(2)-Cys5-hydrophobic aa-Xaa(5-47)-Cys6-Xaa(2)-Cys7
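The consensus can be written as a regular expression and used to scan a protein sequence. This is a sketch under assumptions: the numbers after Xaa are read as repeat counts, and "hydrophobic aa" is taken to be any of A, V, L, I, M, F, W, Y, which is a choice made for this example rather than one stated on the slide. The test fragment is copied from the RING1 sequence shown above, with a few flanking residues.

```python
import re

# Assumption: residues counted as "hydrophobic" for this sketch.
HYDROPHOBIC = "[AVLIMFWY]"

RING_PARTS = [
    "C", ".", HYDROPHOBIC, "C",        # Cys1-Xaa-hydrophobic-Cys2
    ".{9,27}",                          # Xaa(9-27)
    "C", ".{1,3}", "H",                 # Cys3-Xaa(1-3)-His
    ".", HYDROPHOBIC, "C",              # Xaa-hydrophobic-Cys4
    ".{2}", "C", HYDROPHOBIC,           # Xaa(2)-Cys5-hydrophobic
    ".{5,47}",                          # Xaa(5-47)
    "C", ".{2}", "C",                   # Cys6-Xaa(2)-Cys7
]
RING_PATTERN = re.compile("".join(RING_PARTS))


def find_ring_finger(protein):
    """Return (start, matched_text) of the first consensus match, or None."""
    match = RING_PATTERN.search(protein)
    if match is None:
        return None
    return match.start(), match.group()


# Fragment of the RING1 sequence from the slide, with flanking residues.
fragment = "HSELMCPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLV"
hit = find_ring_finger(fragment)
print(hit)
```

A regular expression only captures the spacing of the conserved residues; databases such as Pfam and SMART describe domains with HMMs instead, which also model residue preferences at the variable positions.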
Figure 1B (Lovering et al.)
Gene expression is similar in a variety of cell lines
Figure 2 (Lovering et al.)
Labels: DNA binding, regulation, recombination, repair
RING1 peptide
• 55 aa synthetic peptide (residues 12-66 in the RING1 sequence)
• RING finger metal binding: prefers zinc (other metals listed: cobalt, cadmium, copper)
Figure 3A (Lovering et al.)
Legend: solid line, cobalt; dashed line, zinc
Labels: S-Co(II); Co(II) d-d transitions
Figure 4A
Zinc dependence of binding
RING1 function
• No known function at the time (not published until 1993)
• Inhibits transactivation of recombination signal binding protein-J (RBP-J) (Hongyan et al.)
• Ubiquitin-protein ligases
Pfam database: http://pfam.sanger.ac.uk/
• Database that contains a large collection of protein domains and families
• Represented as sequence alignments and HMMs
• Lists key features of each protein
• New interface that combines other Pfam versions
• New updates have made it more user-friendly
SMART: http://smart.embl-heidelberg.de/
• Multiple sequence alignments of domain members
• >400 domains in >54,000 different proteins
• Searches the database using HMMs
SMART
Two different modes:
• Normal: Swiss-Prot, SP-TrEMBL, Ensembl
• Genomic: proteomes of sequenced genomes