
Motifs

Benny Shomer, December 2004


Presentation Transcript


  1. Motifs Benny Shomer December 2004

  2. Human Insulin Why Patterns??? C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C

  3. Why Patterns??? PKC ATP binding site

  4. Challenges…

  5. Challenges…

  6. Challenges… Database Bias… Out of 23 Transferrin Receptor protein entries in UniProt, there are 21 entries of TFR1, and only 2 entries of TFR2.

  7. Challenges… Mathematical Significance vs. Biological Significance

  8. Consensus

     >uniprot|Q8WQG9|JNK1_CAEEL Stress-activated protein kinase jnk-1 (EC 2.7.1.37).
     MEERLSTTSSYPSHPGRSVEEDHNTLLASSSISSIIRGTRGHLNNFIESVGNWLVPSSSG
     RDDDAVSLDSCQSVYSPVRHHINSGTGGGILMEPSSIHVPENYYSVTIGEAQMVVLKRYQ
     NLRLIGSGAQGIVCSAFDTVRNEQVAIKKLSRPFQNVTHAKRAYRELKLMSLVNHKNIIG
     ILNCFTPQKKLDEFNDLYIVMELMDANLCQVIQMDLDHERLSYLLYQMLCGIRHLHSAGI
     IHRDLKPSNIVVRSDCTLKILDFGLARTAIEAFMMTPYVVTRYYRAPEVILGMGYKENVD
     VWSIGCIFGELIRGRVLFPGGDHIDQWTRIIEQLGTPDRSFLERLQPTVRNYVENRPRYQ
     ATPFEVLFSDNMFPMTADSSRLTGAQARDLLSRMLVIDPERRISVDDALRHPYVNVWFDE
     IEVYAPPPLPYDHNMDVEQNVDSWREHIFRELTDYARTHDIYS
     >uniprot|P92208|JNK_DROME Stress-activated protein kinase JNK (EC 2.7.1.37) (dJNK) (Basket protein).
     MTTAQHQHYTVEVGDTNFTIHSRYINLRPIGSGAQGIVCAAYDTITQQNVAIKKLSRPFQ
     NVTHAKRAYREFKLMKLVNHKNIIGLLNAFTPQRNLEEFQDVYLVMELMDANLCQVIQMD
     LDHDRMSYLLYQMLCGIKHLHSAGIIHRDLKPSNIVVKADCTLKILDFGLARTAGTTFMM
     TPYVVTRYYRAPEVILGMGYTENVDIWSVGCIMGEMIRGGVLFPGTDHIDQWNKIIEQLG
     TPSPSFMQRLQPTVRNYVENRPRYTGYSFDRLFPDGLFPNDNNQNSRRKASDARNLLSKM
     LVIDPEQRISVDEALKHEYINVWYDAEEVDAPAPEPYDHSVDEREHTVEQWKELIYEEVM
     DYEAHNTNNRTR
     >uniprot|Q966Y3|JNK_SUBDO Stress-activated protein kinase JNK (EC 2.7.1.37).
     MSSSDYYSQRVGDTVFTVQKRYTNLTNIGSGAQGVVCSAFDTVTQEKIAIKKLVKPFQNE
     TYAKRAFRELRLMKMVDHKNIIGLKNLFTPAKSLDDFQDVYIVMELMDANLCRVIGIELD
     HDRMSYLLYQLLCGIKHLHSAGIIHRDLKPSNIVVKEDCSLKILDFGLARTADQTFNMTP
     YVVTRYYRAPEVIVGMKYKENVDIWSVGCIFAEMIRGDILLPGKDYIDQWNKVTQVLGTP
     PSVFFKQLSSSVRLYCESQPRYAGKSWKDLFPDDVFPNDTPEDKAKTRHGRDLLSKMLQI
     DPQNRITVEQALAHPYVSIWYDPAEVHAPPPKRYDHALDEQSIPLDQWKTRIYEEVKTYN
     S
     >uniprot|Q9DGD9|MK08_BRARE Mitogen-activated protein kinase 8 (EC 2.7.1.37) (Stress-activated protein kinase JNK1)
     MNRNKREKEYYSIDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHVLDRNVAIKKLSRP
     FQNQTHAKRAYRELVLMKCVNHKNIIGLLNVFTPQKTLEEFQDVYLVMELMDANLCQVIQ
     MELDHERLSYLLYQMLCGIKHLHAAGIIHRDLKPSNIVVKSDCTLKILDFGLARTAATGL
     LMTPYVVTRYYRAPEVILGMGYQANVDVWSIGCIMAEMVRGSVLFPGTDHIDQWNKVIEQ
     LGTPSQEFMMKLNQSVRTYVENRPRYAGYSFEKLFPDVLFPADSDHNKLKASQARDLLSK
     MLVIDASKRISVDEALQHPYINVWYDPSEVEAPPPAITDKQLDEREHSVEEWKELIYKEV
     LEWEERTKNGVIRGQPASLAQVQQ

     CLUSTAL W (1.83) multiple sequence alignment

     uniprot|P45983|MK08_HUMAN  NNFYSVEIGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAILERNVAIKKL
     uniprot|P49185|MK08_RAT    NNFYSVEIADSTFTVLKRYQNLKPIGSGAQGIVCAAYDAILERNVAIKKL
     uniprot|Q91Y86|MK08_MOUSE  NNFYSVEIGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAILERNVAIKKL
     uniprot|Q61831|MK10_MOUSE  NQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKL
     uniprot|P49187|MK10_RAT    NQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKL
     uniprot|P53779|MK10_HUMAN  NQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKL
     uniprot|Q90327|MK8A_CYPCA  KEFYSVDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHNLERNVAIKKL
     uniprot|O42099|MK8B_CYPCA  KEFYSVDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHNLERNVAIKKL
     uniprot|Q9DGD9|MK08_BRARE  KEYYSIDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHVLDRNVAIKKL
     uniprot|P45984|MK09_HUMAN  SQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGISVAVKKL
     uniprot|P49186|MK09_RAT    GQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKL
     uniprot|Q9WTU6|MK09_MOUSE  GQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKL
     uniprot|P79996|MK09_CHICK  SQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKL
     uniprot|P92208|JNK_DROME   HQHYTVEVGDTNFTIHSRYINLRPIGSGAQGIVCAAYDTITQQNVAIKKL
     uniprot|Q966Y3|JNK_SUBDO   -DYYSQRVGDTVFTVQKRYTNLTNIGSGAQGVVCSAFDTVTQEKIAIKKL
     uniprot|Q8WQG9|JNK1_CAEEL  ENYYSVTIGEAQMVVLKRYQNLRLIGSGAQGIVCSAFDTVRNEQVAIKKL
                                 :.*:  :.:: :.: .**  :*  *******:**:*:*     .:*:***
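
As a rough illustration of how a consensus can be read off the ClustalW alignment on this slide, the Python sketch below marks every column in which all sequences carry the same residue (the columns ClustalW flags with `*`). The function name `conserved_columns` and the `x` placeholder for variable columns are illustrative choices, not part of the original slides.

```python
# Rows of the ClustalW alignment from the slide (50 columns each).
ALIGNMENT = {
    "MK08_HUMAN": "NNFYSVEIGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAILERNVAIKKL",
    "MK08_RAT":   "NNFYSVEIADSTFTVLKRYQNLKPIGSGAQGIVCAAYDAILERNVAIKKL",
    "MK08_MOUSE": "NNFYSVEIGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAILERNVAIKKL",
    "MK10_MOUSE": "NQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKL",
    "MK10_RAT":   "NQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKL",
    "MK10_HUMAN": "NQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKL",
    "MK8A_CYPCA": "KEFYSVDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHNLERNVAIKKL",
    "MK8B_CYPCA": "KEFYSVDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHNLERNVAIKKL",
    "MK08_BRARE": "KEYYSIDVGDSTFTVLKRYQNLRPIGSGAQGIVCSAYDHVLDRNVAIKKL",
    "MK09_HUMAN": "SQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGISVAVKKL",
    "MK09_RAT":   "GQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKL",
    "MK09_MOUSE": "GQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKL",
    "MK09_CHICK": "SQFYSVQVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKL",
    "JNK_DROME":  "HQHYTVEVGDTNFTIHSRYINLRPIGSGAQGIVCAAYDTITQQNVAIKKL",
    "JNK_SUBDO":  "-DYYSQRVGDTVFTVQKRYTNLTNIGSGAQGVVCSAFDTVTQEKIAIKKL",
    "JNK1_CAEEL": "ENYYSVTIGEAQMVVLKRYQNLRLIGSGAQGIVCSAFDTVRNEQVAIKKL",
}


def conserved_columns(rows):
    """Return one character per column: the residue if all rows agree, else 'x'."""
    out = []
    for column in zip(*rows):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            out.append(column[0])   # fully conserved column
        else:
            out.append("x")         # variable or gapped column
    return "".join(out)


consensus = conserved_columns(ALIGNMENT.values())
print(consensus)
```

The longest fully conserved run in this window is `IGSGAQG`. A strict consensus like this ignores conservative substitutions (the `:` and `.` columns), which is one reason the later slides move on to blocks, PSSMs, and patterns.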

  9. http://blocks.fhcrc.org/ BLOCKS Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. Blocks are made automatically by looking for the most highly conserved regions in groups of proteins documented in InterPro.

  10. BLOCKS

  11. BLOCKS

  12. BLOCKS

  13. BLOCKS

  14. PSSM PSSM (pronounced "possum") is a Position Specific Scoring Matrix. A profile is one type of PSSM. PSSMs make it possible to score sequences, structures, etc. against a multiple alignment. The main advantage of a PSSM is that amino acids are weighted separately at each position, which allows matches to the profile to be scored efficiently.

  15. EF10_XENLA  IGYNPDTVAFVPISGWNGDNMLEPSPNMPWFKGWKITRKEGSGSGTTLLEALDCILPPSR
      EF11_CRIGR  IGYNPDTVAFVPISGWNGDNMLEPSANMPWFKGWKVTRKDGSASGTTLLEALDCILPPTR
      EF11_HUMAN  IGYNPDTVAFVPISGWNGDNMLEPSANMPWFKGWKVTRKDGNASGTTLLEALDCILPPTR
      EF11_MOUSE  IGYNPDTVAFVPISGWNGDNMLEPSANMPWFKGWKVTRKDGHASGTTLLEALDCILPPTR
      EF1A_BRARE  IGYNPASVAFVPISGWHGDNMLEASSNMGWFKGWKIERKEGNASGTTLLDALDAILPPSR
      EF1A_CHICK  IGYNPDTVAFVPISGWNGDNMLEPSSNMPWFKGWKVTRKDGNASGTTLLEALDCILPPTR
                  ***** :*********:******.*.** ******: **:* .******:***.****:*

      PSSM
          I   G   Y   N   P   D   T   V   A   F   V   P   I   S   G   W   N   G   D   N
      A   0   0   0   0   0 1.0   0   0 6.0   0   0   0   0   0   0   0   0   0   0   0
      C   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      D   0   0   0   0   0 5.0   0   0   0   0   0   0   0   0   0   0   0   0 6.0   0
      E   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      F   0   0   0   0   0   0   0   0   0 6.0   0   0   0   0   0   0   0   0   0   0
      G   0 6.0   0   0   0   0   0   0   0   0   0   0   0   0 6.0   0   0 6.0   0   0
      H   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 1.0   0   0   0
      I 6.0   0   0   0   0   0   0   0   0   0   0   0 6.0   0   0   0   0   0   0   0
      K   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      L   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      M   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      N   0   0   0 6.0   0   0   0   0   0   0   0   0   0   0   0   0 5.0   0   0 6.0
      P   0   0   0   0 6.0   0   0   0   0   0   0 6.0   0   0   0   0   0   0   0   0
      Q   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      R   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
      S   0   0   0   0   0   0 1.0   0   0   0   0   0   0 6.0   0   0   0   0   0   0
      T   0   0   0   0   0   0 5.0   0   0   0   0   0   0   0   0   0   0   0   0   0
      V   0   0   0   0   0   0   0 6.0   0   0 6.0   0   0   0   0   0   0   0   0   0
      W   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 6.0   0   0   0   0
      Y   0   0 6.0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
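
The matrix on this slide appears to be a plain residue-count PSSM over the first 20 alignment columns. A minimal Python sketch of that construction follows; the function names `count_matrix` and `score` are illustrative, and real PSSMs usually add sequence weighting, pseudocounts, and log-odds scaling, none of which is attempted here.

```python
from collections import Counter

# First 20 alignment columns of the six sequences shown on the slide.
ALIGNED = {
    "EF10_XENLA": "IGYNPDTVAFVPISGWNGDN",
    "EF11_CRIGR": "IGYNPDTVAFVPISGWNGDN",
    "EF11_HUMAN": "IGYNPDTVAFVPISGWNGDN",
    "EF11_MOUSE": "IGYNPDTVAFVPISGWNGDN",
    "EF1A_BRARE": "IGYNPASVAFVPISGWHGDN",
    "EF1A_CHICK": "IGYNPDTVAFVPISGWNGDN",
}


def count_matrix(sequences):
    """One Counter per alignment column: residue -> number of occurrences."""
    return [Counter(column) for column in zip(*sequences)]


def score(matrix, sequence):
    """Sum the per-position counts of the residues found in `sequence`."""
    return sum(matrix[i][residue] for i, residue in enumerate(sequence))


pssm = count_matrix(ALIGNED.values())
print(pssm[5])                              # column 6: D seen 5 times, A once
print(score(pssm, ALIGNED["EF11_HUMAN"]))   # majority residue at every position
print(score(pssm, ALIGNED["EF1A_BRARE"]))   # three minority residues lower the score
```

Residues never observed in a column score zero because `Counter` returns 0 for missing keys. That is exactly the weakness pseudocounts are meant to address.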

  16. http://www.expasy.org/prosite/ • Consensus-based fingerprinting is limited in its sensitivity. • It cannot handle gapped fingerprints that span structurally related domains. • Regular-expression-based fingerprinting is better suited to such tasks and is more sensitive.

  17. Add new sequences?
      Read reviews on protein families
      Build Multiple Alignment
      Study Key Regions:
      - Enzyme catalytic sites.
      - Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin, etc.).
      - Amino acids involved in binding a metal ion.
      - Cysteines involved in disulfide bonds.
      - Regions involved in binding a molecule (ADP/ATP, GDP/GTP, calcium, DNA, etc.) or another protein.
      Find “core patterns”

  18. Find “core patterns”
      Scan UniProt with core pattern
      All family members detected and few false positives?
      Yes: Use pattern
      No: Increase size of pattern
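
The decision loop on this slide can be sketched in Python as follows. This is only an outline under stated assumptions: the candidate patterns are written directly as regular expressions, ordered from the shortest core to longer versions, and the sequences are made-up toy strings, not real UniProt entries. `evaluate` and `refine` are hypothetical helper names.

```python
import re

# Toy data, invented purely for illustration.
FAMILY = ["AACGHCCA", "TTCAHCCG"]       # sequences that should match
BACKGROUND = ["GGCCTT", "CAHAAC"]       # sequences that should not match
CANDIDATES = ["CC", "C.HCC"]            # core pattern first, then an enlarged one


def evaluate(regex, family, background):
    """Return (fraction of family members detected, number of false positives)."""
    detected = sum(1 for seq in family if re.search(regex, seq))
    false_positives = sum(1 for seq in background if re.search(regex, seq))
    return detected / len(family), false_positives


def refine(candidates, family, background, max_false_positives=0):
    """Walk through increasingly large patterns until one is acceptable."""
    for regex in candidates:
        sensitivity, false_positives = evaluate(regex, family, background)
        if sensitivity == 1.0 and false_positives <= max_false_positives:
            return regex                # "Yes": use this pattern
        # "No": fall through and try the next, larger pattern
    return None


print(refine(CANDIDATES, FAMILY, BACKGROUND))
```

With the toy data, the short core `CC` finds both family members but also hits a background sequence, so the loop moves on to the enlarged pattern.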

  19. Profiles in PROSITE

      F K L L S H C L L V
      F K A F G Q T M F Q
      Y P I V G Q E L L G
      F P V V K E A I L K
      F K V L A A V I A D
      L E F I S E C I I Q
      F K L L G N V L V C

      A -18 -10  -1  -8   8  -3   3 -10  -2  -8
      C -22 -33 -18 -18 -22 -26  22 -24 -19  -7
      D -35   0 -32 -33  -7   6 -17 -34 -31   0
      E -27  15 -25 -26  -9  23  -9 -24 -23  -1
      F  60 -30  12  14 -26 -29 -15   4  12 -29
      G -30 -20 -28 -32  28 -14 -23 -33 -27  -5
      H -13 -12 -25 -25 -16  14 -22 -22 -23 -10
      I   3 -27  21  25 -29 -23  -8  33  19 -23
      K -26  25 -25 -27  -6   4 -15 -27 -26   0
      L  14 -28  19  27 -27 -20  -9  33  26 -21
      M   3 -15  10  14 -17 -10  -9  25  12 -11
      N -22  -6 -24 -27   1   8 -15 -24 -24  -4
      P -30  24 -26 -28 -14 -10 -22 -24 -26 -18
      Q -32   5 -25 -26  -9  24 -16 -17 -23   7
      R -18   9 -22 -22 -10   0 -18 -23 -22  -4
      S -22  -8 -16 -21  11   2  -1 -24 -19  -4
      T -10 -10  -6  -7  -5  -8   2 -10  -7 -11
      V   0 -25  22  25 -19 -26   6  19  16 -16
      W   9 -25 -18 -19 -25 -27 -34 -20 -17 -28
      Y  34 -18  -1   1 -23 -12 -19   0   0 -18
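
To show how such a profile is applied, the sketch below scores a 10-residue window by summing one matrix entry per position and slides that window along a longer sequence. The matrix values are copied from this slide; `score_window`, `best_hit`, and the ungapped sliding-window search are simplifying assumptions, since real PROSITE profiles also carry gap penalties and cut-off scores.

```python
# Profile from the slide: one row per residue, one column per position (10 positions).
PROFILE_ROWS = """
A -18 -10 -1 -8 8 -3 3 -10 -2 -8
C -22 -33 -18 -18 -22 -26 22 -24 -19 -7
D -35 0 -32 -33 -7 6 -17 -34 -31 0
E -27 15 -25 -26 -9 23 -9 -24 -23 -1
F 60 -30 12 14 -26 -29 -15 4 12 -29
G -30 -20 -28 -32 28 -14 -23 -33 -27 -5
H -13 -12 -25 -25 -16 14 -22 -22 -23 -10
I 3 -27 21 25 -29 -23 -8 33 19 -23
K -26 25 -25 -27 -6 4 -15 -27 -26 0
L 14 -28 19 27 -27 -20 -9 33 26 -21
M 3 -15 10 14 -17 -10 -9 25 12 -11
N -22 -6 -24 -27 1 8 -15 -24 -24 -4
P -30 24 -26 -28 -14 -10 -22 -24 -26 -18
Q -32 5 -25 -26 -9 24 -16 -17 -23 7
R -18 9 -22 -22 -10 0 -18 -23 -22 -4
S -22 -8 -16 -21 11 2 -1 -24 -19 -4
T -10 -10 -6 -7 -5 -8 2 -10 -7 -11
V 0 -25 22 25 -19 -26 6 19 16 -16
W 9 -25 -18 -19 -25 -27 -34 -20 -17 -28
Y 34 -18 -1 1 -23 -12 -19 0 0 -18
"""

PROFILE = {}
for line in PROFILE_ROWS.strip().splitlines():
    residue, *values = line.split()
    PROFILE[residue] = [int(v) for v in values]

WIDTH = 10  # number of profile positions


def score_window(window):
    """Sum the profile score of each residue at its position in the window."""
    return sum(PROFILE[residue][i] for i, residue in enumerate(window))


def best_hit(sequence):
    """Return (0-based offset, score) of the best ungapped window in `sequence`."""
    scored = [(score_window(sequence[i:i + WIDTH]), i)
              for i in range(len(sequence) - WIDTH + 1)]
    top_score, offset = max(scored)
    return offset, top_score


print(score_window("FKLLSHCLLV"))             # first training sequence on the slide
print(best_hit("AAAAFKLLSHCLLVAAAA"))         # same segment embedded in flanking alanines
```

Unlike the count matrix on the PSSM slide, this profile gives positive scores to residues that were never observed at a position but resemble those that were (for example M at position 8), and penalises unlikely ones.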

  20. Syntax and application AC PS01204; DE NF-kappa-B/Rel/dorsal domain signature. PA F-R-Y-x-C-E-G.

  21. 1A3Q Title: Human NF-kappa-B P52 Bound To DNA. Classification: Complex (Transcription Factor/DNA)

  22. AC PS00032; DE 'Homeobox' antennapedia-type protein signature. PA [LIVMFE]-[FY]-P-W-M-[KRQTA].

  23. 1B8I Title: Structure Of The Homeotic Ubx/Exd/DNA Ternary Complex

  24. AC PS01253; DE Type I fibronectin domain. PA C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-x(6,9)-C. CFEPQLLRFFHKNEIWYRTEQAAVARCQCKGPDAHC

  25. 1E88 Title: Gelatin-Binding Domain Of Human Fibronectin CFEPQLLRFFHKNEIWYRTEQAAVARCQCKGPDAHC

  26. {C} AC PS00317; DE WAP-type 'four-disulfide core' domain signature. PA C-x-{C}-[DN]-x(2)-C-x(5)-C-C. CLKDTDCPGIKKCC

  27. 2REL Title: Solution Structure Of R-Elafin, A Specific Inhibitor Of Elastase. CLKDTDCPGIKKCC

  28. PA R-x-[DMV]-R-L-[D>]. PA <C-G-[ILVM]-x(2)-D. PA R-x-[DMV]-R-L-D>. N C

  29. AC PS00695; DE Enterobacterial virulence outer membrane protein DE signature 2. PA [FYW]-x(2)-G-x-G-Y-[KR]-F>. WIAGVGYRF

  30. 1ORM Title: NMR Fold Of The Outer Membrane Protein OmpX In DHPC Micelles WIAGVGYRF
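
The PA-line syntax used on the preceding slides maps almost one-to-one onto regular expressions: `x` is any residue, `[..]` a set of allowed residues, `{..}` a set of forbidden residues, `(n)` or `(n,m)` a repeat count, and `<` / `>` anchor the pattern to the N- or C-terminus. The following Python sketch performs that translation. `prosite_to_regex` is a hypothetical helper written for this example, not the code ScanProsite uses, and it covers only the syntax elements that appear in these slides.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE PA line (e.g. 'C-x(2)-[DN]-{P}.') into a Python regex."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split "core(n)" or "core(n,m)" into the core and its repeat bounds.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                      # any residue
            body = "."
        elif core.startswith("{"):           # {..}: any residue except these
            body = "[^" + core[1:-1] + "]"
        elif core.startswith("[") and ">" in core:
            # [D>]: either one of the listed residues or the C-terminus itself
            body = "(?:[" + core[1:-1].replace(">", "") + "]|$)"
        else:                                # single residue or [..] residue set
            body = core
        if low is not None:
            body += "{" + low + ("," + high if high else "") + "}"
        regex.append(prefix + body + suffix)
    return "".join(regex)


# Patterns and matching segments taken from the slides.
wap = prosite_to_regex("C-x-{C}-[DN]-x(2)-C-x(5)-C-C.")
print(wap, bool(re.fullmatch(wap, "CLKDTDCPGIKKCC")))

omp = prosite_to_regex("[FYW]-x(2)-G-x-G-Y-[KR]-F>.")
print(omp, bool(re.search(omp, "WIAGVGYRF")))
```

Because of the `>` anchor, the second pattern stops matching as soon as any residue follows the final F.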

  31. http://us.expasy.org/tools/scanprosite/ AF-10 Protein (Causing Lymphoma)

  32. http://us.expasy.org/tools/scanprosite/

  33. http://us.expasy.org/tools/scanprosite/

  34. http://us.expasy.org/tools/scanprosite/

  35. http://us.expasy.org/tools/pratt/ PRATT

  36. PRATT I submitted 62 sequences which contain the type II fibronectin collagen-binding domain. These sequences all contain a PROSITE pattern: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-[FYWI]-C.

  37. PRATT PRATT selected as the best pattern: W-C-[AGS]-x-T-x(2)-[FY]-x(2)-[DR]-x(2)-[FWY]-[ACGS]-x-[CS] The UniProt entry FA12_HUMAN (Coagulation factor XII precursor) was selected as a test case. The PROSITE pattern FIBRONECTIN_2 matches the sequence at position 46 over 42 a.a.: ‘CHFPFQYHRQLYHKCTHKGRPGPQPWCATTPNFDQDQRWGYC’ The PRATT pattern matches the sequence at position 71 over 17 a.a., inside the same region: ‘WCATTPNFDQDQRWGYC’
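
The positions quoted on this slide can be checked with a few lines of Python. In the sketch below the two patterns are hand-translated into regular expressions, and only the 42-residue fragment from the slide is searched (the full FA12_HUMAN sequence is not reproduced here). The fragment's start position of 46 is taken from the slide; `locate` is an illustrative helper name.

```python
import re

# Fragment of FA12_HUMAN quoted on the slide, and its 1-based start position.
FRAGMENT = "CHFPFQYHRQLYHKCTHKGRPGPQPWCATTPNFDQDQRWGYC"
FRAGMENT_START = 46

# PROSITE FIBRONECTIN_2 pattern, hand-translated to a regex:
# C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-[FYWI]-C
PROSITE_RE = re.compile(
    r"C.{2}PF.[FYWI].{7}C.{8,10}WC.{4}[DNSR][FYW].{3,5}[FYW].[FYWI]C")

# PRATT pattern, hand-translated to a regex:
# W-C-[AGS]-x-T-x(2)-[FY]-x(2)-[DR]-x(2)-[FWY]-[ACGS]-x-[CS]
PRATT_RE = re.compile(r"WC[AGS].T.{2}[FY].{2}[DR].{2}[FWY][ACGS].[CS]")


def locate(regex, fragment, fragment_start):
    """Return (1-based start in the full protein, match length, matched text)."""
    match = regex.search(fragment)
    if match is None:
        return None
    return fragment_start + match.start(), len(match.group()), match.group()


print(locate(PROSITE_RE, FRAGMENT, FRAGMENT_START))
print(locate(PRATT_RE, FRAGMENT, FRAGMENT_START))
```

The PRATT match begins 25 residues into the PROSITE match, which is where the position 71 on the slide comes from (46 + 25), and both matches end on the same cysteine.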

  38. PSI-BLAST Run BLAST search with query protein. Harvest hits that pass the E-value threshold (< 0.005). Construct a PSSM. Perform Multiple Alignment (automatically).

  39. PSI-BLAST

  40. PSI-BLAST

  41. PSI-BLAST

  42. PSI-BLAST

  43. PSI-BLAST

  44. PSI-BLAST

  45. PHI-BLAST Run BLAST search with query protein and a pattern. Harvest hits that pass the E-value threshold (< 0.005) and match the pattern. Construct a PSSM. Perform Multiple Alignment (automatically).

  46. PHI-BLAST ID DISINTEGRIN_1; PATTERN. AC PS00427; DE Disintegrins signature. PA C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

  47. PHI-BLAST First iteration displays the pattern hit on the Subject/Query alignment.

  48. PHI-BLAST
