Scooter Willis University of Florida Computer and Information Science and Engineering PowerPoint Presentation

    Presentation Transcript
    1. Predicting co-evolving pairs in Pfam using information theory where entropy is determined by phylogenetic mutation events • Scooter Willis, University of Florida, Computer and Information Science and Engineering

    2. Homology modeling • Proteins grouped by function will share similar structures • Pfam is a large collection of protein sequences grouped into families by Hidden Markov Models • Pfam 19.0 (December 2005) contains 8,183 protein families, of which 2,765 have one or more solved PDB structures

    3. Pfam5000 • “Implications of Structural Genomics Target Selection Strategies: Pfam5000, Whole Genome, and Random Approaches”, John-Marc Chandonia and Steven E. Brenner, PROTEINS: Structure, Function, Bioinformatics (2005) • NIH is supporting structural genomics projects at 9 pilot centers through the Protein Structure Initiative • Funding is $300 million over the next five years

    4. Co-evolving pairs • A co-evolving pair is defined as two amino acids more than 10 sequence positions apart but within 12 angstroms of each other in 3D space • Apply information theory to protein families to detect co-evolving pairs, which indicate the tertiary placement of secondary structures • Actively researched topic with numerous publications in the last 5 years • It is accepted that the information is present but difficult to separate from background noise
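The pair definition above can be written as a small predicate. This is a minimal sketch, not the presenter's code: the function name, the argument layout, and the choice of which atom's coordinates represent each residue are assumptions made for illustration.

```python
import math

def is_coevolving_candidate(i, j, coord_i, coord_j,
                            min_separation=10, max_distance=12.0):
    """Check the slide's geometric definition of a co-evolving pair.

    i, j             -- sequence positions of the two residues
    coord_i, coord_j -- (x, y, z) coordinates in angstroms; which atom
                        represents a residue is an assumption left to the caller
    """
    far_in_sequence = abs(i - j) > min_separation            # > 10 positions apart
    close_in_space = math.dist(coord_i, coord_j) < max_distance  # within 12 angstroms
    return far_in_sequence and close_in_space
```

A pair at positions 5 and 40 whose representative atoms are 5 angstroms apart qualifies; the same distance between positions 5 and 10 does not, because the residues are too close in sequence.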

    5. Information Theory Approach • The entropy H(X) of a discrete random variable X with probability function p(x) measures the randomness or uncertainty in a signal and is calculated with the following formula.
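For reference, the standard Shannon definition of entropy for a discrete random variable X with probability function p(x) is given below. The logarithm base used in the presentation is not stated here, so it is left unspecified (base 2 would give bits).

```latex
H(X) = -\sum_{x} p(x) \log p(x)
```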

    6. Information Theory Approach

    7. Mutual Information Venn Diagram [Figure: Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), H(X,Y) and MI(X,Y)]
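The quantities in the diagram are tied together by the standard identity MI(X,Y) = H(X) + H(Y) - H(X,Y). The sketch below applies it to two alignment columns, estimating probabilities by plain frequency counts over sequences. The function names are hypothetical, base 2 logarithms are an assumption, and gap characters are treated as ordinary symbols for simplicity.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information(col_x, col_y):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y) for two equally long alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    joint = list(zip(col_x, col_y))  # one (x, y) pair per sequence
    return entropy(col_x) + entropy(col_y) - entropy(joint)
```

Two columns that always change together, such as "AACC" and "DDEE", share all of their information, while "AACC" and "DEDE" share none.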

    8. Sampling • The difficulty of applying statistical methods to genetic sequence data sets is that they tend not to be random samples, and the extent of the entire population is unknown • The bias towards protein sequences that have medical research value, together with the corresponding phylogenetic influences, introduces noise (an indistinguishable background signal) that decreases the quality of statistical measures • The primary impact is on accurately measuring probability, because the sample statistics do not reflect the population’s statistics

    9. Phylogenetic Effect • Reducing the impact of the phylogenetic effect when calculating probability will improve the quality of the information and the ability to detect co-evolving pairs

    10. Phylogenetic Tree [Figure: phylogenetic tree with node labels A, A, C, C, A, D, C, C, C, C, C]

    11. Phylogenetic Tree Mutation Events [Figure: phylogenetic tree with pair labels XX, XX, CD, CD, AE, DE, CD, XX, CD, CD, CE]
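The transcript does not spell out how mutation events are counted, so the following is only one hypothetical reading. A node contributes an event when its column-pair state differs from its parent's state, and unchanged nodes are skipped (a possible interpretation of the XX labels). Probabilities are then taken over events rather than over sequences. The tree encoding, the function name, and the availability of states at internal nodes are all assumptions.

```python
from collections import Counter

def mutation_event_probabilities(parent_of, state_of):
    """Estimate pair-state probabilities from mutation events on a tree.

    parent_of -- dict mapping each non-root node to its parent node
    state_of  -- dict mapping every node to its pair state, e.g. "CD"
                 (internal-node states are assumed to be given)
    """
    events = Counter()
    for node, parent in parent_of.items():
        if state_of[node] != state_of[parent]:
            events[state_of[node]] += 1   # a change along this branch is one event
    total = sum(events.values())
    if total == 0:
        return {}
    return {state: n / total for state, n in events.items()}
```

Under this reading, a large clade that inherited "CD" unchanged counts once at most, instead of once per sequence, which is the kind of down-weighting the phylogenetic-effect slides argue for.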

    12. Results • Comparison of mutual information calculation using standard probability of sequences (STA) and probability with reduced phylogenetic effect (RPE) • Interested in amino acid pairs that have a mutual information score greater than four standard deviations from the mean (Z>=4) • Maximize the percentage of co-evolving pairs that are less than 12 angstroms apart and greater than 10 sequence positions apart
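The Z>=4 selection can be sketched as follows. This is an illustration under stated assumptions: the function name and the dictionary layout are hypothetical, and the population standard deviation is used because the transcript does not say which estimator was applied.

```python
import statistics

def high_scoring_pairs(mi_scores, z_threshold=4.0):
    """Return {pair: z_score} for pairs whose MI score has Z >= z_threshold.

    mi_scores -- dict mapping (position_i, position_j) to a mutual information score
    """
    values = list(mi_scores.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population std dev (an assumption)
    if stdev == 0:
        return {}                      # all scores identical, nothing stands out
    selected = {}
    for pair, score in mi_scores.items():
        z = (score - mean) / stdev
        if z >= z_threshold:
            selected[pair] = z
    return selected
```

The selected pairs would then be checked against the structure: the goal stated on the slide is to maximize the fraction that are less than 12 angstroms apart and more than 10 sequence positions apart.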

    13. QGYSLPPEEDLLGLGMTSTSFI.........RGIYLQNAKTLEEYHNTVLRGTF..ATV..KS.KIL.TED.DRI.RK..WAIHKL.MCTFTI...NKEEFFNLFGY....EFDTYFIESR...DRL.IS.METT....GLIH......NSPGS.LKVTPLGEL QGYSLPPEEDLLGFGISATSFI.........RGIYLQNVKDLREYSETIQAGKL..ATV..KG.KIL.SQD.DKT.RK..WVIHTL.MCSFSL...SKLEFEQRFHE....RFDRYFADSY...DRL.CG.MESA....GLIR......QDSSS.LQVTPLGEL QGYTTKGGADLVGVGLTSIGEG.........QRHYAQNFKDMSSYEAALDRGVL..PFE..RG.VIL.SDD.DEL.RK..AVIMEL.MANFKL...DIKSIEKEFSI....DFKEYFKEDL...KAL.EE.YKD......FVN......FDENF.IKVNETGVL QGYTTKKFTQTIGIGVTSIGEG.........GDYYTQNYKDLHHYEKALDLGHL..PVE..RG.VAL.SQE.DVL.RK..EVIMQM.MSNLKL...DYSKIEEKFSV....DFKAHFKKEL...EKL.KP.YEEA....GLLS......FNSKG.FEMTKTGGM QGYTTHAGTELFGFGATSISML.........HDAYVQNHKQLKEYYQAVAGDAL..PVS..KG.IKL.TTD.DIL.RR..DVIMCI.MSNFYL...HKQEIEDKYHI....NFDEYFSQEI...AAL.KP.LAAD....GLVS......LSSKH.IQVTEIGRL QGYTTQPESDLLGFGITSISML.........QDVYAQNHKTLKAFYNALDREVM..PIE..KG.FKL.SQD.DLI.RR..TVIKEL.MCQFKL...SAQELESKYNLGFDCDFNDYFAKEL...SAL.DV.LEAD....GLLR......RLGDG.LEVTPRGRI QGYTTLPTADLIGFGLTSISML.........QAAYAQNQKHLATYFSDVAAGHH.GPQE..CG.FNC.TVE.DLL.RR..TIIMEL.MCQFSL...DKGAIARQFNL....DFDAYFASEL...AAL.RE.LAAD....GLLH......LGRDR.LEVTPVGRL QGYTTKKGVELLGFGATSIGML.........YDSYFQNWKTLRDYNKTVDEGKI..PVF..RG.YVL.NED.DFI.RR..EVIMDI.MCNLGV...EFSKIENMFGI....NFREYFAKEL...EEL.KE.MEED....GLIK......VEEDR.IKIMPVGRL QGYSTYADCDLVAIGVSSIGKI.........GSTYSQNERDIDAYYAAIDEGRL..PIM..RG.YQL.NQD.DIL.RR..NIIQDL.MCRFAL...DYRIYESMFGI....PFDRYFKDEL...ADL.EK.LAGL....GLVR......LNSHG.LTVTPKGRF QGYSTHAGYDQVGLGISAIGAI.........AGRYVQNARTLDEYYGALDHGRL..PLA..RG.VAM.SAD.DHL.RR..EIIGAL.MCNGVL...DIPALEARHGI....RFGTAFAPEL...ADL.AA.LGAD....GLVQ......CAPDR.ITVTPLGRL QGYSTHADCDLLAFGMSAISRV.........GDVYAQNEKELDAYYARIDAGEL..PVL..RG.LTL.TPD.DHV.RR..ALIGEL.MCGFEL...DMRLFGTRHGL....NFRQSFASEL...TAL.AP.LEDA....GLVK......VGDER.IVITPQGRL
MGYTTHADTDLLGLGVSAISHI.........GATYSQNPRDLPSWEDAVDQGQL..PVW..RG.VAL.SAD.DQL.RA..ELIQQL.MCQGEV...DGALLGQRHGV....DFEQYFAEDL...RSV.QR.LQDD....GLAE......YRHGV.VRASEPGRP MGYTTHADTDLIGLGVSAISHF.........GDSYSQNPRELAAWDAAVDRGAL..PVC..RG.MQL.SAD.DLL.RA..EVIQAL.LCRGRV...DLAAVAQRHQC....DARHCYDDAL...AAL.EL.LAAD....DLVE......VRGLC.VDVTATGWP QGYTTQEECDLLGLGVSAISLL.........GDTYAQNQKELKHYYTAIDNTGI..ALH..KG.FAM.SEE.DCL.RR..DVIKQL.ICNFKL...DYQPIEQQYGI....QFTSHFAEDL...KLL.AP.LSED....GLLE......IGEKA.IQVSAKGRL QGYTTQGDTDLLGMGVSAISMI.........GDCYAQNQKELKQYYQQVDEQGN..ALW..RG.IAL.TRD.DCI.RR..DVIKSL.ICNFRL...DYAPIEKQWDL....HFADYFAEDL...KLL.AP.LAKD....GLVD......VDEKG.IQVTAKGRL QGYTTQGESDLLGLGVSAISML.........GDSYAQNEKDLETYYACVEQRGN..ALW..RG.LTM.TED.DCL.RR..DVIKTL.ICHFQL...SYQPIEQRYGI....RFADYFAEDF...ELL.AP.FEQD....GLVE......RNETG.LRVTPRGRL QGYTTQEECDLLGLGVSSISQI.........GDCYAQNQKDIRPYYEAIDKDGH..ALW..KG.CSL.NRD.DEI.RR..VVIKQL.ICHFDL...DMAKIDEKLGI....KFEEYFAEDL...KLL.QT.FIDD....KLVE......VADRK.ITISPTGRL QGYTTQGECDLVGFGVSAISMI.........GDAYAQNQKELKKYYAQVNDLRH..ALW..KG.VSL.DSD.DLL.RR..EVIKQL.ICNFKL...DKKAIESEFRV....KFDQYFKEDL...QLL.QT.FIND....ELVE......VDDNE.IRVTLRGRL QGYTTHGHCDLIGLGVSAISQI.........GDLYCQNSSDLTAYQNSLASAQL..ATS..RG.LVC.NAD.DRL.RR..AVIQQL.ICNFKL...EFAEIEQQFNI....DFQGYFGALW...PQL.QG.MAED....GLIR......LERER.IEVLPAGRL QGYTTHGHCDLIGLGVSAISQI.........GDLYCQNSSDLNTYQDSLSNAQL..ATQ..RG.LLC.NHD.DRI.RR..AVIQQL.ICHFEL...DFEPIEQAFTL....DFRGYFNDLW...PEL.LT.LQRD....GLIS......LDDKG.IRILPAGRL QGYTTHGHCDLVGLGVSAISQI.........GDLYSQNSSDINDYQTSLDNGQL..AIR..RG.LHC.NSD.DRV.RR..AVIQQL.ICHFEL...AFEDIETEFGI....DFRSYFAELW...PDL.ER.FAAD....GLIR......LDAKG.IDITSSGRL QGYTTHGHCDLIGLGVSAISQV.........GDLYSQNSSDLNDYQRLLDSDQP..ATL..RG.LIC.SED.DRI.RR..AVIQQL.ICHFTL...NFSELEKAFAI....SFRDYFADAW...PQL.LC.MADD....GLIT......LSDSA.IEVRPAGRL QGYTTDNEPVLIGLGASAISTF.........SDAYIQNIADIKNYSRAIEEQGL..ASF..RG.IDI.SQE.DHL.RG..EIISAL.MCHFAV...DLTPYDKSLSL..........EDEK...REL.SH.LEEE....GLIQ......FQQNR.IEMTDAGRP 
QGYTNDRCGTLIGFGPSSISQF.........PGGYAQNISDVGQYRKRVEAGEL..ATV..RG.YTL.RDT.DRI.RS..AIISAL.MCNFCV...DLNAVAPGMEF..........SDEF...ALL.RP.LVAD....GLVA......VEGRT.IRATENGKS QGYTTDACETLIGFGASAIGRS.........AHGYVQNEVAIGRYAQSVATGQL..ATA..KG.YRL.TAD.DRL.RA..EIIERI.MCDFSV...DLASICQSHGV.....SPDTVVDGN...SQL.QR.LLAD....EIVT......LEDGI.LRLRGEERF QGYTTDACETLIGLGASAIGRT.........NDGYVQNEVPPGLYAQHIASGRL..ATV..KG.YRM.TPE.DRL.RA..GIIERL.MCDFGV...DVPALATAHGF.....DPEMLLRGN...TRL.AM.LESD....GILD......IADGV.IRLREGRRF QGYTTDACKTLIGIGASAIGRF.........GNGYHQNIVPPGLYASCVASGEL..PTA..KI.YEL.TAE.DRV.RA..DVIEQL.MCNFSV...NVAAVCAAHGF.....DPEVLMKQN...DTL.DE.LEKD....GLVQ......REGFM.VRVDGRHRF QGYSADTCKTLIAFGASAIGRV.........GEGYVENAGALEAYSQHIAAGRL..ATS..KG.YRL.IGE.DRV.RG..AIIERL.MCDLEA...DVPAICAAHGF.....DWTHFLDSA...ERL.AM.LADD....GIVD......VENGF.IRVRHGHRI LGYSADTCKTLIGFGASAIGRV.........GEGYVQNEVTRDSYCRHIAAGRL..ATS..KG.YRL.TDE.DRA.RA..AIIERL.MCDLEA...DVPAICAAHGS.....DPIHFLDSA...ERL.AM.LAKD....GIVD......IEKGF.VRVRRQHRF LGYSADTCKTVIGLGPSAIGRL.........REGYVQNESATASYHQHIQAGRP..ATS..KG.YCL.SPE.DRL.RA..AIIERL.MCDLQA...DVPAICAAHGF.....DPIPLLNSA...DRL.GM.LAED....GIVD......IEEGF.IRVKQEHRF LGYSAETCSTVIGLGASAIGRC.........GDGYVQNDLTQSCYNRHIASGRL..AIS..RG.YRL.ATE.DRV.RA..AIIEQL.MCYLEA...DISAICTAQGF.....DQTHLVSSA...KQL.EI.LAED....GIVE......FDNGL.VSVRHERRS QGYTTDQGEVLLGFGASAIGHL.........PQGYVQNEVQIGAYAQSIGASRL..ATA..KG.YGL.TDD.DRL.RA..DIIERI.MCEFSA...DLGDICARHGA.....EPEAMLKSA...SRL.KP.LISD....GVVR......LDGDR.LAVANDSRF QGYTTDDCDSLIGLGASAIGRL.........PAGYMQNHVPLGLYAERIAFGVL..PTA..KG.YLL.SEE.DKL.RA..RVIERL.MCDFEA...DLGQLSSGSGF.....DTGFLVERN...DRL.GE.LMAD....GVVT......ISGER.IVVCEEARF QGYTDDPAPVLVPIGPSSIGQF.........REGFVQNLTPTDAWAARIARDEL..PLG..RA.LAF.SDE.DRL.RA..AVIERL.MCDMTV...DVAAICEAHGF.....STDHLAGSL...ASL.AA.IEVA....GLCV......LDGAV.VTIPEDARR QGYTEDNCETLIGLGPSAISRY.........RQGYAQNIVATGAYEKVVDSGQL..AVA..RG.VEL.SVD.DLA.RG..WIIERL.MCHFAF...SAIELVERFGD.....VGQRLLAMA...SRL.AV.GGGG....LLLR......LDGEN.FVVPKDSRP 
QGYTEDRCETLIGLGSSSISRF.........RQGYSQNMPSTAEYRRMVEGGHL..ATV..RG.IAF.SED.DRV.RG..WIIERL.MCDFGF...SAADLVERFGE.....AGQKLLFQA...SSI.AI.GDPA....RPLE......LQGDS.YVVSAESRP QGYTTDTADALIGLGASSVGRL.........PQGYVQNMVATREYQRMVGEGGL..AAV..KG.IEL.SQD.DHL.RS..HVIERL.MCDFSI...DLSDMQHRFGK.....VSHSVRDQA...QQF.AA.GDRD....GVVR......LDADV.FAVTEVGKP QGYTDDRAEVLVGLGASSISRF.........PQGYAQNAPATGAHLARIRDGRF..STT..RG.HAF.SAE.DRW.RS..RMIEAL.MCDFEI...RAEEFIRDHGF......DAESLSRI...LTP.VA.AHFG....DMVD......ADASG.LRITPRGRP QGYTDDRAEVLIGLGASAISRF.........PQGFTQNAPSTSDHLRAIRSGRF..STA..RG.HVL.SDE.DRL.RG..RMIEQL.LCEFRI...SRAQILARFAV......APERLETL...FRT.CA.AAFP....GVVE......ITGHG.LEILEEGRP QGYTDDTCPTLLGIGASSISKF.........EQGYLQNTAATAAYIKSIEEGRL..PGY..RG.HRM.TEE.DYL.HG..RAIEMI.MCDFFL...DLPALRARFGE.....PAETMVPRI...AEA.AE.KFTP....FVTV......DADGS.MSIAKEGRA MGYTENTTQMMLALGASSISDT.........WYAFAQNERTDDRYMEEVNKGRF..PIM..RG.HLL.SDE.DLV.LR..RHILNL.MCRQET...SWEDPK.........LYTEELDIAR...YRL.ED.MEND....GIVV......LGEKS.VKVTEIGVP MGYTAKTTDMLLGLGVSAISDS.........WDCFHQNEKIVKKYQKRIYSEGF..ATL..RG.HKL.NEE.DLI.QR..SLILQL.STSGKV...IVPE..............EILREVR...LYL.AS.MEDD....TLVR......WEGNL.LSLTEKGRP ISYTAAPATPMIGLGVGAVGEI.........DGAMFWNDGSQAAWRNALRHLHL..PVS..QA.RPA.TPE.SVQ.RR..AAVERL.LCTLEL...AAAD.............AVGLEDGY...GRL.AA.REAE....GLVR......VLDDR.IVVTEAGRH LGYSDKPTRIVLGVGLGAVSEL.........PNLLSRNHTSLDAWHESLDNKMS..PTC..AG.VIF.TTV.EAK.QR..RLVHRL.SETLRA...PLTEFQG..............AEQQ...GLL.NQ.LQAE....GLVT......AESEW.VQVTDSGRF FGYAETRVSQTLGAGLGAVSEV.........GDIVAQNYIDMDAWHMALDRGHL..ATQ..YI.IDA.TDF.EIT.RR..SVMRRL.MCNTEV...PVSMVAQ..............PEVL...GLL.ES.LENQ....GYTQ......KQGSS.VHLTALGRS QG....ADCL..ALGSGAGGSL.........QGHAYMQHRSLDNYYRLIDSGQK..PLM..MM.TQA.SGE.HPW.RA..KLQSGI..EVGRL...DLSELI...............ADPY...P.L.MP.LISQWYQSNLLK......DNSFC.LRLTDSGRF QG....ADCL..AFGSGAGGSI.........NGYSWMNERNLQTWHESVAAGKK..PLM..MI.MRN.AER.NAQ.WR..HTLQSG.IETACV...PLDE............LTPHAEKLA...PLL.AQ.WHQK....GLSR......DASTC.LRLTNEGRF 
LFYWRNENYL..GLGVSAGGHI.........GRFRYVNASDLKEYEEKITKGEL..PYE..YV.HEN.TEE.EEA.LE..TVFMGL.RIKEGV...ELNR................VKILL...PLL.EK.LQKKY..PCYLK......VKNGK.IFLSEDGMN .TYWENKKYL..GVGLSAAGYL.........NNVRYKNFFNLKDYYNNLDRNIL..PID..EK.EIL.TEE.DIE.QY..RYLVGF.RLLNKI...IIPS.................EKYL...EKC.MS.LCKE....GYLL......EKENG.YILSHKGLM LVYWNNDEYY..GFGAGAHGYV.........GGVRYMNHGPLPKYLQAMEEGRR..PVF..ES.HHV.SRV.EQM.EE..QMFLGL.RKRSGV...EERVFVERFGV.......SMFSLYE...KQI.AQ.LVAR....CLLE......RTDDR.VRLTDEGLL LTYWNNEEYY..GIGAGAHSYV.........ERVRRVNIGPIKQYIAKVRETGL..PYR..EI.HQV.TWM.EQM.EE..EMFLGL.RKTEGV...SKQCFFEKFGR....DVHDVFGAAI...R...AE.HEK.....GLPE......ETATH.VRLTRRGRL ITYWSNEHYY..GFGAGAHGYV.........GNTRYSNFGPIKKYMEPLQENIL..PTF..QQ.KEL.TLK.EKM.EE..EMFLGL.RKVDGV...DKKHFKQKFGQ....DLDATFANAI...QKT.TA.KGW.......LE......NNEEN.VALTRSGRF ITYWDNEEYY..GIGAGASGYL.........AGIRYKNLGPVHHYLKAAPTEKR....I..NE.EVL.SKK.SQI.EE..EMFLGL.RKKSGV...LVEKFENKFKC........SFEKLY..GEQI.TE.LINQ....KLLY......NDRQR.IHMTDKGFE LMYWDNVEYY..GVGAGASGYL.........DGIRYRNRGPIQHYLKGVSEG.N..ARL..SE.EVL.SKN.EMM.EE..ELFLGL.RKKEGV...SIGKFEQKFGT........SFEKRY...GQIVQE.LQSD....GLLK......ENNGF.IQMTKKGLF LMYWDNVEYF..GCGAGASGYL.........NGIRYQNRVPIQHYLKAVEAG.N..ARL..NE.EVL.RKE.EMM.EE..ELFLGL.RKKTGV...SIQRFQEKFGI........SFEERY...GNIVRE.LQNQ....GLLV......KDDAF.VRMTKKGLF LMYWNNAEYF..GCGAGASGYV.........DGIRYRNRGPIQHYLKAIKEKRQ..ARF..QE.ERL.SQS.EKM.EE..ELFLGL.RKKSGI...SIQRFEDKFGL........PLMEVY..GQAI.DD.LEKD....GLIL......VEKDC.IRMSKKGLF LMYWDNAEYY..GIGAGASGYV.........NGVRYKNHGPIRHYLSAVEEGNA..CIT..ED.H.L.SQK.EQM.EE..EMFLGL.RKKSGV...SMARFEEKFGQ........SFAGLY...GEIVRD.LVQQ....GLMQ......IEGDH.VRMTKRGLF LVYWNNEHYY..GFGAGASSYL.........NQQRYKNFGPIQHYLNLLRNNQL..PII..ET.ENL.SFK.NQI.EE..ELFLGL.RKKEGV...SLHRFKEKFNL........ELTDLY...QEV.LP.ELFD...AQLLT......FKNDH.LKLTRKGLF .VYWFNEEYY..GFGAGASGYV.........DGVRYTNINPVNHYIKAINKESK..AIL..VS.NKP.SLT.ERM.EE..EMFLGL.RLNEGV...SSSRFKKKFDQ....SIESVFGQTI...NNL.KE....K....ELIV......EKNDA.IALTKRGKV 
.VYWLNEEYY..GFGAGASGYV.........NGVRYTNLNPVNHYIKAINEGKK..PIL..SE.TSP.TYN.ERM.EE..EMFLGL.RMNQGV...SKSRFKKKFNK....LIDEVFVETI...KDL.RC.R.......GLIK......EEGEF.ISLTERGKV LTYWNNDYYY..GFGAGAHGYI.........PGKRTSNSKPLGTYMRAAKEEGS..AID..EI.EEI.TKK.DQI.EE..ELFLQL.RKTSGI...DKKMFEQKYGV........SLEQLY..EKEL.QD.LLEQ....GLLR......LIDGN.YRLTDRGML LIYWELDNYI..GCGASAHSYF.........NGVRYRNINNVKKYIEQISKGNS..VVE..EN.HRN.LLK.EDM.EE..FMFLGL.RKTRGV...SIEEFKLKFNK....DIQEVYGDVI...K....K.YETI....GMII......LNEHR.VFLTERGMQ LAYWNMDNWI..GVGSAAASYI.........NGKRIKNISSVEKYINSINEKRE..AVE..EI.INN.SKN.DNM.EE..FMFMGL.RKINGI...DENEFKNRFSM........NINDVY..GEIL.NK.YIDE....GLLI......RESGR.IFLSEKGIE LIYWDLEEYI..GCGLAAHSFL.........KGYRYSNVHNIEDYIKLINENKN..IKI..NT.YKN.LTK.DTM.EE..FMFMGL.RKIKGI...NTEEFYKRFHK.......NIYEVYG...DII.KK.YINE....GLII......EKHGN.IFLSSIGIE ILYWECREYL..GFGAGAHSYF.........EGTRWNNVERIEKYIEAILKRKD..ARE..EI.INL.SFE.DKM.SE..FMFLGL.RMRKGV...CEEEFRKRFGI.......SMFERYE...EIF.IK.YEKM....GLIE......KDKDC.VRLTEKGID LAYWGAKDYL..GCGAGAVGCV.........ANERFFAKKLIENYIKDPLQRQV..........ETL.NKQ.DKR.LE..KLFLGL.RCVLGV...ELSF.................LDEN...K........VK....FLIE......ENKAF.I...KNNRL ..YWTDKPFL..GLGVSASQYL.........NGIRSKNFSRISHYLRAAHHHQP..TAE..SM.EEL.PPC.ERI.KE..ALALRL.RLCDPI...PFCM............FPEELVNEILMNPSI.RP.LFA...............INAQT.FSLNKQGRL LYYWTDRPFL..GLGVSASQYL.........HGERSKNYSHISHYLRAVRKN.L..PTQ..ETSEIL.PKK.ERI.KE..ALALRL.RLLEGA...DLAE............FPSTLISML...TQD.VK.LQN......LFS......VHGQC.LALNRQGRL LVYWKMEEFL..GVGVSAWGFY.........ENVRYGNTKNISKYVKFLKEDKK..PVE..FR.VQL.DET.ELE.KE..RIMLGL.RTTEGI...EEKYLK..............FVPEY...L...RD.F.........FE......VKGGR.LRIKEEHLL .VYWENRPYY..GFGMGAASYV.........EGKRFTRPRKTKEYYQWVQELIANHGVI..DW.EIT.PKA.DVL.LE..TLMLGL.RLADGV...SLAALTEEFGK.......EKIQELH...QCL.QP.YFTQ....GWVQ......VVGDR.LRLSDPDGF QVYWRNQSYY..GFGMGATSYL.........QHRRLSRPRTRREYYQWLQALPE..SLH..QG.SPD.SLW.DRW.LE..TLMLGL.RLRDGL...SLPALAD........EFPASWVEAL...QAA.AA.KISP....ALLS......LAGDR.LHLTQPEGF 
TSYWRGIPYL..GCGPSAHSFN.........GTTREWNVSSIDLYIKGIEGNQR...DF..ET.ENL.DQT.TRY.NE..FIITTI.RTVWGT...PIEKLKQEFGN.......ELWEYCR...KMS.AP.YLEN....GKLE......IHEGA.LRLTREGIF LNYWRFGDYL..GIGCGAHGKLSF....ADGRIVRTTKTKHPRGYLAALNNLAK..AYL..DS.EQL.VADQDKP.FE..FFMNRF.RLIEPC...PKADFTATTGL........TIDVIR...PTL.DW.ALSE....GYLS......EDDQH.WQITEKGKL LNYWRFGDYL..GIGCGSHGKLSF....ADGRIIRTTKIKHPKGYLAAHQNMVK..PYL..DS.EQL.VEEIDRP.FE..FFMNRF.RLMEAC...PKQDFI..........DTTGLPLSFI.ETTI.QW.AVEM....GYLN......DNETS.WQITEKGKL LNYWRFGDYL..GIGCGSHGKLSF....ADGRIIRTTKVKHPRGYLAAYQNMVK..PYL..HT.EQLVADE.DRP.FE..FFMNRF.RLMEAC...PKQDYV.........DTTGLPLSTI...QDT.IDWALEM....GYLS......ETETH.WQITEKGKL LNYWQFGDYL..GIGCGAHGKVTL...PEENRIIRTVKIKHPKGYLTA.DNY.....TF..EQ.TEV.AQE.DRA.LE..YLMNRL.RLMTPI...PKQEFEDRTGL.....PRDVLKDGM...EKA.KQ.R.......GLLT......ESAEH.WQLTNKGHM LNYWQFGDYL..GIGAGAHGKIS.....YPDRIERTVRRRHPNDYLALMQNRPS.EAVE..R..KTV.AAE.DLP.FE..FMMNAL.RLTDGV...PTAMLQERTGV........PSAKIM...AQI.ET.ARQK....GLLE......TDPAV.FRPTEKGRL LNYWQFGDYL..GIGPGAHGKLS.....FPHRVIRDMRHKHPETYLRQAETAGG..ATVVQEQ.REV.DAA.DLP.FE..FMLNAL.RLTDGF...PVTLFQERTGL........PLRGIE...REL.DA.AERR....GLLV......HDHAT.IRPTELGQR LNYWRFGDFI..GIGAGAHGKLTF....ADGRILRTWKTRLPKDYLN....LAK..PFR..AG.EKL.LPV.DELPFE..FLMNAL.RLTDGV...EAELFTQRTGL........PLAQLQ...EAR.RA.AEQK....GLLQ......VEPDR.LVATPRGQL LNYWAFGDFI..GIGAGAHGKLSH....PDGRIIRTWKTRLPKDYLNP..DKPF...QA..GS.KLL.PLD.ELP.FE..FLMNAL.RLTNGV...DAALFRERTGL........SLDSLA...EAR.RQ.AEQK....GLLH......EDPAR.LIATPQGQL LNYWRFGDYL..GIGAGAHGKISS...GAEAHVLRRWKHKHPQSYLAS..AGTA..ASI..GG.DEI.VPG.ERL.PF..EYMLNLLRLHEGF...RLSDFEASTGL.......AACAIEA...P.L.AR.AVAK....GWMR......QQDGR.VVPTELGRR LNYWRFGDYL..GIGAGAHGKISS...GAEQQVLRRWKHKHPQSYLASAGSA.A..AIG..GD.EHV.PAA.RLP.FE..YMLNLL.RLHEGF...RLSDFEACTGL..........PAQVL.QAPL.AR.AMAQ....GWLV......EQHGR.IVPTELGRR MGYWVDGDWW..GAGPGAHSHI.........GDRRFYNIKHPARYSAQIAAGEL..PIK..ET.EML.TAE.DHH.TE..RVMLGL.RLKQGV...PLNLFT...............PAAR...PVI.DR.HIAG....GLLH......VNALGNLAVTDAGRL 
MGYWVDGDWW..GAGPGAHSHI.........GDHRFYNVKHPARYSAQIAGGEL..PIM..DT.ELL.TAD.DHH.TE..RVMLGL.RLKQGL...PAGIFS...............PSAH...RVI.DR.HIDR....GLLH......RVGGN.IAVTDAGRL LGYWDGGQWW..GAGPGAHGYI.........GVTRWWNVKHPNTYAEILAGATL..PVA..GF.EQL.GAD.ALH.TE..DVLLKV.RLRQGL...PLARLG...............AAER...ERA.EA.VLAD....GLLD......YHGDR.LVLTGRGRL LVYWRGVDYV..GVGPGAHGRLA.....LPEGRAATTAHRAIKDYIAAVGDHGV...GF..QS.EIL.TPE.DAA.LE..RLVLGM.RIDAGV...GFDE.VAVLGL..........DPDV...AKV.RD.LVET....GLLV......EDRAR.LRATRAGRL LTYWRYGEYV..GVGPGAHGRFV.....EHGRRTVTIAERMPETWANLVEAKGH..GVT..GG.EIL.TRS.EEA.DE..FLLMGL.RLAEGI...DLARYEAFSGR..........GLSS...ARL.SV.LQGE....GLVA.....PIGNAR.LRATPAGMI LVYWRYGQYA..GIGPGAHGRFV.....ENDVRTVTMTEKHPETWLDHVERRGH..GII..EE.EYL.DGG.QEG.DE..FLMMGL.RLREGI...DLARYARLSGH..........AIDD...KRL.AK.LIAE....GMIE.....PMGGSL.IRATPDGAL LTYWRYGDYA..GIGPGAHGRLA.....IGSGKIATATERNPEAWLQRVEECGE..GLV..ER.ELL.DFE.AQA.DE..LLLMGL.RLREGV...DLAR.............WQTLSGRD...PDP.AR.EEF......LIEHGFIERIGNSR.LRCTPAGML LTYWRYGDYA..GIGPGAHGRLT.....RGASKLATATERHPETWLETVEREGH..GMV..DQ.ELL.GVD.EQA.DE..LLLMGL.RLREGI...DLAR................WSDLS..GRDL.DP.EKEE....FLLQHGFVERLGNSR.LRCTPSGML LVYWRGDEYA..GIGPGAHGRLD.....IDGIRHATATEKRPEAWLLRVETNGH..GVV..TD.DLL.NSE.ERA.DE..FLLMGL.RLAEGI...DPERYTALSGR..........ALDP...KRI.AL.LREE....GAIT......VDATGRLRVTSSGFP WTYWQCGQYL..GVGPGAHGRFMPQGAGGHTREARIQTL.EPDNWMKEVMLFGH..GTR..KR.VPL.GRL.ELL.EE..VLALGL.RTDVGITHQHWQQFEPQLTL......WDVFGANK...E.V.QE.LLER....GLLQ......LDHRG.LRCSWEGLA LAYWDLEDWK..AIGIGAYGFE.........KNVYYQNYGSYLNYYK...KNQN............W.NQK.DIY.LY..ILMMGL.RKIDGI...DLNR..............EINKKAY...EYF.KN.KINY....PLVT......IKDNK.LKANNVHIL LKYWTMEYYL..GIGPGAHGFL.........PSGRYSNPRNVDTY....KRKNF..SKE..YT.KPN.FYE.ELI.LSLFRLFQPI.LMESFY...ELIP.............DQSQTLDL...Q.L.KK.FQES....GLCE......FSNGI.FQWKPEAVL PF04055.9 Sample Data 180-540 QGYSLPPEEDLLGLGMTSTSFI.........RGIYLQNAKTLEEYHNTVLRGTF..ATV..KS.KIL.TED.DRI.RK..WAIHKL.MCTFTI 

    14. PF04055.9 STA method

    15. PF04055.9 RPE method

    16. Sequence Positions of Interest

    17. Clusters of Interest [Figure: clusters for STA and RPE; labels 8, 410, 555, 188, 26, 8, 198, 412, 330]

    18. Mutual Information Analysis in Pfam • 2,765 families have one or more PDB structures • Filter on families with > 100 sequences and < 5000 sequences • At least one PDB structure must have 90% agreement with an associated Pfam sequence in the family • 783 families were used to test the predictive quality of the STA and RPE methods
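The family selection criteria above can be sketched as a simple filter. The record fields (`id`, `num_sequences`, `best_pdb_identity`) and the function name are hypothetical, and treating "90% agreement" as a sequence-identity fraction between a PDB chain and a Pfam sequence is an assumption.

```python
def select_families(families, min_seqs=100, max_seqs=5000, min_identity=0.90):
    """Return ids of families that pass the size and PDB-agreement filters.

    families -- iterable of dicts with hypothetical keys:
                "id", "num_sequences", "best_pdb_identity" (0.0 to 1.0,
                or None when the family has no PDB structure)
    """
    selected = []
    for fam in families:
        right_size = min_seqs < fam["num_sequences"] < max_seqs  # > 100 and < 5000
        identity = fam["best_pdb_identity"]
        has_good_pdb = identity is not None and identity >= min_identity
        if right_size and has_good_pdb:
            selected.append(fam["id"])
    return selected
```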

    19. Initial Results

    20. Additional Filtering [Figure axes: percentage < 12 angstroms per Pfam family; number of MI pairs where Z>=2]

    21. #MI scores < 500 and MC > 40

    22. PF00014.13 PDB 1KUN

    23. Structure Prediction CASP7 • Mutual information prediction of co-evolving pairs is used to build a model that scores a predicted tertiary structure • When a PDB structure exists for a particular Pfam family, we have accurate data to score the predicted structure • When no PDB structure exists, a predicted tertiary structure scores better than other predicted structures when its sum of distances between co-evolving pairs is smaller
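The scoring idea for the case with no PDB structure can be sketched as below: each candidate model is scored by the summed distance between its predicted co-evolving pairs, and the lowest sum wins. The function names and the data layout (a dict of residue position to coordinates per model) are assumptions for illustration, not the presenter's implementation.

```python
import math

def coevolution_score(coords, pairs):
    """Sum of 3D distances (angstroms) between predicted co-evolving pairs.

    coords -- dict mapping residue position to an (x, y, z) tuple
    pairs  -- iterable of (position_i, position_j) predicted to co-evolve
    """
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs)

def best_model(models, pairs):
    """Return the name of the model with the smallest summed pair distance.

    models -- dict mapping a model name to its coords dict
    """
    return min(models, key=lambda name: coevolution_score(models[name], pairs))
```

With two candidate models for the same sequence, the one that places the predicted pairs closer together is preferred.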