
Introduction to Bioinformatics - Tutorial no. 5



Presentation Transcript


  1. Introduction to Bioinformatics - Tutorial no. 5
     • MEME – Discovering motifs in sequences
     • MAST – Searching for motifs in databanks
     • TRANSFAC – the Transcription Factor DB

  2. WebLogo - Input
     • Aligned sequences (e.g. output of ClustalW)
     • http://weblogo.berkeley.edu
     • RUN!

  3. WebLogo - Output
     • Genes:
     • Proteins:
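
The letter-stack heights in a sequence logo reflect the information content of each alignment column. A minimal Python sketch of that calculation follows, assuming DNA input and omitting the small-sample correction that WebLogo applies. The function name `column_information` and the example sites are made up for illustration.

```python
import math
from collections import Counter


def column_information(aligned, alphabet="ACGT"):
    """Return the information content (in bits) of each alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    info = []
    for column in zip(*aligned):
        # Ignore gaps and any symbol outside the alphabet.
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Made-up aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
bits = column_information(sites)
```

A fully conserved column reaches the 2-bit maximum for DNA, while a column with mixed bases scores lower.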

  4. MEME
     • http://meme.sdsc.edu/
     • Motif discovery from unaligned sequences
     • Genomic or protein sequences
     • Identifies profile motifs
     • Multiple motifs for any input
     • Flexible model of motif presence
     • Motif can be absent in some sequences
     • Can appear several times in one sequence

  5. MEME Input
     • Email address
     • Multiple input sequences
     • Range of motif lengths
     • How many motifs?
     • How many times in each sequence?
     • How many times in total?

  6. MEME Output (1)
     • Like BLAST
     • Motif length
     • “Position-Specific Probability Matrix” = Motif Profile
     • Number of times
     • Most popular symbols
     • Divergence of motif positions from background
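
As a rough illustration of the two quantities named on this slide, the sketch below builds a position-specific probability matrix from a set of motif instances and measures how far it departs from a background distribution using relative entropy. This is not MEME's own code; the function names, the uniform background, and the example instances are assumptions made for the example.

```python
import math


def probability_matrix(instances, alphabet="ACGT", pseudocount=0.0):
    """Return one {symbol: probability} row per motif position."""
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all motif instances must have the same length")
    matrix = []
    for column in zip(*instances):
        denom = len(column) + pseudocount * len(alphabet)
        matrix.append(
            {a: (column.count(a) + pseudocount) / denom for a in alphabet}
        )
    return matrix


def relative_entropy(matrix, background):
    """Sum over positions of p * log2(p / background), in bits."""
    total = 0.0
    for row in matrix:
        for symbol, p in row.items():
            if p > 0:
                total += p * math.log2(p / background[symbol])
    return total


# Made-up motif instances and a uniform background, for illustration only.
instances = ["ACG", "ACG", "ATG", "ACG"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pspm = probability_matrix(instances)
divergence = relative_entropy(pspm, uniform)
```

Passing a small `pseudocount` avoids zero probabilities, which matters if the matrix is later converted to log-odds scores.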

  7. MEME Output (2)
     • Position in sequence
     • Strength of match
     • Sequence names
     • Reverse complement (genomic input only)
     • Motif within sequence

  8. MEME Output (3)
     • Motif instance
     • Original sequence lengths
     • Overall strength of motif matches

  9. MAST
     • Searches for motifs (one or more) in sequence databases:
       • Like BLAST, but with motifs as input
       • Similar to iterations of PSI-BLAST
       • Profile defines strength of match
       • Multiple motif matches per sequence
       • Combined E value for all motifs
     • MEME uses MAST to summarize results:
       • Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
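
To make "profile defines strength of match" concrete, the sketch below slides a log-odds version of a motif profile along a DNA sequence and its reverse complement and reports the best-scoring window. It is a simplification under stated assumptions: MAST itself converts scores to p-values and combines them into E values, which is not reproduced here. The function names, the probability floor, and the toy profile are hypothetical, and the input is assumed to contain only A, C, G, and T.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(prob_matrix, background, floor=1e-3):
    """Convert a probability matrix to log2-odds scores against a background."""
    return [
        {a: math.log2(max(p, floor) / background[a]) for a, p in row.items()}
        for row in prob_matrix
    ]


def best_hit(sequence, scores):
    """Return (score, start, strand, window) for the best-scoring window.

    For the "-" strand, start is an offset into the reverse complement.
    """
    width = len(scores)
    reverse = sequence.translate(COMPLEMENT)[::-1]
    best = None
    for strand, seq in (("+", sequence), ("-", reverse)):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(scores[i][base] for i, base in enumerate(window))
            if best is None or score > best[0]:
                best = (score, start, strand, window)
    return best


# Toy profile that strongly prefers "ACG" (made up for illustration).
profile = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
scores = log_odds(profile, uniform)
forward_hit = best_hit("TTACGAA", scores)
reverse_hit = best_hit("TTCGTAA", scores)
```

Scanning both strands mirrors the reverse-complement option that applies to genomic input.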

  10. MAST Input
     • Email address
     • Motif file (e.g. MEME output)
     • Consider matched sequence length
     • Database (like BLAST)
     • E value threshold

  11. MAST Output (1)
     • Link to GenBank
     • Matched accession
     • Match E value
     • Length of sequence

  12. MAST Output (2)
     • Motif diagram

  13. MAST Output (3)
     • Position of each instance
     • Matched parts of sequence
     • Motif and orientation
     • P value of instance
     • Motif ‘consensus’

  14. TRANSFAC
     Database of eukaryotic DNA transcription regulation:
     • Individual regulatory sites (SITES table)
     • Genes to which they belong
     • Proteins which bind them
     • Proteins which bind sites (FACTORS table)
     • Cellular source of protein
     • Nucleotide motif profile for binding
     • Some grouping and classification
     • Classification of factors (CLASS table)
     • Position-specific matrices for select factors (MATRIX table)
     • Cell localization (CELL table)

  15. Searching TRANSFAC
     • www.gene-regulation.com
     • Search a single table
     • By identifier, factor name, gene name
     • By species, author
     • Browse your way from table to table
     • Search within a sequence
     • MatInspector, TFScan (EMBOSS package)

  16. TRANSFAC Factor (3)
     • DT: Date; author
     • FA: Factor name
     • GE: Encoding gene
     • SF: Structural features
     • CP: Cell specificity (positive)
     • CN: Cell specificity (negative)
     • EX: Expression pattern
     • FF: Functional features
     • IN: Interacting factors
     • MX: Matrix
     • BS: Binding site
     • DR: External databases
     References:
     • RN: Reference no.
     • RX: MEDLINE ID
     • RA: Reference authors
     • RT: Reference title
     • RL: Reference data
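
Entries tagged with two-letter field codes like these are straightforward to read programmatically. The sketch below groups the lines of one record by code, assuming a layout in which each line starts with the code followed by its value and the record ends with `//`. That layout, the function name `parse_record`, and the record itself are assumptions for illustration; the record is not a real TRANSFAC entry.

```python
from collections import defaultdict


def parse_record(text):
    """Group the lines of one record by their two-letter field code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip the record terminator and lines too short to carry a code.
        if line.startswith("//") or len(line) < 2:
            continue
        code, value = line[:2], line[2:].strip()
        if value:
            fields[code].append(value)
    return dict(fields)


# Made-up record for illustration only; not taken from the database.
record = """\
FA  ExampleFactor
GE  exampleGene
IN  PartnerA
IN  PartnerB
//
"""
parsed = parse_record(record)
```

Repeated codes, such as several interacting factors, end up as multiple values under the same key.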

  17. TRANSFAC Matrix
     • Accession
     • Position-specific matrix
     • Consensus (IUPAC subset symbols)
     • Statistical basis
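
A consensus string in IUPAC symbols can be derived from a position-specific count matrix. The sketch below uses one possible rule: a single base if it accounts for more than half of the counts and at least twice the runner-up, a two-base IUPAC code if the top two bases together exceed 75%, and N otherwise. The thresholds and the function name are assumptions for illustration and may differ from the rule applied to any given database entry.

```python
# Two-base IUPAC ambiguity codes.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def consensus(counts):
    """Return an IUPAC consensus string for a list of {base: count} rows."""
    symbols = []
    for row in counts:
        total = sum(row.values())
        ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
        (first, n1), (second, n2) = ranked[0], ranked[1]
        if n1 > 0.5 * total and n1 >= 2 * n2:
            symbols.append(first)
        elif n1 + n2 > 0.75 * total:
            symbols.append(IUPAC_PAIRS[frozenset((first, second))])
        else:
            symbols.append("N")
    return "".join(symbols)


# Made-up count matrix, for illustration only.
matrix = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 3, "C": 3, "G": 2, "T": 2},
    {"A": 0, "C": 2, "G": 0, "T": 8},
]
result = consensus(matrix)
```

The consensus is a lossy summary, so the underlying matrix remains the better input for scoring candidate sites.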

  18. TRANSFAC: complementary material to read in your free time

  19. TRANSFAC Site (1)
     • DNA or RNA
     • Accession number
     • Sequence of regulatory element
     • Gene
     • Gene region
     • Position range of factor binding site

  20. TRANSFAC Site (2)
     • Binding factor accession
     • Factor name
     • Binding ‘quality’
     • Organism
     • Cellular source
     • External links
     • Methods of identifying site

  21. TRANSFAC Factor (1)
     • AC: Accession number
     • FA: Factor name
     • HO: Homologs
     • SY: Other names
     • CL: Classification
     • OS: Organism
     • OC: Taxonomy
     • SQ: Amino acid sequence
     • SZ: Size

  22. TRANSFAC Factor (2)
     • Protein sequence reference
     • Features and positions
     • Structural features
     • Cell specificity
