Bioinformatics

Bioinformatics: Motif Detection (revised 27/10/06). Overview: introduction to multiple alignments; multiple alignment based on HMM; motif finding (motif representation, algorithm, search space, word counting methods, probabilistic methods); profile searches (introduction); exercises.

Bioinformatics

Motif Detection

Revised 27/10/06


Bioinformatics

Overview

  • Introduction Multiple Alignments

  • Multiple alignment based on HMM

  • Motif Finding

    • Motif representation

    • Algorithm

    • Search Space

    • Word counting methods

    • Probabilistic methods

  • Profile Searches

    • Introduction

  • Exercises

http://www.esat.kuleuven.ac.be/~kmarchal/


Bioinformatics

Introduction

  • Global multiple alignment (ClustalW)

    • Proteins, nucleotides

    • Long stretches of conservation essential

    • Identification of protein family profiles

    • Score gaps

  • Local multiple alignments (motif detection)

    • Proteins, nucleotides

    • Short stretches of conservation (12 NT, 6 AA)

    • Identification of regulatory motifs (DNA, protein)

    • No explicit gap scoring

    • Explicit use of a profile


Bioinformatics

HMM


Bioinformatics

Transcriptional regulation

[Figure: transcriptional regulation. Labels: signal, cell, chromosome, sigma, motif, genes 1 to 4; gene → transcription → mRNA → translation → protein.]


Bioinformatics

Motif Representation

Consensus sequence:

  • Reductionist representation of a motif

  • The most frequent residue at each position is used as the representative

  • Loss of information

Regular expression:

  • More complex representation allowing motif degeneracy

Position-specific scoring matrix (PSSM):

  • Probabilistic representation


Bioinformatics

Motif Representation

Consensus

CTTAATATTAACTTAAT

Regular expression

CTTAAKRTTMAYTTAAT

PSSM (motif logo)
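
The degenerate motif above is written with IUPAC nucleotide codes (K = G/T, R = A/G, M = A/C, Y = C/T). As a minimal sketch, assuming those standard codes, it can be expanded into an ordinary regular expression and checked against the consensus. The helper name `iupac_to_regex` and the two test variants are illustrative and not part of the original slides.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate a degenerate IUPAC motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))

degenerate = iupac_to_regex("CTTAAKRTTMAYTTAAT")
consensus = "CTTAATATTAACTTAAT"

# The consensus is one of the instances allowed by the degenerate motif.
matches_consensus = degenerate.fullmatch(consensus) is not None
# Hypothetical variant using the other allowed base at each degenerate position.
matches_variant = degenerate.fullmatch("CTTAAGGTTCATTTAAT") is not None
# Hypothetical sequence with an A where only G or T is allowed (position 6).
matches_other = degenerate.fullmatch("CTTAAAATTAACTTAAT") is not None
```

The regular expression accepts several instances but still treats every allowed base as equally likely, which is the information a PSSM adds.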



Bioinformatics

Overview Algorithms

Search for motifs that are present more frequently in a set of sequences than in a set of unrelated sequences

  • Methods based on word counting (regular expression)

    • NP problems: require heuristic methods and clever algorithms

      • motif w=8; possible words = 4^8 = 65,536

      • Jensen & Knudsen, 2000; van Helden, 2000; Vanet, 2000

  • Probabilistic methods (weight matrix)

    • Multiple alignment by locally aligning small conserved regions in a set of unaligned sequences.

    • Motif model represented by a probability matrix

    • EM, Gibbs sampler (optimization algorithms)

      • AlignACE http://atlas.med.harvard.edu/

      • BioProspector: http://bioprospector.stanford.edu/

      • Motif Sampler http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html


    Bioinformatics

    Search space

    • When are motifs overrepresented statistically?

    • Set of coexpressed (coregulated sequences)

      • Literature searches

      • Microarrays, expression profiling

    • Set of orthologous sequences (phylogenetic footprinting)

      • Comparative genomics

      • Orthologous sequences similar ancestral origin => similar mechanism of transcriptional regulation


    Search space: coexpression

    [Figure: coexpression search space. Labels: cDNA arrays, preprocessing of the data, clustering, upstream regions (EMBL, BLAST), motif finding by Gibbs sampling.]


    Search space: phylogenetic footprinting

    • PhoPQ is a ubiquitous system, found in:

      • Salmonella

      • Escherichia

      • Yersinia

      • Vibrio

      • Pseudomonas

      • Providencia

      • Pectobacterium

    • PhoPQ is autoregulated



    Bioinformatics

    Overview Algorithms

    • Methods based on word counting

      • NP problems: require heuristic methods and clever algorithms

        • motif w=8; possible words = 4^8 = 65,536

        • Jensen & Knudsen, 2000; van Helden, 2000; Vanet, 2000

    • Probabilistic methods

      • Optimisation problems, self learning, AI

      • Motif model represented by a probability matrix

      • Bayesian, Gibbs sampler

        • AlignACE http://atlas.med.harvard.edu/

        • BioProspector: http://bioprospector.stanford.edu/

        • Motif Sampler http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html


    Bioinformatics

    Word Counting

    Monad frequencies (single-word counts):

    (RSA tools) (J. van Helden et al., 1998, J. Mol. Biol.)

    • Enumerate all oligonucleotides

    • Count the number of occurrences of all oligonucleotides of a selected size in a set of coregulated genes

    • Compare each observed count with its expected value in the background

    http://bio.cigb.edu.cu/jvanheld/rsa-tools/RSA_home.shtml
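
The three steps above can be sketched as follows. This is a simplified illustration, not the RSA-tools implementation: it counts overlapping occurrences on one strand only and assumes a background of independent single-nucleotide frequencies. The function names and the two toy sequences are hypothetical.

```python
from collections import Counter

def count_words(seqs, k):
    """Count every oligonucleotide of length k in a set of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def expected_count(word, seqs, bg):
    """Expected count of `word`, assuming independent background frequencies."""
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    p = 1.0
    for base in word:
        p *= bg[base]
    return positions * p

# Toy upstream regions (hypothetical data).
seqs = ["ACGTACGT", "TTACGTAA"]
uniform_bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

counts = count_words(seqs, 4)
observed = counts["ACGT"]
expected = expected_count("ACGT", seqs, uniform_bg)
ratio = observed / expected  # values far above 1 suggest over-representation
```

In practice the background would be estimated from a large set of unrelated upstream regions rather than assumed uniform.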


    Bioinformatics

    Word Counting

    Relevance of the motifs detected

    P-value and sig score (string-based methods)

    • Expected number of occurrences in background

    • Statistical significance
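
One way to turn an observed count and its expected frequency into a significance value is a binomial tail probability. The sketch below assumes that each of the n word positions independently matches with probability p, and that the sig score is the negative log10 of the P-value multiplied by the number of words tested. That sig definition is an assumption modelled on string-based tools; the slides do not give the formula.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more occurrences."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )

def sig_score(p_value, n_words):
    """Assumed definition: -log10 of the P-value corrected for n_words tests."""
    e_value = p_value * n_words
    return -math.log10(e_value)

# Hypothetical example: a 4-mer seen 3 times in 10 positions, uniform background.
p_value = binomial_tail(3, 10, 0.25 ** 4)
sig = sig_score(p_value, 4 ** 4)
```

A positive sig score means the word is seen more often than expected even after correcting for the number of words tested.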


    Bioinformatics

    Probabilistic Algorithms

    Find common motifs that represent regulatory elements in the regions upstream of the translation start in a set of co-expressed genes

    • Motifs are hidden in background sequence


    Bioinformatics

    Probabilistic Algorithms

    • Motif Representation: Probability matrix (PSSM)

    • Background model

      • Single nucleotide frequencies

      • Described by an mth-order Markov process that can be represented by a transition matrix


    Bioinformatics

    Probabilistic Algorithms

    Step 1: Initialization of alignment vector A (predictive update)

    Step 2: Calculate motif model for all sequences except one

    G A A T T

    C A T G T

    C A C T T

    C A T T G
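
As a minimal sketch of step 2, the four aligned sites shown above can be turned into a probability matrix by counting residues per column. The pseudocount of 1 per nucleotide is an assumption added here to avoid zero probabilities; the slides do not state which value was used, and `build_pssm` is a hypothetical helper name.

```python
def build_pssm(sites, alphabet="ACGT", pseudocount=1.0):
    """Column-wise residue probabilities for a set of equal-length sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    pssm = []
    for j in range(width):
        column = {a: pseudocount for a in alphabet}
        for site in sites:
            column[site[j]] += 1
        pssm.append({a: c / total for a, c in column.items()})
    return pssm

# The four sites from the slide (one per sequence, held-out sequence excluded).
sites = ["GAATT", "CATGT", "CACTT", "CATTG"]
pssm = build_pssm(sites)

for j, column in enumerate(pssm, start=1):
    print(j, {a: round(p, 3) for a, p in column.items()})
```

Each column sums to 1, so the matrix can be used directly as the motif model in the next step.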


    Bioinformatics

    Probabilistic Algorithms

    Example sequence: GAATTATCGTGAATGCGTGGT

    • Step 3 (expectation):

      • Select remaining sequence

      • For each window (site) calculate the probability that the sequence in the window is generated by the motif model versus the probability that it is generated by the background model

    P(S|M) = 0.0098 x 0.0097 x 0.495 x 0.0098 x 0.245

    P(S|B) =

    • Assign weight based on this score to this site
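
The window scoring in step 3 can be sketched as below. The motif model here is rebuilt from the four sites of the previous slide with a pseudocount of 1, and the background is assumed uniform; both are illustrative assumptions, so the numbers will not reproduce the probabilities printed on the slide.

```python
# Motif model from the four example sites, pseudocount 1 (assumed).
SITES = ["GAATT", "CATGT", "CACTT", "CATTG"]
PSSM = [
    {a: (sum(s[j] == a for s in SITES) + 1) / (len(SITES) + 4) for a in "ACGT"}
    for j in range(5)
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform

def window_weights(seq, pssm, bg):
    """Normalised P(window|motif) / P(window|background) for every window."""
    w = len(pssm)
    ratios = []
    for start in range(len(seq) - w + 1):
        ratio = 1.0
        for j in range(w):
            base = seq[start + j]
            ratio *= pssm[j][base] / bg[base]
        ratios.append(ratio)
    total = sum(ratios)
    return [r / total for r in ratios]

seq = "GAATTATCGTGAATGCGTGGT"
weights = window_weights(seq, PSSM, BACKGROUND)
best_start = max(range(len(weights)), key=weights.__getitem__)
```

The normalised weights form a probability distribution over start positions, which is what the update in step 4 consumes.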


    Probabilistic Algorithms

    Step 4 (Maximization):

    • Re-estimate new positions based on the weights calculated in step 3

      • Go to step 1

    • Re-iterate until stable motifs are found
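
Putting the four steps together, a bare-bones site sampler might look like the sketch below. It assumes exactly one motif instance per sequence, a uniform background, and a pseudocount of 1, and it runs a fixed number of sweeps instead of testing for convergence. It is an illustration of the loop only, not the Motif Sampler, AlignACE, or BioProspector implementation; the function name and toy sequences are hypothetical.

```python
import random

def gibbs_site_sampler(seqs, w, sweeps=100, pseudocount=1.0, seed=0):
    """Return one motif start position per sequence (illustrative only)."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    bg = 1.0 / len(alphabet)  # assumed uniform background
    # Step 1: random initial alignment vector.
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(sweeps):
        for i, held_out in enumerate(seqs):
            # Step 2: motif model from all sequences except sequence i.
            counts = [{a: pseudocount for a in alphabet} for _ in range(w)]
            for k, (s, p) in enumerate(zip(seqs, pos)):
                if k != i:
                    for j in range(w):
                        counts[j][s[p + j]] += 1
            total = (len(seqs) - 1) + pseudocount * len(alphabet)
            # Step 3: weight of each window, motif model versus background.
            weights = []
            for start in range(len(held_out) - w + 1):
                ratio = 1.0
                for j in range(w):
                    ratio *= (counts[j][held_out[start + j]] / total) / bg
                weights.append(ratio)
            # Step 4: sample a new position in proportion to the weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Toy sequences (hypothetical), each containing the word TTGACA.
seqs = [
    "ACGTTTGACATAGC",
    "GGTTGACACCATAC",
    "CATTGACAGGTCAA",
    "TTGACATCGGATCG",
]
positions = gibbs_site_sampler(seqs, w=6)
```

Because the update is stochastic, different seeds can give different alignment vectors; running several seeds and keeping the motifs that recur is one simple way to judge stability.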


    Bioinformatics

    Probabilistic Algorithms

    • Local optima

      • EM update of the alignment vector:

        • Select the positions with the highest score

        • Deterministic output, but may converge to a local optimum

    • Better chance of reaching the global optimum

      • Gibbs sampling

        • Select positions according to the probability distribution

        • Stochastic output:

          • i.e. the result differs each time the algorithm runs

          • allows detection of stable motifs

          • statistical analysis describes the quality of the motif detected
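
The difference between the two update rules comes down to how a position is chosen from the same set of window weights. A minimal sketch, with hypothetical helper names and made-up weights:

```python
import random

def pick_em(weights):
    """Deterministic choice: always the window with the highest weight."""
    return max(range(len(weights)), key=weights.__getitem__)

def pick_gibbs(weights, rng):
    """Stochastic choice: sample a window in proportion to its weight."""
    return rng.choices(range(len(weights)), weights=weights)[0]

weights = [0.1, 0.7, 0.2]  # made-up window weights for one sequence
rng = random.Random(42)

em_choice = pick_em(weights)
gibbs_choices = [pick_gibbs(weights, rng) for _ in range(1000)]
```

The deterministic rule returns the same window on every call, while the sampled rule occasionally picks lower-weight windows, which is what lets the sampler move away from a poor starting alignment.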


    Bioinformatics

    Probabilistic Algorithms

    • Influence of the background model, e.g. for a second-order model: p(ATCGT|Bm) = p(AT) p(C|AT) p(G|TC) p(T|CG)

    • Compensates for motifs that occur frequently because of the general background composition

    • Makes the outcome of the algorithm more robust
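
The factorisation above can be sketched for a second-order background as follows. The training text, the pseudocount of 1, and the function names are assumptions made for illustration; a real background model would be estimated from a large set of intergenic sequences of the organism.

```python
from itertools import product

def train_background(text, order=2, alphabet="ACGT", pseudocount=1.0):
    """Estimate start and transition probabilities of an order-m Markov model."""
    contexts = ["".join(c) for c in product(alphabet, repeat=order)]
    start = {c: pseudocount for c in contexts}
    trans = {c: {a: pseudocount for a in alphabet} for c in contexts}
    for i in range(len(text) - order + 1):
        start[text[i:i + order]] += 1
    for i in range(order, len(text)):
        trans[text[i - order:i]][text[i]] += 1
    n_start = sum(start.values())
    start = {c: v / n_start for c, v in start.items()}
    trans = {
        c: {a: v / sum(row.values()) for a, v in row.items()}
        for c, row in trans.items()
    }
    return start, trans

def word_probability(word, start, trans, order=2):
    """p(word | background), e.g. p(AT) p(C|AT) p(G|TC) p(T|CG) for ATCGT."""
    p = start[word[:order]]
    for i in range(order, len(word)):
        p *= trans[word[i - order:i]][word[i]]
    return p

# Hypothetical AT-rich background: AT-rich words become unsurprising.
start, trans = train_background("ATATATATAT" * 5)
p_at_rich = word_probability("ATATA", start, trans)
p_gc_rich = word_probability("GCGCG", start, trans)
```

Under this background an AT-rich word scores a high background probability, so a motif finder using it is less likely to report such words merely because the genome is AT-rich.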


    Bioinformatics

    Probabilistic Algorithms

    Two organisms with similar background model

    Two organisms with different background model


    Bioinformatics

    Probabilistic Algorithms

    Motif scores for probabilistic motif finding algorithms

    • Information content (Consensus score)

    • Entropy

    • Relative entropy (Information content)

    • Log likelihood
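
Two of the scores listed above can be computed per motif column as in this sketch. It uses the usual definitions of Shannon entropy and relative entropy (Kullback-Leibler divergence) in bits, with a uniform background assumed for the example; summing the per-column values over all columns gives the score of the whole motif.

```python
import math

def entropy(column):
    """Shannon entropy of one motif column, in bits (0 = fully conserved)."""
    return -sum(p * math.log2(p) for p in column.values() if p > 0)

def relative_entropy(column, bg):
    """Relative entropy of a column with respect to the background, in bits."""
    return sum(p * math.log2(p / bg[a]) for a, p in column.items() if p > 0)

bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform
conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
uninformative = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

h_conserved = entropy(conserved)
h_uninformative = entropy(uninformative)
re_conserved = relative_entropy(conserved, bg)
re_uninformative = relative_entropy(uninformative, bg)
```

Entropy depends only on how conserved the column is, whereas relative entropy also changes with the background frequencies.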


    Result: bacterial O2 responsive element FNR

    Probabilistic Algorithms

    Takes into account only the degree of conservation

    Takes into account the background model

    Tradeoff between the degree of conservation and the number of occurrences


    Bioinformatics

    Profile Search


    [Figure: workflow combining genomic sequence data (genomics), high-throughput measurements (experimental), and literature]

    • 1. Microarray datamining: preprocessing, clustering

    • 2. Sequence datamining: motif detection

    • 3. Comparative genomics: genomewide screening, phylogenetic footprinting

    Other figure labels: clusters of coexpressed genes, summarized information, novel targets, novel conditions, target identification

