Identification of Transcription Factor Binding Sites

Lior Harpaz

Ofer Shany

09/05/2004


Goal: find TFBSs

Importance

  • TFs regulate gene expression.

  • Identification of TFs and their binding sites can teach us about:

    • Mapping of regulatory pathways

    • Potential functions of genes

Experimental Methods

  • Footprinting

  • EMSA - electrophoretic mobility shift assay

Drawbacks:

  • Time consuming

  • Do not scale up to whole genomes

Computational Methods - Goals

  • Identifying known TFBSs in previously unknown locations.

  • Identifying unknown TFBSs.

Computational Methods

  • Basic idea: locate TFBSs by sequence searching

Problems:

  • Short sequences (5-15 bp)

  • Degenerate sequences

  • Location

  • Biological reality
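
The two sequence-related problems can be made concrete with a small search sketch. It assumes a site is described by an IUPAC consensus string (an illustrative choice, not something the slides specify); the consensus `TGAYR` and the helper names are made up for the example. The last function estimates how many matches a uniform random sequence would produce on one strand, which is why short, degenerate patterns give many chance hits.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    lookahead = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


def expected_random_hits(consensus, length):
    """Expected matches on one strand of a uniform random sequence of this length."""
    p = 1.0
    for base in consensus.upper():
        allowed = len(IUPAC[base].strip("[]"))  # nucleotides this code permits
        p *= allowed / 4
    return p * (length - len(consensus) + 1)


print(find_sites("TGAYR", "ATGACATTGATGCC"))   # [1, 7]
print(expected_random_hits("TGAYR", 1000))     # about 3.9 chance hits per kb
```

With roughly four chance matches per kilobase for this toy pattern, a genome-wide scan would be dominated by false positives, which motivates the additional evidence discussed next.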

Computational Methods

Possible solutions:

Conservation = functional importance

  • mRNA expression pattern

  • Phylogenetic footprinting

  • Network-level conservation

Phylogenetic footprinting

  • Identify orthologous genes

  • Concentrate on conserved non-coding regions (possible regulatory regions)

  • Look for conserved motifs.
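
The three steps above can be sketched in a very reduced form. The sketch assumes the orthologs are already identified and their upstream non-coding regions extracted; exact k-mer matching stands in for a real motif model or alignment, and the species names and sequences are invented toy data.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(upstream_regions, k, min_species=None):
    """k-mers found in the upstream regions of at least min_species orthologs.

    upstream_regions maps a species name to the upstream sequence of its
    ortholog. By default a k-mer must occur in every species.
    """
    if min_species is None:
        min_species = len(upstream_regions)
    counts = {}
    for seq in upstream_regions.values():
        for kmer in kmers(seq.upper(), k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return sorted(km for km, n in counts.items() if n >= min_species)


# Toy upstream regions of one orthologous gene (hypothetical data).
regions = {
    "species_a": "ACGTTTGACAGGCT",
    "species_b": "GGATTGACATCCA",
    "species_c": "CTTGACAAGTGC",
}
print(conserved_kmers(regions, 6))  # ['TTGACA']
```

Real binding sites are degenerate, so an exact-match criterion like this would miss many of them; it only illustrates the idea that conservation across species narrows the search.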

Why should it work?

  • 40% alignment between the human and mouse genomes

  • 80% of mouse genes have orthologs in the human genome

  • Only 1%-5% of the human genome encodes proteins.

Things to consider…

  • Choosing genomes.

  • Locating transcriptional start site.

  • Alignment method.

More things to consider

  • Different evolution rates for different regions in the genome.

  • PSSM score cut-off

  • Note - TFBSs within ORFs are not detected.
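
To show what the PSSM score cut-off controls, here is a minimal sketch of building a log-odds position-specific scoring matrix from aligned sites and scanning a sequence with it. The pseudocount, the uniform background of 0.25 and the example sites are assumptions chosen for illustration; sequences are assumed to contain only A, C, G and T.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PSSM (one dict per column) from aligned, equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for col in range(len(sites[0])):
        column = [site[col] for site in sites]
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm


def scan(pssm, sequence, cutoff):
    """Return (start, window, score) for every window scoring at least cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if score >= cutoff:
            hits.append((start, window, round(score, 2)))
    return hits


# Hypothetical aligned binding sites and a sequence to scan.
pssm = build_pssm(["TGACA", "TGACA", "TGACT", "TGATA"])
seq = "CCTGACAGGTGATTCC"
print(scan(pssm, seq, cutoff=5.0))  # strict: only the best-matching window
print(scan(pssm, seq, cutoff=3.0))  # relaxed: also reports the weaker TGATT window
```

A higher cut-off loses weaker true sites and a lower one admits more chance matches, so the threshold has to be tuned for each matrix.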

Phylogenetic footprinting in proteobacterial genomes

  • Study set of 190 E. coli genes with known TFBSs.

  • Orthologs were searched in eight other bacteria.

  • Motif search by Bayesian Gibbs sampling.

Bayesian Gibbs sampling

  • Algorithm for motif search.

  • Each motif is assigned a MAP (maximum a posteriori probability) value.
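
For orientation, below is a minimal sketch of a basic Gibbs site sampler: one site per sequence, a fixed motif width and a uniform background. It is not the Bayesian sampler used in the study and does not compute a MAP value; the function names, pseudocount and iteration count are assumptions for illustration.

```python
import random


def profile(sites, pseudocount=0.5):
    """Column-wise base probabilities (with pseudocounts) for aligned sites."""
    total = len(sites) + 4 * pseudocount
    return [
        {b: (sum(s[c] == b for s in sites) + pseudocount) / total for b in "ACGT"}
        for c in range(len(sites[0]))
    ]


def window_prob(prof, window):
    """Probability of a window under the motif profile."""
    p = 1.0
    for col, base in zip(prof, window):
        p *= col[base]
    return p


def gibbs_sample(seqs, width, iterations=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the profile from every sequence except the held-out one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            prof = profile(others)
            # Weight each candidate start by motif probability over background.
            weights = [window_prob(prof, seq[p:p + width]) / 0.25 ** width
                       for p in range(len(seq) - width + 1)]
            # Sample (rather than maximise) the new start for this sequence.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Toy sequences that share an invented motif; results depend on the seed.
toy = ["ACGTTGACAGT", "GGTTGACACCA", "CATTGACATGG"]
print(gibbs_sample(toy, width=6))
```

Because each position is sampled rather than chosen greedily, the procedure can escape poor starting alignments, but separate runs may settle on different motifs, so several seeds are normally compared.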

Bayesian Gibbs sampling

  • Parameters and extensions:

    • Model sequence

    • Palindromic patterns

    • Background pattern

    • Distribution of spacing between TFBSs and translation start site
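
As a small illustration of the palindromic option: in this context a site is palindromic when it equals its own reverse complement, which is typical for sites bound by dimeric factors. The helper names below are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True when the site reads the same on both strands."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("GAATTC"))  # True
print(is_palindromic("TTGACA"))  # False
```

A palindromic model ties each motif column to its mirror column, which halves the number of free parameters to estimate.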


  • Overall: in 146/184 sets, motifs matched known regulatory sequences.

  • In the 18 sets with only 1 ortholog, only 67% of known sites were matched, and with low MAP values.

  • In the 166 sets with >= 2 orthologs, 81% of motifs matched known regulatory sequences.


  • Out of the 166 sets (with >= 2 orthologs):

    • 131 corresponded to known TFBSs.

    • 3 corresponded to known stem-loop structures.

    • 32 data sets contained predictions with large MAP values: these could be undocumented sites!

  • Documented sites were found in 138 cases without using palindromic models.
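
The figures above are mutually consistent, which a few lines of arithmetic can confirm. Two readings are assumptions rather than statements from the slides: that "known regulatory sequences" means known TFBSs plus known stem-loop structures, and that the single-ortholog count of 12 follows by subtraction.

```python
sets_one_ortholog = 18
sets_multi_ortholog = 166
total_sets = sets_one_ortholog + sets_multi_ortholog        # 184

known_tfbs, known_stem_loops, novel_predictions = 131, 3, 32
# The three categories account for all multi-ortholog sets.
assert known_tfbs + known_stem_loops + novel_predictions == sets_multi_ortholog

matched_total = 146
matched_multi = known_tfbs + known_stem_loops               # 134 (assumed grouping)
matched_one = matched_total - matched_multi                 # 12 (inferred)

pct_multi = round(100 * matched_multi / sets_multi_ortholog)
pct_one = round(100 * matched_one / sets_one_ortholog)
pct_all = round(100 * matched_total / total_sets)
print(pct_multi, pct_one, pct_all)  # 81 67 79
```

The comparison of 81% with 67% is the key point: adding a second ortholog clearly improves recovery of known sites.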

Identification of a new TF

  • New site found near fabA, fabB & yqfA

  • YijC binds to these sites.

  • Site location, protein structure & previous experimental results suggest that YijC is a repressor of the fab genes.

  • Indication of yqfA's involvement in fatty acid metabolism.

Genomic scale phylogenetic footprinting

  • 2113 E. coli ORFs used.

  • 187 new sites identified as probable sites for 46 known TFs.

  • Remaining sites are expected to represent unknown TFBSs

  • MAP Values of predicted sites were lower.

[Figure: ortholog distribution for the study set and the full set]


  • New sites for known TFs were found.

  • Conservation of regulatory stem-loops.

  • New sites for unknown TFs are predicted.

  • New TF identified (YijC).

  • Predicted gene function (yqfA).

Network level conservation

  • Each TF regulates the expression of many genes (20-400).

  • Conservation of global gene expression requires the conservation of regulatory mechanisms.
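
One way to turn this idea into a test, sketched here under stated assumptions: for a candidate motif, collect the ortholog pairs whose upstream region contains it in genome A and those that contain it in genome B, then ask whether the overlap is larger than chance using a hypergeometric tail. The set-based interface and gene identifiers are hypothetical, and the cited study's exact statistic may differ.

```python
from math import comb


def hypergeom_tail(population, marked, draws, overlap):
    """P(X >= overlap) when `draws` items are taken from `population`,
    of which `marked` are successes (hypergeometric distribution)."""
    total = comb(population, draws)
    favourable = sum(
        comb(marked, x) * comb(population - marked, draws - x)
        for x in range(overlap, min(marked, draws) + 1)
    )
    return favourable / total


def conservation_pvalue(pairs_with_motif_a, pairs_with_motif_b, n_pairs):
    """Chance probability of the observed overlap between two motif-bearing sets.

    Each set holds IDs of ortholog pairs whose upstream region contains the
    motif in genome A or genome B; n_pairs is the number of ortholog pairs.
    """
    overlap = len(pairs_with_motif_a & pairs_with_motif_b)
    return hypergeom_tail(n_pairs, len(pairs_with_motif_a),
                          len(pairs_with_motif_b), overlap)


# Toy case: the motif marks the same 3 of 10 ortholog pairs in both genomes.
p = conservation_pvalue({"g1", "g2", "g3"}, {"g1", "g2", "g3"}, 10)
print(p)  # 1/120, about 0.0083
```

A small p-value means the motif tends to sit upstream of the same orthologous genes in both genomes, which is the conservation signal this approach relies on.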

Data analysis

Total motifs: 80,000

P-value filter: 12,000

Low-complexity filter: 7,673

Hierarchical clustering: 1,269
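
The slides do not say how the low-complexity filter works. As a stand-in, the sketch below drops k-mers whose base composition has low Shannon entropy, such as homopolymers and simple repeats; the entropy criterion and the 1.5-bit threshold are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(kmer):
    """Entropy in bits of the base composition of a k-mer (0 to 2 for DNA)."""
    n = len(kmer)
    return -sum((c / n) * math.log2(c / n) for c in Counter(kmer).values())


def drop_low_complexity(kmers, min_entropy=1.5):
    """Keep only k-mers whose composition entropy reaches min_entropy bits."""
    return [k for k in kmers if shannon_entropy(k) >= min_entropy]


candidates = ["AAAAAAA", "ATATATA", "TGACTCA"]
print(drop_low_complexity(candidates))  # ['TGACTCA']
```

Composition entropy ignores the order of bases, so a stricter filter might also look at dinucleotide repeats.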

Data analysis


34/48 known sites discovered.

Large fraction of matches for significant p-values.


Biological Significance

  • Functional coherence

  • Expression coherence

Characteristic Features

  • Conservation of binding affinity

  • Conservation of position & orientation


  • Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201.

  • McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782.

  • Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.