Identification of Transcription Factor Binding Sites


Lior Harpaz, Ofer Shany
09/05/2004

Goal: find transcription factor binding sites (TFBSs).

Importance
  • Transcription factors (TFs) regulate gene expression.
  • Identifying TFs and their binding sites can teach us about:
    • Mapping of regulatory pathways
    • Potential functions of genes
Experimental Methods
  • Footprinting
  • EMSA - electrophoretic mobility shift assay

Problems:

  • Time consuming
  • Do not scale up to whole genomes
Computational Methods - Goals
  • Identifying known TFBSs in previously unknown locations.
  • Identifying unknown TFBSs.
Computational Methods
  • Basic idea - locate TFBS using sequence-searching

Problems:

  • Short sequences (5-15 bp)
  • Degenerate sequences
  • Location
  • Biological reality
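
The sequence-searching idea and the "degenerate sequences" problem can be illustrated with a small sketch. It assumes the binding site is described by an IUPAC degenerate consensus; the consensus `TGANTC`, the toy sequence, and the function name `find_sites` are made up for illustration and do not come from the slides.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

# Hypothetical 6 bp consensus and toy sequence, for illustration only.
hits = find_sites("TGANTC", "ccTGAGTCaaTGACTCttTGATTA")
print(hits)
```

Because such patterns are short and tolerate mismatches, they also match by chance in long genomes, which is why a plain search like this produces many false positives.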
Computational Methods

Possible solutions:

Conservation = functional importance

  • mRNA expression pattern
  • Phylogenetic footprinting
  • Network-level conservation
Phylogenetic footprinting
  • Identify orthologous genes
  • Concentrate on conserved non-coding regions (possible regulatory regions)
  • Look for conserved motifs.
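
The three steps above can be sketched in a heavily simplified, alignment-free form: take the non-coding upstream regions of orthologous genes and keep only the k-mers present in all of them. This is an illustrative assumption, not the method used in the studies discussed later; the sequences and the function names `kmers` and `conserved_kmers` are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def conserved_kmers(upstream_regions, k):
    """k-mers present in every orthologous upstream region."""
    sets = [kmers(s, k) for s in upstream_regions]
    return set.intersection(*sets) if sets else set()

# Toy, made-up upstream regions from three hypothetical orthologs.
regions = [
    "ATCGTTGACAGGCTA",
    "GGTTGACATTCCAAG",
    "CCATTTGACAGTTAC",
]
print(sorted(conserved_kmers(regions, 6)))
```

Exact k-mer matching ignores degeneracy, so real tools search for conserved motifs with probabilistic models instead.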
Why should it work?
  • About 40% of the human genome can be aligned with the mouse genome.
  • 80% of mouse genes have orthologs in the human genome.
  • Only 1%-5% of the human genome encodes proteins.
Things to consider…
  • Choosing genomes.
  • Locating transcriptional start site.
  • Alignment method.
More things to consider…
  • Different evolution rates for different regions in the genome.
  • PSSM score cut-off
  • Note - TFBSs within ORFs are not detected.
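
To make the role of the PSSM score cut-off concrete, here is a minimal sketch of building a log-odds position-specific scoring matrix from aligned sites and scanning a sequence with it. The aligned sites, the uniform background of 0.25, the pseudocount, and the cut-off of 5.0 are all illustrative assumptions.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position-specific scoring matrix from aligned sites."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm

def scan(pssm, sequence, cutoff):
    """Return (position, score) for every window scoring at or above cutoff."""
    width = len(pssm)
    found = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            found.append((start, round(score, 2)))
    return found

# Made-up aligned binding sites and a toy sequence to scan.
sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
pssm = build_pssm(sites)
hits = scan(pssm, "GGCATTGACAGGCC", cutoff=5.0)
print(hits)
```

Lowering the cut-off reports more candidate sites at the price of more false positives; raising it does the opposite.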
Phylogenetic footprinting in proteobacterial genomes
  • Study set of 190 E. coli genes with known TFBSs.
  • Orthologs were searched in eight other bacteria.
  • Motif search by Bayesian Gibbs sampling.
Bayesian Gibbs sampling
  • Algorithm for motif search.
  • Each motif is assigned a MAP (maximum a posteriori probability) value.
Bayesian Gibbs sampling
  • Parameters and extensions:
    • Model sequence
    • Palindromic patterns
    • Background pattern
    • Distribution of spacing between TFBSs and translation start site
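
For orientation, the core loop of a Gibbs motif sampler can be sketched as follows. This is a bare-bones site sampler that assumes exactly one site per sequence and a uniform background; it does not implement the extensions listed above (palindromic models, background models, spacing distributions) or the MAP scoring, and the function name, parameters, and toy sequences are hypothetical.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Very small site sampler: one motif occurrence per sequence."""
    rng = random.Random(seed)
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build a profile (with pseudocounts) from all other sequences.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            profile = []
            for col in range(width):
                counts = {b: 1.0 for b in "ACGT"}
                for site in others:
                    counts[site[col]] += 1.0
                total = sum(counts.values())
                profile.append({b: c / total for b, c in counts.items()})
            # Weight every candidate start by its likelihood ratio against
            # a uniform background, then sample a new start for sequence i.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for col in range(width):
                    w *= profile[col][seq[p + col]] / 0.25
                weights.append(w)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy sequences, made up for illustration.
seqs = ["ACGTTGACAGTC", "GGTTGACATCCA", "CATTTGACAGGT", "TTGACACCGGTA"]
starts = gibbs_motif_search(seqs, width=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

The sampler is stochastic, so different seeds can settle on different motifs; in practice several runs are compared and the best-scoring motif is kept.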
Results
  • Overall: in 146/184 sets, motifs matched known regulatory sequences.
  • In the 18 sets with only 1 ortholog, only 67% of known sites were matched, and with low MAP values.
  • In the 166 sets with >= 2 orthologs, 81% of motifs matched known regulatory sequences.
Results
  • Out of the 166 sets (with >= 2 orthologs):
    • 131 corresponded to known TFBSs.
    • 3 corresponded to known stem & loop structures.
    • 32 data sets contained predictions with large MAP values: these could be undocumented sites!
  • Without using palindromic models, documented sites were still found in 138 cases.
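
The counts and percentages quoted on the two Results slides can be checked against each other with a few lines of arithmetic. The check assumes that the 81% figure counts both the 131 known TFBSs and the 3 stem-loop structures as matches; that reading is an inference from the numbers, not something the slides state.

```python
# Counts quoted on the slides.
total_sets = 184
single_ortholog_sets = 18
multi_ortholog_sets = 166

# Matches implied by the quoted figures (rounded to whole data sets).
single_matched = round(0.67 * single_ortholog_sets)  # 67% of 18 sets
multi_matched = 131 + 3  # known TFBSs plus stem-loops (assumed to count as matches)

overall_matched = single_matched + multi_matched
multi_percent = round(100 * multi_matched / multi_ortholog_sets)
overall_percent = round(100 * overall_matched / total_sets)

print(overall_matched, "of", total_sets, "sets matched")
print(multi_percent, "% of sets with >= 2 orthologs matched")
print(overall_percent, "% of all sets matched")
```

Under that assumption the figures are mutually consistent: the two groups add up to the 146 matched sets reported overall.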
Identification of a new TF
  • New site found near fabA, fabB & yqfA
  • YijC binds to these sites.
  • Site location, protein structure & previous experimental results suggest that YijC is a repressor of the fab genes.
  • This indicates that yqfA is involved in fatty acid metabolism.
Genomic scale phylogenetic footprinting
  • 2,113 E. coli ORFs were used.
  • 187 new sites identified as probable sites for 46 known TFs.
  • Remaining sites are expected to represent unknown TFBSs.
  • MAP values of predicted sites were lower.
[Figure: ortholog distribution for the study set and the full set]

Conclusions
  • New sites for known TFs were found.
  • Regulatory stem-loops are conserved.
  • New sites for unknown TFs are predicted.
  • A new TF was identified (YijC).
  • A gene function was predicted (yqfA).
Network level conservation
  • Each TF regulates the expression of many genes (20-400).
  • Conservation of global gene expression requires the conservation of regulatory mechanisms.
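
One way to quantify this idea is to ask whether the set of genes carrying a motif in one genome overlaps the corresponding set of orthologs in a second genome more than chance would predict. The sketch below uses a hypergeometric tail probability for that test; the choice of statistic, the function name `hypergeom_pvalue`, and all the numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_pvalue(genome_size, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn at random from the genome."""
    upper = min(set_a, set_b)
    total = comb(genome_size, set_b)
    tail = sum(
        comb(set_a, k) * comb(genome_size - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 5000 ortholog pairs; the motif occurs upstream of
# 100 genes in genome A and 120 genes in genome B, with 30 genes in common.
p = hypergeom_pvalue(5000, 100, 120, 30)
print(p)
```

With these made-up numbers only about 2.4 shared genes are expected by chance, so an overlap of 30 gives a very small p-value, which is the kind of signal a network-level conservation filter would retain.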
Data analysis
  • Total motifs: 80,000
  • After P-value filter: 12,000
  • After low-complexity filter: 7,673
  • After hierarchical clustering: 1,269
Validation
  • 34/48 known sites discovered.
  • Large fraction of matches for significant p-values.
Biological Significance
  • Functional coherence
  • Expression coherence
Characteristic Features
  • Conservation of binding affinity
  • Conservation of position & orientation
References
  • Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201.
  • McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782.
  • Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.