Identification of Transcription Factor Binding Sites

Lior Harpaz

Ofer Shany

09/05/2004


Goal: find TFBSs

[Figure: input and output]


Importance

  • TFs regulate gene expression.

  • Identifying TF binding sites can teach us about:

    • Mapping of regulatory pathways

    • Potential functions of genes


Experimental Methods

  • Footprinting

  • EMSA - electrophoretic mobility shift assay

Problems:

  • Time consuming

  • Do not scale up to whole genomes


Computational Methods - Goals

  • Identifying known TFBSs in previously unknown locations.

  • Identifying unknown TFBSs.


Computational Methods

  • Basic idea - locate TFBS using sequence-searching

Problems:

  • Short sequences (5-15 bp)

  • Degenerate sequences

  • Location

  • Biological reality
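
To make the first two problems concrete, here is a minimal sketch of sequence-searching with a degenerate consensus written in IUPAC codes. The function names (`find_sites`, `expected_random_hits`) and the example consensus are illustrative assumptions, not taken from the slides. The second function estimates how often a short degenerate pattern matches a uniform random sequence by chance, which is why plain searching yields many false positives.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(consensus, sequence):
    """Return 0-based start positions of all (overlapping) matches on one strand."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def expected_random_hits(consensus, sequence_length):
    """Expected number of chance matches on one strand of a uniform random sequence."""
    p = 1.0
    for c in consensus.upper():
        # Number of bases allowed at this position, divided by 4.
        p *= len(IUPAC[c].strip("[]")) / 4.0
    return p * (sequence_length - len(consensus) + 1)


if __name__ == "__main__":
    print(find_sites("TGANTCA", "ccTGACTCAggTGAGTCAtt"))
    print(expected_random_hits("TGANTCA", 1_000_006))
```

Under the uniform-background assumption, a 7 bp pattern with one fully degenerate position is expected to occur a few hundred times per million bases by chance alone.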


Computational Methods

Possible solutions:

Conservation = functional importance

  • mRNA expression pattern

  • Phylogenetic footprinting

  • Network-level conservation


Phylogenetic footprinting

  • Identify orthologous genes

  • Concentrate on conserved non-coding regions (possible regulatory regions)

  • Look for conserved motifs.
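
The last step can be sketched in a deliberately simplified form: collect the k-mers that occur in the upstream (non-coding) regions of every ortholog. Exact k-mer matching is an assumption made here for brevity; real phylogenetic footprinting tolerates mismatches, for example through alignment or Gibbs sampling. The function names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(upstream_regions, k, min_species=None):
    """Return k-mers present in at least `min_species` of the given regions.

    `upstream_regions` maps a species name to the upstream sequence of
    its ortholog. By default a k-mer must occur in every species.
    """
    if min_species is None:
        min_species = len(upstream_regions)
    counts = {}
    for region in upstream_regions.values():
        # Count each k-mer once per species, however often it occurs there.
        for kmer in kmers(region, k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return sorted(kmer for kmer, n in counts.items() if n >= min_species)


if __name__ == "__main__":
    regions = {
        "species_1": "AAAATTGACACCCC",
        "species_2": "GGTTGACAGGAT",
        "species_3": "CTCTTGACATATA",
    }
    print(conserved_kmers(regions, 6))
```

In the toy input only one 6-mer is shared by all three regions, so it is the only candidate motif reported.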


Why should it work?

  • About 40% of the human genome aligns with the mouse genome.

  • 80% of mouse genes have orthologs in the human genome.

  • Only 1%-5% of the human genome encodes proteins.


Things to consider…

  • Choosing genomes.

  • Locating transcriptional start site.

  • Alignment method.


More things to consider…

  • Different evolution rates for different regions in the genome.

  • PSSM score cut-off

  • Note - TFBSs within ORFs are not detected.
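
The role of the PSSM score cut-off can be illustrated with a small sketch. It builds a log-odds position-specific scoring matrix from a few aligned sites and reports every window that scores at or above the cut-off. The pseudocount, the uniform background of 0.25, and the example sites are assumptions chosen for illustration; input sequences are assumed to be uppercase A, C, G, T only.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per column) from equal-length sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def scan(pssm, sequence, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pssm = build_pssm(["TATAAT", "TATAAT", "TACAAT", "TATACT"])
    for cutoff in (8.0, 5.0, -100.0):
        print(cutoff, [h[0] for h in scan(pssm, "GGGTATAATGGG", cutoff)])
```

Raising the cut-off removes chance matches but also discards weaker true sites; lowering it does the opposite.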


Phylogenetic footprinting in proteobacterial genomes

  • Study set of 190 E. coli genes with known TFBSs.

  • Orthologs were searched in eight other bacteria.

  • Motif search by Bayesian Gibbs sampling.


Bayesian Gibbs sampling

  • Algorithm for motif search.

  • Each motif is assigned a MAP (maximum a posteriori probability) value.


Bayesian Gibbs sampling

  • Parameters and extensions:

    • Model sequence

    • Palindromic patterns

    • Background pattern

    • Distribution of spacing between TFBSs and translation start site
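
For orientation, the core loop of a Gibbs site sampler can be sketched as follows. This is a bare-bones version under strong assumptions: exactly one site per sequence, a uniform background, fixed motif width, and uppercase A/C/G/T input. It does not implement the extensions listed above (palindromic patterns, background models, spacing distributions) and does not compute a MAP value, so it should not be read as the sampler used in the study. The function name `gibbs_motif_search` is hypothetical.

```python
import random


def gibbs_motif_search(seqs, width, iterations=200, seed=0, pseudocount=1.0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    total = len(seqs) - 1 + 4 * pseudocount
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Column counts from the current sites of all other sequences.
            counts = [{b: pseudocount for b in "ACGT"} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    site = other[starts[j]:starts[j] + width]
                    for pos, base in enumerate(site):
                        counts[pos][base] += 1
            # Weight each candidate start by its likelihood ratio
            # against a uniform background (0.25 per base).
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for pos in range(width):
                    w *= (counts[pos][seq[start + pos]] / total) / 0.25
                weights.append(w)
            # Sample the new site for the held-out sequence.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    seqs = ["GCGTATAATGCC", "CCTATAATGGCG", "GGCCGTATAATC"]
    print(gibbs_motif_search(seqs, width=6))
```

Because the procedure is stochastic, different seeds can give different answers; practical implementations run several restarts and keep the highest-scoring alignment.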


Results

  • Overall: in 146/184 sets, motifs matched known regulatory sequences.

  • In 18 sets (genes with 1 ortholog), only 67% of known sites were matched, and with low MAP values.

  • In 166 sets (with >=2 orthologs), 81% of motifs matched known regulatory sequences.


Results

  • Out of the 166 sets (with >= 2 orthologs):

    • 131 corresponded to known TFBSs.

    • 3 corresponded to known stem & loop structures.

    • 32 data sets contained predictions with large MAP values: these could be undocumented sites!

  • Documented sites were found in 138 data sets without using palindromic models.
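
The reported counts can be cross-checked with a few lines of arithmetic. The sketch assumes that "matched known regulatory sequences" covers both the known TFBSs and the known stem-loop structures, and it infers the number of matched single-ortholog sets from the rounded 67% figure; both are interpretations, not statements from the slides.

```python
sets_one_ortholog = 18
sets_two_plus = 166
total_sets = sets_one_ortholog + sets_two_plus           # 184

known_tfbs = 131
stem_loops = 3
novel_predictions = 32
# The three categories should account for every set with >= 2 orthologs.
assert known_tfbs + stem_loops + novel_predictions == sets_two_plus

matched_two_plus = known_tfbs + stem_loops               # 134 (assumed definition)
matched_one = round(0.67 * sets_one_ortholog)            # 12, inferred from 67%
matched_total = matched_one + matched_two_plus           # 146

fraction_two_plus = matched_two_plus / sets_two_plus     # about 0.81
fraction_overall = matched_total / total_sets            # about 0.79

if __name__ == "__main__":
    print(matched_total, round(fraction_two_plus, 2), round(fraction_overall, 2))
```

Under these assumptions the figures are mutually consistent: 134 of 166 is about 81%, and 12 + 134 gives the overall 146 of 184.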


Identification of a new TF

  • New site found near fabA, fabB & yqfA

  • YijC binds to these sites.

  • Site location, protein structure & previous experimental results suggest that YijC is a repressor of the fab genes.

  • Indication of yqfA’s involvement in fatty acid metabolism.


Genomic scale phylogenetic footprinting

  • 2113 E. coli ORFs were used.

  • 187 new sites identified as probable sites for 46 known TFs.

  • Remaining sites are expected to represent unknown TFBSs

  • MAP values of predicted sites were lower.


MAP values left-shift


[Figure: Ortholog Distribution; panels: Study set, Full set]


Conclusions

  • New sites for known TFs were found.

  • Regulatory stem-loops are conserved.

  • New sites for unknown TFs are predicted.

  • A new TF was identified (YijC).

  • A gene function was predicted (yqfA).


Break


Network level conservation

  • Each TF regulates the expression of many genes (20-400).

  • Conservation of global gene expression requires the conservation of regulatory mechanisms.
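
One way to turn this idea into a test is to ask whether the genes carrying a motif in one genome overlap with the orthologs carrying it in a second genome more often than chance would predict. The sketch below uses a hypergeometric tail probability for that overlap. The choice of statistic and all names are assumptions made for illustration; the slides state only that a P-value filter was applied.

```python
from math import comb


def overlap_p_value(n_orthologs, n_with_motif_a, n_with_motif_b, n_shared):
    """P(overlap >= n_shared) under a hypergeometric null.

    n_orthologs     -- number of orthologous gene pairs considered
    n_with_motif_a  -- pairs whose genome-A upstream region has the motif
    n_with_motif_b  -- pairs whose genome-B upstream region has the motif
    n_shared        -- pairs with the motif in both genomes
    """
    upper = min(n_with_motif_a, n_with_motif_b)
    total = comb(n_orthologs, n_with_motif_b)
    tail = sum(
        comb(n_with_motif_a, k)
        * comb(n_orthologs - n_with_motif_a, n_with_motif_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 ortholog pairs, motif in 3 genes per genome, all 3 shared.
    print(overlap_p_value(10, 3, 3, 3))
```

A small p-value indicates that the motif's target set is conserved between the two genomes, which is the signal network-level conservation looks for.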


Data analysis

  • Total motifs: 80,000

  • After P-value filter: 12,000

  • After low-complexity filter: 7,673

  • After hierarchical clustering: 1,269


Validation

  • 34/48 known sites discovered.

  • Large fraction of matches for significant p-values.


Identification of known binding sites


Biological Significance

  • Functional coherence

  • Expression coherence


Characteristic Features

  • Conservation of binding affinity

  • Conservation of position & orientation


References

  • Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201.

  • McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782.

  • Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.


Sensitivity Vs. Specificity
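
As a reminder of the two quantities, here is a short sketch using the standard definitions. Treating the 34 of 48 known sites recovered in the validation as a sensitivity estimate is an interpretation made here, and the specificity numbers in the example are invented purely to show the call.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real sites that were predicted: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of non-sites correctly rejected: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # 34 known sites found, 14 missed (48 known sites in total).
    print(round(sensitivity(34, 14), 3))
    # Hypothetical counts, for illustration only.
    print(specificity(90, 10))
```

Loosening a score or p-value threshold usually raises sensitivity at the cost of specificity, and tightening it does the reverse.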

