Review: RECOMB Satellite Workshop on Regulatory Genomics

Held March 26-27, 2004.

Workshop Themes/Trends
  • More comprehensive evaluations of motif-detection algorithms
  • Making more effective use of comparative mapping/evolution data
  • Models that explain rather than just describe
  • Moving from binding motifs to entire regulatory modules
  • Methods are simple, not sophisticated
  • Jim Kadonaga, University of California, San Diego: The MTE, a New Core Promoter Element for Transcription by RNA Polymerase II
  • Rotem Sorek, Compugen and Tel Aviv University: The "promoters" of splicing: Intronic sequences that regulate alternative splicing
  • Yitzhak Pilpel, Weizmann Institute: Revealing the architecture of genetic backup circuits through inspection of transcription regulatory networks
  • Ron Shamir, Tel Aviv University: Revealing selection patterns in the evolution of yeast transcription regulation
  • Michael Eisen, Lawrence Berkeley National Lab: Evolutionary Signatures of Regulatory Sequences
A New Core Promoter Element for Transcription by RNA Polymerase II (Jim Kadonaga)

Most transcriptional activity is regulated by sequence-specific DNA-binding factors, which are therefore the focus of most current research on regulation. However...

The ultimate target of all of this action is the core promoter, which also plays a part in regulation

Core promoter
  • Encompasses TSS
  • Directs RNA polymerase II
  • Most well-known component is the TATA box

Only about 30-40% of promoters contain a TATA box!

What’s going on the rest of the time?

Finding Novel Promoter Elements
  • Experimentally investigated binding in those promoters with no TATA-box
    • found novel promoter element DPE
  • Large-scale motif detection on 2000 core promoters in Drosophila (Ohler et al., 2002)
    • Plotted distance of top 10 motifs to TSS
      • four motifs had a clear peak: TATA, Inr, DPE and ...
      • a novel promoter element MTE
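
The "distance to TSS" analysis above can be sketched roughly as follows. This is a minimal illustration, not the method of Ohler et al.: it assumes exact string matching instead of a probabilistic motif model, and the toy sequences, the `tss_index` parameter and the function names are all hypothetical.

```python
from collections import Counter

def motif_offsets(promoters, motif, tss_index):
    """Offsets (relative to the TSS) of every exact match of `motif`."""
    offsets = []
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            offsets.append(start - tss_index)
            start = seq.find(motif, start + 1)
    return offsets

def peak_offset(offsets):
    """Most common offset and the fraction of matches found there."""
    if not offsets:
        return None, 0.0
    offset, count = Counter(offsets).most_common(1)[0]
    return offset, count / len(offsets)

# Toy promoters (made up for illustration), each with the TSS at index 10.
promoters = [
    "GGGTATAAAGCATCAGTCGG",
    "CCCTATAAATCGTCAGTTCC",
    "ACGTACGTACGTATAAAGCA",
]
offsets = motif_offsets(promoters, "TATAAA", tss_index=10)
peak, fraction = peak_offset(offsets)
```

A motif whose matches pile up at one offset (a high `fraction`) has the kind of "clear peak" described above; a motif spread evenly over all offsets does not.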
The Core Promoter gets a new look

MTE: Motif Ten Promoter Element

(Kadonaga, PowerPoint slides)

DPE and MTE: Two Newly Identified Promoter Elements
  • Conserved from Drosophila to human (unknown whether they occur in yeast)
  • Very sensitive to spacing relative to the Inr motif
    • TSS determined experimentally (positions reported in papers are not reliable)
    • a single insertion/deletion between motifs causes a 7-fold reduction in transcription
  • Inr and DPE (or MTE) bound cooperatively by TFIID
    • first step in transcription initiation
TATA gets top billing but...
  • In Drosophila (out of 205 core promoters)
    • TATA and DPE: 14%
    • TATA only: 29%
    • DPE only: 26%
    • Neither: 31%
  • TATA, DPE, and MTE can all
    • independently support transcription
    • compensate for a mutation in one of the others
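
As a quick check on the Drosophila percentages above, the short calculation below adds up the categories. The approximate counts are only what the rounded percentages imply for 205 promoters; they are not figures from the talk.

```python
TOTAL_PROMOTERS = 205

# Percentages as listed on the slide.
shares = {"TATA and DPE": 14, "TATA only": 29, "DPE only": 26, "Neither": 31}

# Promoters containing each element, with or without the other one.
with_tata = shares["TATA and DPE"] + shares["TATA only"]
with_dpe = shares["TATA and DPE"] + shares["DPE only"]

# Rough counts implied by the rounded percentages (an estimate only).
approx_counts = {name: round(TOTAL_PROMOTERS * pct / 100) for name, pct in shares.items()}
```

So in this sample DPE (40%) is nearly as common as TATA (43%), which is the point of the slide title.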
And finally... regulation.
  • NC2 was previously known to repress TATA-dependent transcription; it was unexpectedly found to activate DPE-dependent transcription
  • Studied 18 enhancers and estimated that about 25% exhibit some specificity for DPE or TATA
  • Similar work in progress for MTE
The “Promoters” of Splicing (Rotem Sorek)

In general it is not known how alternative splicing (AS) is regulated

  • A few known splicing regulatory proteins
    • like TFs they are sequence-specific, but they bind to RNA not DNA
    • binding motif (usually 4-10 nt) can be located in exon or intron
    • can act as enhancers or silencers
  • Evidence for combinatorial regulation
The typical “motif in a haystack”
  • Most work on finding splicing factor motifs focuses on exons
    • short enough that mutation studies are feasible
  • Introns are too long and require a computational approach
  • Compiled training dataset
    • 250 AS exons, alternatively spliced in both mouse and human
    • large set of constitutively spliced (CS) exons, conserved across human/mouse

Their primary finding: there tends to be significantly more conservation in introns surrounding AS exons than in those surrounding CS exons

On average about 100 bases on either side of each AS exon are conserved, compared to around 7 bases for constitutively spliced exons

What’s the explanation?

  • multiple binding motifs?
  • helping to determine secondary structure in RNA, which helps lead to correct splicing?
Predicting Alternative Splicing
  • Additional Predictive features
    • Higher conservation around exon
    • Higher conservation of exon itself (motifs?)
    • Shorter exons
    • Exons that are a multiple of 3
  • Method: somehow chose one threshold for each feature?
  • Performance: scanned human genome, predicted 1000 AS exons (incl training data?)
    • 70% had EST evidence of AS vs 6-7% baseline
    • Lab test showed that 7/15 (randomly?) selected from remaining 30% are AS in at least one of 15 tissues
  • Significance: estimate “splicing promoters” cover 3x10^6 bp
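
The talk did not make clear how the thresholds were chosen, so the following is only a sketch of what a "one threshold per feature" rule could look like. All cut-off values, field names and the `Exon` type are hypothetical and are not the values used by the authors.

```python
from dataclasses import dataclass

@dataclass
class Exon:
    length: int              # exon length in nucleotides
    exon_identity: float     # human/mouse identity of the exon itself (0 to 1)
    flank_conserved_nt: int  # conserved intronic bases flanking the exon

# Hypothetical cut-offs, for illustration only.
MAX_LENGTH = 150
MIN_EXON_IDENTITY = 0.95
MIN_FLANK_CONSERVED = 50

def looks_alternatively_spliced(exon: Exon) -> bool:
    """Predict AS only if the exon passes every per-feature threshold."""
    return (
        exon.length <= MAX_LENGTH
        and exon.length % 3 == 0  # length is a multiple of 3
        and exon.exon_identity >= MIN_EXON_IDENTITY
        and exon.flank_conserved_nt >= MIN_FLANK_CONSERVED
    )

candidate = Exon(length=90, exon_identity=0.98, flank_conserved_nt=100)
prediction = looks_alternatively_spliced(candidate)
```

Requiring all features at once trades recall for precision, which would be consistent with the small number of predictions (1000 exons) reported for a whole-genome scan.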
Genetic Backup Circuits (Kafri and Pilpel)
  • Fact: single gene knockouts often have little or no phenotypic effect
    • 10% lethal in worm
    • 27% lethal in yeast
  • Question: Can we better understand the mechanisms of genetic backup?
  • Task: Predict whether a knockout will be lethal or not
Duplicates Suggest Redundancy
  • Genes with duplicates are less likely to be essential
  • But clearly this doesn’t tell the whole story
    • lethal genes can have duplicates
    • nonessential genes often have no duplicate

(Gu, Z. et al., Nature 2003)

Function of Duplicate Matters
  • Compute dispensability of yeast genes
    • growth rate after knockout compared to mean growth rate, averaged over many conditions
  • Compared GO functional annotations of highly similar genes. Found higher dispensability when
    • higher functional similarity (Resnik info content)
    • little functional similarity but high sequence similarity (Blast E-values)
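
One way to read the dispensability measure described above is as a ratio of growth rates averaged over conditions. The sketch below assumes exactly that; the exact formula was not given in the talk, and the function and argument names are hypothetical.

```python
from statistics import mean

def dispensability(knockout_growth, reference_growth):
    """Mean over conditions of (knockout growth rate / reference growth rate).

    Both arguments are lists with one value per condition, in the same order.
    The reference is assumed to be the mean growth rate in that condition.
    """
    ratios = [ko / ref for ko, ref in zip(knockout_growth, reference_growth)]
    return mean(ratios)

# Toy example: the knockout grows at half speed in one of two conditions.
score = dispensability([0.5, 1.0], [1.0, 1.0])
```

Under this reading a score near 1 means the knockout has little effect (high dispensability), and a score near 0 means the gene is close to essential.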
Similarity of Expression
  • 40 time series, 500 timepoints
  • In each condition calculated correlation of expression profiles of each pair of paralogous genes
  • Average correlation suggests
    • backup is best provided by genes which do not share expression patterns
How can we explain this unexpected result?

Classify pairs into:

  • negative correlation:
    • never similarly expressed
  • positive correlation:
    • always similarly expressed
  • no correlation:
    • never similarly expressed or
    • similarly expressed in certain conditions
Variability of Expression
  • Use the standard deviation (stdDev) to quantify the consistency of the correlation across conditions
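
The per-condition correlation and its variability across conditions could be computed along these lines. This is a sketch assuming Pearson correlation and the population standard deviation; the data layout (a dict from condition name to a list of expression values) and the condition names are made up for illustration.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def coexpression_summary(gene_a, gene_b):
    """Mean and standard deviation of the per-condition correlations."""
    r = [pearson(gene_a[cond], gene_b[cond]) for cond in gene_a]
    return mean(r), pstdev(r)

# Toy paralog pair: co-expressed in one condition, anti-correlated in another.
gene_a = {"heat": [1, 2, 3], "cold": [1, 2, 3]}
gene_b = {"heat": [2, 4, 6], "cold": [3, 2, 1]}
avg_r, sd_r = coexpression_summary(gene_a, gene_b)
```

Here the average correlation is zero but the standard deviation is large, which is the "similarly expressed in certain conditions" case that an average alone cannot distinguish from "never similarly expressed".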
Goldilocks and the three little paralogs
  • Always same: too similar
  • Never same: too diverged
  • Correlated in only a subset of conditions: just right

Optimal backup requires the “ability to switch between similar and dissimilar expression in a condition dependent manner”

Predictions about the Past...

Hypothesized Duplication Mechanism

  • duplication occurs
  • leads to nonstable redundancy
  • quickly followed by either
    • mutation and loss of one of the duplicates
    • subfunctionalization leading to stable redundancy

Hypothesize two distinct types of subfunctionalization

  • mutation of coding region leading to functional divergence
  • mutation of control region leading to divergence of expression
Need for Regulatory Flexibility
  • This second type of subfunctionalization would entail a quite significant regulatory challenge if the paralogs are to provide backup for one another
    • Upon mutation of B, A must be turned on in the conditions that would normally require B
  • Postulate that
    • this regulatory challenge is met when a gene has a significant amount of regulatory diversity (i.e. different TF motifs)
    • backup asymmetry arises when one of the genes has few motifs (Kellis suggests otherwise?)
Experiments, but no hard numbers
  • Claim the capacity of genes to respond at the transcriptional level when their counterpart is deleted is central to their ability to provide backup
    • Most paralogs downregulated when other gene is knocked out (cross-hybridization?)
      • lower stdev -> down regulation
  • Claim that asymmetry of backup capability can be predicted based on number of transcription factor binding sites.
    • Gene that has the larger number of motifs is the one that is capable of providing a backup to the other
    • Genes with few motifs are “parasites”: they can’t provide backup
  • Claim an improved ability to predict effect of double knockouts
A Question
  • They claim that only when the genes diverge in function will they be maintained in evolution.
  • But if the duplicated pair can compensate for each other’s function then won’t there be little selection pressure to maintain both copies?
From General Conservation to Specific Motifs
  • Searched conserved intronic regions for overrepresented hexamers
    • a literature search for the most significant hexamer shows that it is mentioned as an AS motif in six papers
  • Next steps:
    • identify the consensus sequences of additional motifs
    • learn tissue/developmental specificity for each motif
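
A search for overrepresented hexamers can be sketched as a comparison of k-mer frequencies between the conserved intronic regions and a background set. The scoring below (a frequency ratio with a pseudocount) is an assumption chosen for simplicity, not the statistic used in the talk, and the toy sequences are made up.

```python
from collections import Counter

def kmer_counts(seqs, k=6):
    """Count every overlapping k-mer in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(foreground, background, k=6, pseudocount=1.0):
    """Rank foreground k-mers by foreground/background frequency ratio."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    n_kmers = 4 ** k
    scores = {}
    for kmer, n in fg.items():
        f = (n + pseudocount) / (fg_total + pseudocount * n_kmers)
        b = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = f / b
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: "conserved" regions versus unrelated background sequence.
conserved = ["TGCATGAA", "CCTGCATG"]
background = ["ACGTACGTAC", "GGGTTTAAAC"]
ranked = enrichment(conserved, background)
```

In a real analysis the ranking would need a significance test and a correction for the number of hexamers tested (4096).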
Revealing Selection Patterns in the Evolution of Yeast Transcription Regulation (Amos Tanay, Irit Gat-Viks and Ron Shamir)
  • Identifying TF binding sites is hard
  • Even harder to predict more complex interactions
    • rarely a binary switch
    • not a linear relation between affinity and activation
    • different binding affinities can lead to different results (e.g. p53 can lead to apoptosis or rescue)

Conservation indicates functionality

Evolution dynamics disclose details of functionality

An Analogy: Imagine we didn’t know the genetic code, but just the length of the codons

We know that synonymous substitutions are more common in coding regions than nonsynonymous substitutions

  • build a network where each 3-letter nt string is represented by one node
  • put an edge between nodes where the thickness of the edge represents the frequency of mutations in aligned coding regions of related organisms
  • see strongly connected components comprised of nodes which all code for the same amino acid
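
The codon analogy can be made concrete with a small sketch: count which codons replace each other in aligned coding sequences, link the pairs that are seen often enough, and read off the connected groups. The `min_count` threshold, the undirected treatment of edges and the toy alignments are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def codon_substitution_counts(aligned_pairs):
    """Count unordered codon pairs that differ between aligned sequences."""
    counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
            a, b = seq_a[i:i + 3], seq_b[i:i + 3]
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts

def components(counts, min_count=1):
    """Connected groups of codons linked by frequent substitutions."""
    graph = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_count:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    seen, groups = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            cur = stack.pop()
            if cur not in group:
                group.add(cur)
                stack.extend(graph[cur] - group)
        seen |= group
        groups.append(group)
    return groups

# Toy gap-free alignments of orthologous coding sequences (made up).
pairs = [("GAAGGTCTG", "GAGGGCCTG"), ("GAACTGGGT", "GAGCTAGGA")]
counts = codon_substitution_counts(pairs)
groups = components(counts)
```

In this toy input every substitution is synonymous, so each group contains codons for a single amino acid (Glu, Gly and Leu). With real data the threshold would have to separate frequent synonymous changes from rarer nonsynonymous ones.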
A “Simple” Approach
  • Chose to use the four recent genomes of “simple” yeasts (promoter regions are relatively short)
  • Identified 4000 promoters and aligned them using ClustalW
  • Use simple window scanning method to identify all “motifs” of size 8
  • Simple parsimony method to infer ancestral sequences at each node in the phylogeny
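
The talk only said "simple parsimony". As one common choice, the sketch below uses the bottom-up pass of Fitch's algorithm on a single alignment column. The tree shape and the species labels `sp1` to `sp4` are hypothetical placeholders for the four yeasts.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass for one alignment column.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    Returns (candidate states at this node, minimum substitutions below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, leaf_states)
    right_states, right_cost = fitch(right, leaf_states)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed on this pair of branches.
    return left_states | right_states, left_cost + right_cost + 1

# Hypothetical tree and one column in which only sp3 differs.
tree = (("sp1", "sp2"), ("sp3", "sp4"))
column = {"sp1": "A", "sp2": "A", "sp3": "G", "sp4": "A"}
root_states, n_substitutions = fitch(tree, column)
```

A full reconstruction would add a top-down pass to pick one state per internal node; this sketch stops at the candidate sets and the substitution count.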
A Simple Approach (2)
  • Calculate background substitution rate
    • 16 parameter background model for each branch in phylogeny
  • For each motif, compute 8 tables of site-specific substitution rates
    • simply count observed substitutions at each site, summed over all branches of the tree and all instances of the motif
    • normalized substitution rate: log of ratio of observed substitutions over expected substitutions
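
The normalized rate described above is a log ratio of observed to expected substitution counts. A minimal sketch follows; the natural logarithm and the pseudocount (which avoids taking the log of zero) are assumptions, as is the layout of one count per motif position.

```python
import math

def normalized_rate(observed, expected, pseudocount=0.5):
    """Log ratio of observed to expected substitutions at one site."""
    return math.log((observed + pseudocount) / (expected + pseudocount))

def site_rates(observed_by_site, expected_by_site):
    """Normalized rate for each of the motif positions."""
    return [normalized_rate(o, e) for o, e in zip(observed_by_site, expected_by_site)]

# Toy counts for an 8-position motif, with 4.0 expected at every site.
observed = [0, 1, 0, 0, 12, 1, 0, 4]
expected = [4.0] * 8
rates = site_rates(observed, expected)
```

Negative values mean fewer substitutions than the background predicts (consistent with purifying selection at that position), and positive values mean more.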
Building a “Selection Network”
  • Each node represents an 8mer “motif”
  • Connect all motifs that are 1 substitution apart
    • if substitution rate is positive, dark edge
    • if substitution rate is negative, light edge
    • if not enough data, very thin edge
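
The network construction above can be sketched as follows. Edges are treated as undirected and keyed by the sorted pair of motifs, a rate of exactly zero is drawn as a light edge, and the rate values are made up; all three are simplifying assumptions.

```python
def one_substitution_neighbors(kmer, alphabet="ACGT"):
    """Yield every sequence that differs from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for other in alphabet:
            if other != base:
                yield kmer[:i] + other + kmer[i + 1:]

def edge_style(rate):
    """Map a normalized substitution rate to the edge style on the slide."""
    if rate is None:
        return "thin"  # not enough data
    return "dark" if rate > 0 else "light"

def selection_network(motifs, rates):
    """Edges between motifs that are one substitution apart."""
    motifs = set(motifs)
    edges = {}
    for m in motifs:
        for n in one_substitution_neighbors(m):
            if n in motifs and m < n:
                edges[(m, n)] = edge_style(rates.get((m, n)))
    return edges

motifs = ["AAAAAAAA", "AAAAAAAC", "AAAAAACC", "CCCCCCCC"]
rates = {("AAAAAAAA", "AAAAAAAC"): 0.7}  # hypothetical rate
network = selection_network(motifs, rates)
```

Each 8-mer has 24 possible neighbors, so the full network over all 65536 8-mers is large but sparse.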

Did some larger-scale evaluations based on ChIP and gene expression data
  • Also some anecdotal results
Evolutionary Signatures of Regulatory Sequences (Michael Eisen)
  • Examples of “Evolutionary Signatures”
    • coding sequence: conserved conserved variable
    • structural RNA: nucleotides that base-pair coevolve

What are the evolutionary constraints imposed on sequences by TF binding?

  • Aligned 4 yeast species
    • for each base in genome, estimate evolutionary rate (very noisy estimates)
Analyze the pattern of rate variation across the entire binding site

Moses et al Evol Biol 2003

Position-specific Rate Variation
  • The pattern of rate variation across the entire binding site for a particular TF
    • within one genome
    • across genomes
  • Clearly due to structural constraints
    • protein contacts
    • even when we know there’s no contact, there are DNA bending issues...
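
A position-specific rate profile across genomes can be approximated very crudely by counting, for each column of each aligned binding-site instance, how many extra bases appear among the species. This ignores the phylogeny and is only a lower bound on the number of substitutions; it is a stand-in for the rate estimates used in the talk, and the toy alignments are made up.

```python
def column_changes(aligned_site):
    """Distinct bases minus one, per column, for one aligned site instance.

    `aligned_site` is a list of equal-length strings, one per species.
    """
    return [len(set(column)) - 1 for column in zip(*aligned_site)]

def position_rates(instances):
    """Average number of changes per position over all site instances."""
    width = len(instances[0][0])
    totals = [0] * width
    for site in instances:
        for i, changes in enumerate(column_changes(site)):
            totals[i] += changes
    return [t / len(instances) for t in totals]

# Two toy instances of a 6-bp site, each aligned across four species.
instances = [
    ["TGACTC", "TGACTC", "TGACTA", "TGACTC"],
    ["TGACTC", "TGAGTC", "TGACTT", "TGACTC"],
]
profile = position_rates(instances)
```

Comparing such a profile with the per-position variability of the motif within one genome is the kind of comparison the "within one genome" versus "across genomes" bullets refer to.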

Highly Correlated

These “signatures” are missing from current motif-prediction programs
  • Although this isn’t a particularly surprising result, many predicted motifs (e.g. from MEME etc.) do not display this TFBS “signature”
    • could use as a filter, or incorporate it more directly (they’re working on this currently?)
  • Different families of TF have different “signatures”
    • Eisen thinks the community is still underutilizing this information
Make better use of comparative data by using an explicit evolutionary model
  • Is there likely to have been a TFBS in the ancestor?
    • build a PSSM representing the chemical contribution of each base to the binding specificity
    • use Halpern and Bruno model to predict how the TFBS will evolve given proposal + selection model
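
For reference, the Halpern and Bruno rate of substitution from base a to base b at one site, as I understand it, combines the mutation (proposal) rates with the equilibrium base frequencies under selection, which here would come from the PSSM column. The sketch below is written under that assumption; the variable names and the example frequencies are illustrative.

```python
import math

def halpern_bruno_rate(q_ab, q_ba, f_a, f_b):
    """Substitution rate a -> b under a proposal plus selection model.

    q_ab, q_ba: background mutation rates a -> b and b -> a.
    f_a, f_b:   equilibrium frequencies of a and b at this site
                (for a binding site, taken from the PSSM column).
    """
    x = (f_b * q_ba) / (f_a * q_ab)
    if math.isclose(x, 1.0):
        return q_ab  # neutral limit: rate equals the mutation rate
    return q_ab * math.log(x) / (1.0 - 1.0 / x)

# Toy PSSM column that strongly prefers base b over base a.
toward_preferred = halpern_bruno_rate(1.0, 1.0, f_a=0.1, f_b=0.7)
away_from_preferred = halpern_bruno_rate(1.0, 1.0, f_a=0.7, f_b=0.1)
neutral = halpern_bruno_rate(1.0, 1.0, f_a=0.25, f_b=0.25)
```

Changes toward the preferred base are accelerated relative to the mutation rate and changes away from it are slowed, which is what lets the model predict how a functional binding site should evolve.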
Moses et al Evol Biol 2003

Larger Cis-Regulatory Sequences
  • Known binding patterns in Drosophila have low information content
    • a sequence match for each TFBS is found upstream of almost every gene in the genome
  • Build a statistical model to identify significant clusters of binding sites in windows of arbitrary size
    • improved detection of cis-regulatory modules
    • experimental results still show many false positives
  • Use comparative data to discriminate real clusters from false ones
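
The statistical model for clusters was not described in detail. As a rough sketch of the idea, the code below slides a fixed-size window over predicted site positions and scores each window with a Poisson tail probability. The fixed window, the Poisson assumption and the toy positions are all simplifications; the actual model handles windows of arbitrary size.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))

def dense_windows(site_positions, seq_length, window, max_p):
    """Windows starting at a site that contain unexpectedly many sites."""
    positions = sorted(site_positions)
    lam = len(positions) / seq_length * window  # expected sites per window
    hits = []
    for start in positions:
        n = sum(1 for p in positions if start <= p < start + window)
        p_value = poisson_tail(n, lam)
        if p_value <= max_p:
            hits.append((start, n, p_value))
    return hits

# Toy example: four sites packed into 150 bp of a 10 kb region.
sites = [100, 5000, 5050, 5100, 5150, 9000]
hits = dense_windows(sites, seq_length=10000, window=200, max_p=0.01)
```

Overlapping hits such as these would normally be merged into a single candidate module before any follow-up.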
How to use comparative data
  • Conservation in Drosophila pseudoobscura isn’t a good indicator of functionality
    • all real and fake clusters have very high overall sequence conservation, including their flanking regions (a surprise)
  • However...
    • the actual binding sites are often not conserved
    • even one or two mutations can destroy a binding site

Conservation of binding site density is a useful indicator of function

An Impassioned Speech on the Evolution of the Scientific Journal
  • “If you publish [your work] in a journal like Science, which fewer and fewer people in the world have access to, you run a really big risk of being the next Mendel and that your work will languish in obscurity”
  • Don’t publish in a journal that “takes your writing, your ideas, thoughts and paper and claims ownership of them and then only doles them out to a relatively narrow bunch of people who have enough money to pay for them..solely to promote the financial health of the journal...”
  • Don’t be “like Microsoft”... publish in Public Library of Science or another freely available journal
For More Information
  • Most of the talks I picked were invited talks
  • For the workshop there is often only an abstract
  • Video feed is available online:
  • Many have papers that have just come out or are about to come out with additional details... check the authors’ webpages
Evolution and Larger Cis-Regulatory Sequences
  • What are enhancers? Whole regions of binding sites?
  • How are Drosophila enhancers organized?
  • Only 5 binding sites have specificities that are well characterized from experimental studies
    • low information content
    • found all over the genome
  • Clusters of binding sites -> surrogate for regulatory function
  • Shown previously that if you look for clusters of these sites
    • all identified regions overlap known enhancers
    • nothing else is found
    • then I don’t understand the next study with 39 clusters
  • Found 39 clusters
    • 9 overlap known enhancers
    • 28 tested experimentally
      • 6 clearly regulate a nearby gene
      • 3 perhaps show some regulatory role
      • remainder don’t appear to be real (but could have the wrong promoter? look back at the Kadonaga talk)
  • What’s the difference between real and fake?
    • use comparative mapping
  • Used two flies (which ones?)
    • distant enough, based on coding region conservation, that one expects to see conservation only of functionally conserved regions
    • not the case
    • all real and fake clusters have very high overall sequence conservation, including their flanking regions (why?)
    • binding sites not conserved
    • one or two mutations are enough to destroy a binding site
    • measure conservation of binding site density
    • show graph (37:18)
    • summary (39:21)
  • In more distantly related species
    • alignment more of an issue
    • binding sites will move around more
    • it has been shown that there is huge binding site turnover: there can be 2 separate ways to make the same enhancer
    • no sequence identity, but in experimental studies they can replace each other?