Review: RECOMB Satellite Workshop on Regulatory Genomics

(Held March 26-27, 2004)


Workshop Themes/Trends

  • More comprehensive evaluations of motif-detection algorithms

  • Making more effective use of comparative mapping/evolution data

  • Models that explain rather than just describe

  • Moving from binding motifs to entire regulatory modules

  • Methods are simple, not sophisticated


Outline

  • Jim Kadonaga, University of California, San Diego: The MTE, a New Core Promoter Element for Transcription by RNA Polymerase II

  • Rotem Sorek, Compugen and Tel Aviv University: The "promoters" of splicing: Intronic sequences that regulate alternative splicing

  • Yitzhak Pilpel, Weizmann Institute: Revealing the architecture of genetic backup circuits through inspection of transcription regulatory networks

  • Ron Shamir, Tel Aviv University: Revealing selection patterns in the evolution of yeast transcription regulation

  • Michael Eisen, Lawrence Berkeley National Lab: Evolutionary Signatures of Regulatory Sequences


A New Core Promoter Element for Transcription by RNA Polymerase II (Jim Kadonaga)

The majority of transcription activity is regulated by sequence-specific DNA-binding factors, which are thus the focus of the bulk of current research on regulation, however...

The ultimate target of all of this action is the core promoter, which also plays a part in regulation


Core Promoter

  • Encompasses TSS

  • Directs RNA polymerase II

  • Most well-known component is the TATA box


Only about 30-40% of promoters contain a TATA box!

What’s going on the rest of the time?


Finding Novel Promoter Elements

  • Experimentally investigated binding in those promoters with no TATA-box

    • found novel promoter element DPE

  • Large-scale motif detection on 2000 core promoters in Drosophila (Ohler et al., 2002)

    • Plotted distance of top 10 motifs to TSS

      • four motifs had a clear peak: TATA, Inr, DPE and ...

      • a novel promoter element, MTE
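
The distance-to-TSS plot can be sketched roughly as follows. This is a minimal illustration, assuming promoters arrive as (sequence, TSS index) pairs; the input format, the toy sequences and the `TATAAA` motif are placeholders, not data from Ohler et al.

```python
from collections import Counter

def motif_offsets(promoters, motif):
    """Collect positions of every motif occurrence relative to the TSS.

    promoters: list of (sequence, tss_index) pairs (hypothetical input format).
    Returns a Counter mapping offset (motif start minus TSS index) to count.
    """
    offsets = Counter()
    for seq, tss in promoters:
        start = seq.find(motif)
        while start != -1:
            offsets[start - tss] += 1
            start = seq.find(motif, start + 1)
    return offsets

# Toy promoters: the motif starts 30 bases upstream of the TSS in two of three.
toy = [
    ("G" * 20 + "TATAAA" + "C" * 24 + "A" + "G" * 10, 50),
    ("C" * 20 + "TATAAA" + "G" * 24 + "A" + "C" * 10, 50),
    ("TATAAA" + "G" * 54, 50),
]
hist = motif_offsets(toy, "TATAAA")
peak_offset, peak_count = hist.most_common(1)[0]
```

A motif with a sharp peak in `hist` is positioned relative to the TSS; a flat histogram suggests no positional preference.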


The Core Promoter Gets a New Look

MTE: Motif Ten Element

(Kadonaga, PowerPoint slides)


DPE and MTE: Two Newly Identified Promoter Elements

  • Conserved from Drosophila to human (unknown whether they occur in yeast)

  • Very sensitive to spacing relative to the Inr motif

    • TSS positions were determined experimentally (those reported in papers are not reliable)

    • a single insertion/deletion between motifs causes a 7-fold reduction in transcription

  • Inr and DPE (or MTE) bound cooperatively by TFIID

    • first step in transcription initiation


TATA gets top billing but...

  • In Drosophila (out of 205 core promoters)

    • TATA and DPE: 14%

    • TATA only: 29%

    • DPE only: 26%

    • Neither: 31%

  • TATA, DPE, and MTE can all

    • independently support transcription

    • compensate for a mutation in one of the others


And finally... regulation.

  • NC2 previously known to repress TATA-dependent transcription; unexpectedly found to activate DPE-dependent transcription

  • Studied 18 enhancers and estimate that about 25% exhibit some specificity for DPE or TATA

  • Similar work in progress for MTE


The “Promoters” of Splicing (Rotem Sorek)

In general it is not known how alternative splicing (AS) is regulated

  • A few known splicing regulatory proteins

    • like TFs, they are sequence-specific, but they bind to RNA, not DNA

    • binding motif (usually 4-10 nt) can be located in exon or intron

    • can act as enhancers or silencers

  • Evidence for combinatorial regulation


The typical “motif in a haystack”

  • Most work on finding splicing factor motifs focuses on exons

    • short enough that mutation studies are feasible

  • Introns are too long and require a computational approach

  • Compiled training dataset

    • 250 AS exons, alternatively spliced in both mouse and human

    • large set of constitutively spliced (CS) exons, conserved across human/mouse


Their Primary Finding: there tends to be significantly more conservation in the introns surrounding AS exons than in those surrounding CS exons

On average about 100 bases on either side of each AS exon are conserved, compared to around 7 bases for constitutively spliced exons

What’s the explanation?

  • multiple binding motifs?

  • helping to determine secondary structure in RNA, which helps lead to correct splicing?


Predicting Alternative Splicing

  • Additional Predictive features

    • Higher conservation around exon

    • Higher conservation of exon itself (motifs?)

    • Shorter exons

    • Exons that are a multiple of 3

  • Method: somehow chose one threshold for each feature?

  • Performance: scanned human genome, predicted 1000 AS exons (incl training data?)

    • 70% had EST evidence of AS vs 6-7% baseline

    • Lab test showed that 7/15 (randomly?) selected from remaining 30% are AS in at least one of 15 tissues

  • Significance: estimate “splicing promoters” cover 3x10^6 bp
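
Since the method was not spelled out, here is one guess at what a "one threshold per feature" predictor could look like. The `Exon` fields and all three cutoffs are hypothetical values picked for illustration; none of them come from the talk.

```python
from dataclasses import dataclass

@dataclass
class Exon:
    length: int               # exon length in nt
    exon_identity: float      # human/mouse identity of the exon itself (0-1)
    flank_conserved_bp: int   # conserved intronic bases flanking the exon

# Hypothetical thresholds, chosen only for illustration.
MAX_LENGTH = 150
MIN_EXON_IDENTITY = 0.95
MIN_FLANK_CONSERVED = 50

def looks_alternative(exon: Exon) -> bool:
    """Return True when the exon passes every per-feature threshold."""
    return (
        exon.length <= MAX_LENGTH                           # shorter exons
        and exon.length % 3 == 0                            # length is a multiple of 3
        and exon.exon_identity >= MIN_EXON_IDENTITY         # conserved exon
        and exon.flank_conserved_bp >= MIN_FLANK_CONSERVED  # conserved flanks
    )

candidates = [Exon(96, 0.98, 110), Exon(96, 0.98, 7), Exon(100, 0.99, 120)]
predicted = [looks_alternative(e) for e in candidates]
```

Only the first toy exon passes: the second lacks flanking conservation and the third is not a multiple of 3.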


Genetic Backup Circuits (Kafri and Pilpel)

  • Fact: single gene knockouts often have little or no phenotypic effect

    • 10% lethal in worm

    • 27% lethal in yeast

  • Question: Can we better understand the mechanisms of genetic backup?

  • Task: Predict whether a knockout will be lethal or not


Duplicates Suggest Redundancy

  • Genes with duplicates are less likely to be essential

  • But clearly this doesn’t tell the whole story

    • lethal genes can have duplicates

    • nonessential genes often have no duplicate

(Gu, Z. et al Nature 2003)


Function of Duplicate Matters

  • Compute dispensability of yeast genes

    • growth rate after knockout compared to mean growth rate, averaged over many conditions

  • Compared GO functional annotations of highly similar genes. Found higher dispensability when

    • higher functional similarity (Resnik info content)

    • little functional similarity but high sequence similarity (Blast E-values)
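
The dispensability score described at the top of this slide might be computed along these lines. This is a sketch under the assumption that growth rates are available per condition as plain numbers; the condition names and values are toy data.

```python
from statistics import mean

def dispensability(knockout_growth, mean_growth):
    """Average relative growth of a knockout strain across conditions.

    knockout_growth, mean_growth: dicts keyed by condition name (assumed format).
    A value near 1 means the gene is dispensable; near 0 means it is not.
    """
    ratios = [knockout_growth[c] / mean_growth[c] for c in knockout_growth]
    return mean(ratios)

mean_growth = {"YPD": 1.0, "heat": 0.8}   # toy numbers
gene_a = {"YPD": 1.0, "heat": 0.8}        # grows like the average strain
gene_b = {"YPD": 0.5, "heat": 0.2}        # clear growth defect
scores = {
    "A": dispensability(gene_a, mean_growth),
    "B": dispensability(gene_b, mean_growth),
}
```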


Similarity of Expression

  • 40 time series, 500 timepoints

  • In each condition calculated correlation of expression profiles of each pair of paralogous genes

  • Average correlation suggests

    • backup is best provided by genes which do not share expression patterns
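
A minimal sketch of the per-condition correlation step, assuming each gene's expression is stored as a dict from condition name to a time series. The Pearson helper is written out by hand so the block has no dependencies; the data are toy values.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def per_condition_correlations(gene_a, gene_b):
    """gene_a, gene_b: dicts mapping condition -> expression time series."""
    return {c: pearson(gene_a[c], gene_b[c]) for c in gene_a}

# Toy paralog pair: co-expressed under heat shock, anti-correlated in the cell cycle.
a = {"heat shock": [1, 2, 3, 4], "cell cycle": [1, 3, 2, 4]}
b = {"heat shock": [2, 4, 6, 8], "cell cycle": [4, 2, 3, 1]}
corrs = per_condition_correlations(a, b)
mean_corr = mean(list(corrs.values()))
```

Note how the average hides the picture: this pair has a mean correlation near zero even though it is perfectly correlated in one condition.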


How can we explain this unexpected result?

Classify pairs into:

  • negative correlation:

    • never similarly expressed

  • positive correlation:

    • always similarly expressed

  • no correlation:

    • never similarly expressed or

    • similarly expressed in certain conditions


Variability of Expression

  • Use stdDev to quantify consistency of correlation across conditions
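
One way to turn the standard deviation into a label for each paralog pair is sketched below. The `high` and `spread` cutoffs are hypothetical; the talk gave no numbers.

```python
from statistics import mean, pstdev

def classify_pair(correlations, high=0.5, spread=0.3):
    """Label a paralog pair from its per-condition expression correlations.

    `high` and `spread` are hypothetical cutoffs, not values from the talk.
    """
    m, s = mean(correlations), pstdev(correlations)
    if s >= spread:
        return "condition-dependent"   # similar in some conditions only
    if m >= high:
        return "always similar"
    return "never similar"

labels = [
    classify_pair([0.9, 0.8, 0.85, 0.9]),    # consistently co-expressed
    classify_pair([0.0, -0.1, 0.1, 0.05]),   # consistently unrelated
    classify_pair([0.9, -0.2, 0.8, 0.0]),    # switches between conditions
]
```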


Goldilocks and the three little paralogs

  • Always same expression: too similar

  • Never same expression: too diverged

  • Expression correlated in only a subset of conditions: just right

Optimal backup requires the “ability to switch between similar and dissimilar expression in a condition dependent manner”


Predictions about the Past...

Hypothesized Duplication Mechanism

  • duplication occurs

  • leads to nonstable redundancy

  • quickly followed by either

    • mutation and loss of one of the duplicates

    • subfunctionalization leading to stable redundancy

Hypothesize two distinct types of subfunctionalization:

  • mutation of coding region leading to functional divergence

  • mutation of control region leading to divergence of expression


Need for Regulatory Flexibility

  • This second type of subfunctionalization would entail a quite significant regulatory challenge if the paralogs are to provide backup for one another

    • Upon mutation of B, A must be turned on in the conditions that would normally require B

  • Postulate that

    • this regulatory challenge is met when a gene has a significant amount of regulatory diversity (i.e. different TF motifs)

    • backup asymmetry arises when one of the genes has few motifs (Kellis suggests otherwise?)


Experiments, but no hard numbers

  • Claim the capacity of genes to respond at the transcriptional level when their counterpart is deleted is central to their ability to provide backup

    • Most paralogs downregulated when other gene is knocked out (cross-hybridization?)

      • lower stdev -> down regulation

  • Claim that asymmetry of backup capability can be predicted based on number of transcription factor binding sites.

    • Gene that has the larger number of motifs is the one that is capable of providing a backup to the other

    • Genes with few motifs are “parasites” – can’t back up

  • Claim an improved ability to predict effect of double knockouts


A Question

  • They claim that only when the genes diverge in function will they be maintained in evolution.

  • But if the duplicated pair can compensate for each other’s function then won’t there be little selection pressure to maintain both copies?


From General Conservation to Specific Motifs

  • Searched conserved intronic regions for overrepresented hexamers

    • a literature search shows that the most significant hexamer is mentioned as an AS motif in six papers

  • Next steps:

    • identify the consensus sequences of additional motifs

    • learn tissue/developmental specificity for each motif


Revealing Selection Patterns in the Evolution of Yeast Transcription Regulation (Amos Tanay, Irit Gat-Viks and Ron Shamir)

  • Identifying TF binding sites is hard

  • Even harder to predict more complex interactions

    • rarely a binary switch

    • not a linear relation between affinity and activation

    • different binding affinities can lead to different results (e.g. p53 can lead to apoptosis or rescue)

      Conservation indicates functionality

      Evolution dynamics disclose details of functionality


An Analogy: Imagine we didn’t know the genetic code, but just the length of the codons

We know that synonymous substitutions are more common in coding regions than nonsynonymous substitutions

  • build a network where each 3-letter nt string is represented by one node

  • put an edge between nodes where the thickness of the edge represents the frequency of mutations in aligned coding regions of related organisms

  • see strongly connected components comprised of nodes which all code for the same amino acid
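
The analogy can be made concrete with a toy graph. This is only a sketch: the aligned codon pairs are invented, and `min_count` is a hypothetical cutoff that stands in for "thick" edges.

```python
from collections import Counter, defaultdict

def codon_components(aligned_pairs, min_count=2):
    """Group codons linked by frequently observed substitutions.

    aligned_pairs: iterable of (codon_in_species_1, codon_in_species_2).
    min_count: hypothetical cutoff for keeping an edge.
    """
    # Count each unordered substitution, ignoring identical codons.
    counts = Counter(frozenset(p) for p in aligned_pairs if p[0] != p[1])
    graph = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_count:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    # Depth-first search to collect connected components.
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        components.append(comp)
    return components

# Toy alignment: GCT/GCC/GCA (alanine) and AAA/AAG (lysine) swap often;
# the single GCT->AAA change falls below the cutoff.
pairs = ([("GCT", "GCC")] * 3 + [("GCC", "GCA")] * 2
         + [("AAA", "AAG")] * 4 + [("GCT", "AAA")])
groups = codon_components(pairs)
```

The two recovered groups correspond to synonymous codons, which is the point of the analogy: the code is rediscovered from substitution patterns alone.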


A “Simple” Approach

  • Chose to use the four recent genomes of “simple” yeasts (promoter regions are relatively short)

  • Identified 4000 promoters and aligned them using ClustalW

  • Use simple window scanning method to identify all “motifs” of size 8

  • Simple parsimony method to infer ancestral sequences at each node in the phylogeny
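
The talk did not say which parsimony method was used. As an assumption, the sketch below uses the bottom-up pass of Fitch parsimony for a single alignment column, plus the window scan for 8-mers. The tree shape and species names are placeholders.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass for one alignment column.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    leaf_states: dict leaf name -> nucleotide.
    Returns (candidate state set at the root, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra substitution needed.
        return common, left_cost + right_cost
    # Children disagree: keep both options and charge one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def kmers(seq, k=8):
    """All overlapping windows of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

topology = (("sp1", "sp2"), ("sp3", "sp4"))   # hypothetical species names
root_set, changes = fitch(
    topology, {"sp1": "A", "sp2": "A", "sp3": "G", "sp4": "A"}
)
windows = kmers("ACGTACGTAC")
```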


A Simple Approach (2)

  • Calculate background substitution rate

    • 16 parameter background model for each branch in phylogeny

  • For each motif, compute 8 tables of site-specific substitution rates

    • simply count observed substitutions at each site, summed over all branches of the tree and all instances of the motif

    • normalized substitution rate: log of ratio of observed substitutions over expected substitutions
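
The normalized rate is just a log ratio, sketched below. The pseudocount is an added assumption to avoid taking the log of zero when a substitution was never observed; the talk did not mention one.

```python
from math import log

def normalized_rate(observed, expected, pseudocount=0.5):
    """log(observed / expected), with a small pseudocount (an assumption)
    so that zero counts do not produce log(0)."""
    return log((observed + pseudocount) / (expected + pseudocount))

selected_against = normalized_rate(2, 10)   # fewer substitutions than background
selected_for = normalized_rate(20, 10)      # more substitutions than background
neutral = normalized_rate(5, 5)             # matches background
```

Negative values mean the substitution is seen less often than the background model predicts, positive values more often.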


Building a “Selection Network”

  • Each node represents an 8mer “motif”

  • Connect all motifs that are 1 substitution apart

    • if substitution rate is positive, dark edge

    • if substitution rate is negative, light edge

    • if not enough data, very thin edge
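
A rough sketch of how the network edges could be generated. The `min_obs` cutoff for "not enough data" is hypothetical, and a rate of exactly zero is arbitrarily drawn as a light edge here.

```python
def one_substitution_neighbors(kmer, alphabet="ACGT"):
    """All sequences that differ from kmer at exactly one position."""
    out = []
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                out.append(kmer[:i] + alt + kmer[i + 1:])
    return out

def edge_style(rate, n_observations, min_obs=10):
    """Map a normalized substitution rate to the edge styles on the slide.

    min_obs is a hypothetical cutoff for 'not enough data'.
    """
    if n_observations < min_obs:
        return "very thin"
    return "dark" if rate > 0 else "light"

neighbors = one_substitution_neighbors("ACGTACGT")
styles = [edge_style(0.8, 40), edge_style(-1.2, 40), edge_style(0.8, 3)]
```

Each 8-mer has 8 x 3 = 24 neighbors, so the full network is sparse enough to draw by cluster.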


images taken from: http://www.cs.tau.ac.il/~amos/promoter_evo/


Evolutionary Signatures of Regulatory Sequences (Michael Eisen)

  • Examples of “Evolutionary Signatures”

    • coding sequence: codon positions are conserved, conserved, variable

    • structural RNA: nucleotides that base-pair coevolve

      What are the evolutionary constraints imposed on sequences by TF binding?

  • Aligned 4 yeast species

    • for each base in genome, estimate evolutionary rate (very noisy estimates)


Analyze the pattern of rate variation across the entire binding site

Moses et al Evol Biol 2003


Position-specific Rate Variation

  • The pattern of rate variation across the entire binding site for a particular TF

    • within one genome

    • across genomes


  • Clearly due to structural constraints

    • protein contacts

    • even when we know there’s no contact, there are DNA bending issues...

Highly Correlated
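
To illustrate what "highly correlated" could mean here, the sketch below compares per-position variability among site instances within one genome (measured as column entropy, which is my choice of measure) with the fraction of ortholog pairs that differ at each position. All sequences are invented toy data.

```python
from collections import Counter
from math import log2, sqrt

def column_entropy(sites):
    """Shannon entropy (bits) of each column across aligned site instances."""
    result = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        result.append(-sum(n / total * log2(n / total) for n in counts.values()))
    return result

def column_substitution_fraction(ortholog_pairs):
    """Fraction of ortholog pairs that differ at each site position."""
    n = len(ortholog_pairs)
    length = len(ortholog_pairs[0][0])
    return [sum(a[i] != b[i] for a, b in ortholog_pairs) / n for i in range(length)]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Toy binding sites: positions 0-2 are fixed, positions 3-4 vary.
within = ["TGACT", "TGACA", "TGAGT", "TGACC"]
across = [("TGACT", "TGACA"), ("TGACA", "TGAGA"),
          ("TGAGT", "TGAGC"), ("TGACC", "TGACC")]
r = pearson(column_entropy(within), column_substitution_fraction(across))
```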


These “signatures” are missing from current motif-prediction programs

  • Although this isn’t a particularly surprising result, many predicted motifs (e.g. from MEME etc.) do not display this TFBS “signature”

    • could use as a filter, or incorporate it more directly (they’re working on this currently?)

  • Different families of TF have different “signatures”

    • Eisen thinks the community is still underutilizing this information


Make better use of comparative data by using an explicit evolutionary model

  • Is there likely to have been a TFBS in the ancestor?

    • build a PSSM representing the chemical contribution of each base to the binding specificity

    • use Halpern and Bruno model to predict how the TFBS will evolve given proposal + selection model
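
For reference, the Halpern and Bruno rate for a single position can be sketched as follows, written from the published form of the model as I understand it. The uniform background rates and the frequency column are illustrative assumptions.

```python
from math import log

def halpern_bruno_rate(q_ab, q_ba, f_a, f_b):
    """Substitution rate a -> b at one site position under Halpern-Bruno.

    q_ab, q_ba: background (mutation) rates between bases a and b.
    f_a, f_b: equilibrium frequencies of a and b at this position,
              here read off one column of the motif's frequency matrix.
    """
    ratio = (f_b * q_ba) / (f_a * q_ab)
    if abs(ratio - 1.0) < 1e-12:
        return q_ab                      # neutral limit: selection plays no role
    return q_ab * log(ratio) / (1.0 - 1.0 / ratio)

# Toy column that strongly prefers A (0.85) over C (0.05), uniform background.
away = halpern_bruno_rate(1.0, 1.0, 0.85, 0.05)     # A -> C, disfavored
toward = halpern_bruno_rate(1.0, 1.0, 0.05, 0.85)   # C -> A, favored
neutral = halpern_bruno_rate(1.0, 1.0, 0.25, 0.25)
```

Substitutions toward the preferred base are faster than background and substitutions away from it are slower, which is what produces the position-specific signature.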


Larger Cis-Regulatory Sequences

  • Known binding patterns in Drosophila have low information content

    • find a sequence match for each TFBS before almost every gene in the genome

  • Build a statistical model to identify significant clusters of binding sites in windows of arbitrary size

    • improved detection of cis-regulatory modules

    • experimental results still show many false positives

  • Use comparative data to discriminate real clusters from false ones
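
The statistical model itself was not described in detail. As a much simpler stand-in, the sketch below counts predicted sites in a sliding window; `window` and `min_sites` are hypothetical values and the positions are toy data.

```python
def site_clusters(site_positions, window=700, min_sites=3):
    """Return (start, end, count) for windows holding at least min_sites
    predicted binding sites. window and min_sites are hypothetical values.
    """
    positions = sorted(site_positions)
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window from the left until it spans at most `window` bp.
        while pos - positions[left] > window:
            left += 1
        count = right - left + 1
        if count >= min_sites:
            clusters.append((positions[left], pos, count))
    return clusters

hits = site_clusters([100, 350, 600, 5000, 9000, 9100, 9650])
```

A real model would score each window against a background expectation rather than use a fixed count, which is presumably where the statistics come in.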


How to use comparative data

  • Conservation in Drosophila pseudoobscura isn’t a good indicator of functionality

    • all real and fake clusters have very high overall sequence conservation, including their flanking regions (a surprise)

  • However...

    • the actual binding sites are often not conserved

    • even one or two mutations can destroy a binding site

      Conservation of binding site density is a useful indicator of function
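
A minimal sketch of a density comparison between orthologous regions, assuming site counts and region lengths are already available for both species. The `tolerance` cutoff and all counts are hypothetical.

```python
def site_density(n_sites, region_length_bp):
    """Predicted binding sites per kilobase."""
    return 1000.0 * n_sites / region_length_bp

def density_conserved(density_a, density_b, tolerance=0.5):
    """True when both densities are non-zero and within a relative
    tolerance (hypothetical cutoff) of each other."""
    if density_a == 0 or density_b == 0:
        return False
    return abs(density_a - density_b) / max(density_a, density_b) <= tolerance

species_1 = site_density(12, 1000)   # toy counts for one species
species_2 = site_density(10, 1100)   # toy counts for the orthologous region
real_like = density_conserved(species_1, species_2)
fake_like = density_conserved(species_1, site_density(2, 1000))
```

The individual sites do not need to align between species for this test to pass, which is what makes density usable despite binding site turnover.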


An Impassioned Speech on the Evolution of the Scientific Journal

  • “If you publish [your work] in a journal like Science which fewer and fewer people in the world have access to you run a really big risk of being the next Mendel and that your work will languish in obscurity”

  • Don’t publish in a journal that “takes your writing, your ideas, thoughts and paper and claims ownership of them and then only doles them out to a relatively narrow bunch of people who have enough money to pay for them... solely to promote the financial health of the journal...”

  • Don’t be “like Microsoft”... publish in Public Library of Science or another freely available journal


For More Information

  • Most of the talks I picked were invited talks

  • For the workshop there is often only an abstract

  • Video feed is available online: http://www.calit2.net/multimedia/recomb2004videos.html

  • Many have papers that have just come out or are about to come out with additional details... check the authors’ webpages


Evolution and Larger Cis-Regulatory Sequences

  • What are enhancers? Whole regions of binding sites?

  • How are Drosophila enhancers organized?

  • Only 5 binding sites whose specificities are well characterized from experimental studies

    • low information content

    • found all over the genome

  • Clusters of binding sites -> surrogate for regulatory function

  • Shown previously that if you look for clusters of these sites

    • all identified regions overlap known enhancers

    • don’t find anything else

    • then I don’t understand the next study with 39 clusters


  • Found 39 clusters

    • 9 overlap known enhancers

    • 28 tested experimentally

      • 6 clearly regulate a nearby gene

      • 3 perhaps showed some regulatory role

      • remainder don’t appear to be real (but could have the wrong promoter? look back at Kadonaga talk)

  • What’s difference between real and fake?

    • use comparative mapping


  • Used two flies (which ones?)

    • distant enough, based on coding region conservation, that we expect to see conservation only of functionally conserved regions

    • not the case

    • all real and fake clusters have very high overall sequence conservation, including their flanking regions (why?)


  • However,

    • binding sites not conserved

    • one or two mutations are enough to destroy a binding site

    • measure conservation of binding site density

    • show graph (37:18)

    • summary (39:21)

  • In more distantly related species

    • alignment more of an issue

    • binding sites will move around more

    • huge binding site turnover has been shown: there can be 2 separate ways to make the same enhancer

    • no sequence identity, but in experimental studies they can replace each other?