Transcription factor binding sites and gene regulatory network

Transcription factor binding sites and gene regulatory network. Victor Jin Department of Biomedical Informatics The Ohio State University. Transcription in higher eukaryotes. Gene Expression Chromatin structure Initiation of transcription Processing of the transcript

Presentation Transcript


  1. Transcription factor binding sites and gene regulatory network Victor Jin Department of Biomedical Informatics The Ohio State University

  2. Transcription in higher eukaryotes • Gene Expression • Chromatin structure • Initiation of transcription • Processing of the transcript • Transport to the cytoplasm • mRNA translation • mRNA stability • Protein activity stability

  3. Transcriptional Regulation Nuclear membrane

  4. Transcriptional Regulation • Nuclear membrane • Binding site/motif: CCG__CCG • Genome-wide mRNA transcript data (e.g. microarrays)

  5. Transcriptional Regulation Learning problems: • Understand which regulators control which target genes • Discover motifs representing regulatory elements (Figure labels: Nuclear membrane; Binding site/motif: CCG__CCG)

  6. Some common approaches • Cluster-first motif discovery • Cluster genes by expression profile, annotation, … to find potentially coregulated genes • Find overrepresented motifs in promoter sequences of similar genes (algorithms: MEME, Consensus, Gibbs sampler, AlignACE, …) (Spellman et al. 1998)
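
The "find overrepresented motifs" step can be illustrated with a deliberately simple stand-in: counting k-mers that occur in a larger fraction of the clustered promoters than of a background set. This is not how MEME, Consensus, Gibbs sampling, or AlignACE work; it is only a minimal sketch of the overrepresentation idea. The function names, the toy sequences, the add-one smoothing, and the `min_ratio` threshold are all assumptions made for illustration.

```python
from collections import Counter


def kmer_presence(promoters, k):
    """Count, for each k-mer, how many promoters contain it at least once."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        # A set ensures each promoter contributes at most once per k-mer.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(cluster, background, k=6, min_ratio=2.0):
    """Return (k-mer, enrichment ratio) pairs, most enriched first."""
    in_cluster = kmer_presence(cluster, k)
    in_background = kmer_presence(background, k)
    results = []
    for kmer, hits in in_cluster.items():
        cluster_frac = hits / len(cluster)
        # Add-one smoothing (an assumption) avoids division by zero.
        background_frac = (in_background[kmer] + 1) / (len(background) + 1)
        ratio = cluster_frac / background_frac
        if ratio >= min_ratio:
            results.append((kmer, ratio))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical toy promoters: the cluster shares the 5-mer TGACC.
CLUSTER = ["AATGACCTT", "CCTGACCGA", "GTTGACCAA"]
BACKGROUND = ["ACGTACGTAC", "TTTTAAAACC", "GGGCCCAAAT", "CATCATCATG"]

if __name__ == "__main__":
    for kmer, ratio in enriched_kmers(CLUSTER, BACKGROUND, k=5):
        print(kmer, round(ratio, 2))
```

A real analysis would replace the ratio with a statistical test and allow degenerate positions in the motif, which is what the dedicated tools named on the slide are for.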

  7. Training data – Features (Figure labels: regulator expression; promoter sequence; label; feature vector)

  8. What is PWM?
     Pos  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     A   18  8  5  4  1 29  7  7  7  0  1 39  1  1  6
     C    8  3  3  9 33  4 21 15 14  0  0  1 43 39 18
     G   13 31 34  9  8 10 11 15 19  4 44  3  0  1  6
     T    7  4  4 24  4  3  7  9  6 42  1  3  2  5 16
     Con  N  G  G  T  C  A  N  N  N  T  G  A  C  C  N
     • Transcription factor binding sites (TFBSs) are usually slightly variable in their sequences.
     • A positional weight matrix (PWM) specifies the probability that you will see a given base at each index position of the motif.
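
As a minimal sketch of the "probability of a given base at each position" idea, the counts on this slide can be divided by their column totals (every column sums to 46). The function and variable names below are assumptions chosen for illustration; only the numbers come from the slide.

```python
# Count matrix copied from the slide: 15 positions, 46 sites per column.
COUNTS = {
    "A": [18, 8, 5, 4, 1, 29, 7, 7, 7, 0, 1, 39, 1, 1, 6],
    "C": [8, 3, 3, 9, 33, 4, 21, 15, 14, 0, 0, 1, 43, 39, 18],
    "G": [13, 31, 34, 9, 8, 10, 11, 15, 19, 4, 44, 3, 0, 1, 6],
    "T": [7, 4, 4, 24, 4, 3, 7, 9, 6, 42, 1, 3, 2, 5, 16],
}


def column_probabilities(counts):
    """Divide each count by its column total so every column sums to 1."""
    length = len(next(iter(counts.values())))
    totals = [sum(counts[b][i] for b in counts) for i in range(length)]
    return {b: [counts[b][i] / totals[i] for i in range(length)] for b in counts}


def most_frequent_bases(counts):
    """Pick the highest-count base per column (ties go to the first base)."""
    length = len(next(iter(counts.values())))
    return "".join(
        max(counts, key=lambda b: counts[b][i]) for i in range(length)
    )


PROBS = column_probabilities(COUNTS)

if __name__ == "__main__":
    # Probability of T at position 10 (index 9): 42 of 46 sites.
    print(round(PROBS["T"][9], 3))
    print(most_frequent_bases(COUNTS))
```

The most-frequent-base string agrees with the slide's consensus at the conserved positions (GGTCA and TGACC); the slide writes N where no base clearly dominates, a rule this sketch does not try to reproduce.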

  9. PWM for ERE
     Aligned binding-site sequences:
       acggcagggTGACCc
       aGGGCAtcgTGACCc
       cGGTCGccaGGACCt
       tGGTCAggcTGGTCt
       aGGTGGcccTGACCc
       cTGTCCctcTGACCc
       aGGCTAcgaTGACGt
       .
       .
       .
       cagggagtgTGACCc
       gagcatgggTGACCa
       aGGTCAtaacgattt
       gGAACAgttTGACCc
       cGGTGAcctTGACCc
       gGGGCAaagTGACTg
     Position frequency matrix (PFM) (also known as raw count matrix): Given N sequence fragments of fixed length, one can assemble a position frequency matrix (the number of times a particular nucleotide appears at a given position). A normalized PFM, in which each column adds up to a total of one, is a matrix of probabilities for observing each nucleotide at each position.
     Position weight matrix (PWM) (also known as position-specific scoring matrix): A PFM should be converted to log scale for efficient computational analysis. To eliminate null values before log conversion, and to correct for small samples of binding sites, a sampling correction, known as pseudocounts, is added to each cell of the PFM.
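
Assembling a PFM from aligned fragments amounts to counting bases column by column. The sketch below uses the first four sequences listed on the slide; the function name `build_pfm` and the choice to ignore upper/lower case are assumptions made for illustration.

```python
from collections import Counter

# First four aligned fragments from the slide (case is ignored here).
SITES = [
    "acggcagggTGACCc",
    "aGGGCAtcgTGACCc",
    "cGGTCGccaGGACCt",
    "tGGTCAggcTGGTCt",
]


def build_pfm(sites, alphabet="ACGT"):
    """Count how often each nucleotide appears at each aligned position."""
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in alphabet}
    for i in range(length):
        column = Counter(s[i] for s in sites)
        for base in alphabet:
            pfm[base][i] = column[base]
    return pfm


PFM = build_pfm(SITES)

if __name__ == "__main__":
    for base in "ACGT":
        print(base, PFM[base])
```

With N fragments, every column of the result sums to N, which is the column-sum property the next slide relies on.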

  10. Position Weight Matrix for ERE: converting a PFM into a PWM. Each matrix element is computed from: • the raw count (PFM matrix element) of nucleotide b in column i • N, the number of sequences used to create the PFM (= column sum) • pseudocounts (correction for small sample size) • p(b), the background frequency of nucleotide b
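
The formula itself is not present in the transcript. A commonly used form that matches the four quantities listed on this slide is given below as an assumption, not as the author's exact equation. The symbols f_{b,i} (raw count) and s(b) (pseudocount for nucleotide b) are names chosen here; N and p(b) are as defined on the slide.

```latex
% Corrected probability of nucleotide b in column i
p(b,i) = \frac{f_{b,i} + s(b)}{N + \sum_{b' \in \{A,C,G,T\}} s(b')}

% PWM element: log-odds of the corrected probability against the background
W_{b,i} = \log_2 \frac{p(b,i)}{p(b)}
```

With this form, a cell whose raw count is zero still receives a finite (strongly negative) weight, which is the purpose of the pseudocounts.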

  11. Scoring putative EREs by scanning the promoter with PWM • Site: GGGTCAGCATGGCCA • Absolute score of the site = 11.57
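
Scanning can be sketched as sliding a window of the motif's width along the promoter and summing the per-position weights. The example below reuses the count matrix from slide 8 and assumes a uniform background (0.25 per base) and a total pseudocount of 1 split according to the background; both are assumptions, so the score it gives for the slide's site will not necessarily equal the 11.57 reported there. The function names and the flanking sequence are hypothetical.

```python
import math

# Count matrix from slide 8 (46 sites per column).
COUNTS = {
    "A": [18, 8, 5, 4, 1, 29, 7, 7, 7, 0, 1, 39, 1, 1, 6],
    "C": [8, 3, 3, 9, 33, 4, 21, 15, 14, 0, 0, 1, 43, 39, 18],
    "G": [13, 31, 34, 9, 8, 10, 11, 15, 19, 4, 44, 3, 0, 1, 6],
    "T": [7, 4, 4, 24, 4, 3, 7, 9, 6, 42, 1, 3, 2, 5, 16],
}
BACKGROUND = {base: 0.25 for base in "ACGT"}  # assumption: uniform background


def pfm_to_pwm(counts, background, pseudocount=1.0):
    """Convert counts to log2 odds; the pseudocount is split by background."""
    length = len(next(iter(counts.values())))
    pwm = {base: [] for base in counts}
    for i in range(length):
        n = sum(counts[base][i] for base in counts)
        for base in counts:
            prob = (counts[base][i] + pseudocount * background[base]) / (
                n + pseudocount
            )
            pwm[base].append(math.log2(prob / background[base]))
    return pwm


def score_site(pwm, site):
    """Sum the weights of the observed base at each position of the site."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


def scan(pwm, sequence):
    """Score every window of motif width; return (start, score) pairs."""
    width = len(next(iter(pwm.values())))
    seq = sequence.upper()
    return [
        (start, score_site(pwm, seq[start:start + width]))
        for start in range(len(seq) - width + 1)
    ]


PWM = pfm_to_pwm(COUNTS, BACKGROUND)

if __name__ == "__main__":
    # Site from the slide, embedded in hypothetical flanking sequence.
    promoter = "TTTTT" + "GGGTCAGCATGGCCA" + "AAAAA"
    best_start, best_score = max(scan(PWM, promoter), key=lambda hit: hit[1])
    print(best_start, round(best_score, 2))
```

This sketch scans one strand only; a practical scanner would also score the reverse complement and apply a score threshold.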

  12. Yeast ESR: Biological Validation Universal stress repressor motif STRE element

  13. Previous work: “Structure learning” • Graphical models (and other methods) • Learn structure of “regulatory network”, “regulatory modules”, etc. • Fit interpretable model to training data • Model small number of genes or clusters of genes • Many computational and statistical challenges; often used for qualitative hypotheses rather than prediction (Pe’er et al. 2001) (Segal et al. 2003, 2004)

  14. Signaling networks in a cell

  15. Network inference • Regulator-motif associations in nodes can have different meanings: direct binding, indirect effect, co-occurrence • Need other data to confirm binding relationship between regulator and target (e.g. ChIP-chip) • Still, can determine statistically significant regulator-target relationships from regulation program (Diagram node labels: P, Mp, TF, MTF, M)

  16. Example: oxygen sensing and regulatory network

  17. Binding data for regulatory networks • ChIP-chip: genome-wide protein-DNA binding data, i.e. what promoters are bound by TF? • Investigate regulatory network model: use ChIP-chip data in place of motifs (no motif discovery) • Features: (regulator, TF-occupancy) pairs P1 P2 TF

  18. Inferring regulatory networks from the combination of expression data and binding data

  19. An extended ER regulatory network in MCF7 cells. Network nodes: RUVBL1, GTF2I, ZNF500, TTF2, RFC1, RXRA, MKL2, ZKSCAN1, RAB18, HSF2, ASCC3, BHLHB2, MSX2, PNN, HIF1A, ZNF38, BAZ1B, HEY2, ER, STRAP, CEBP, DNMT1, XBP1, NRIP1, TLE3, LASS2, ZNF394, VPS72, ZNF239, THRAP1, FOXP4, HDAC1, TXNDC, ZBTB41, BRIP1, FOS, TBX2, TXNIP, MYC, PAWR, ELF3, IVNS1ABP, CHAF1B, PURB, DDX20, C140RF43, BATF, CSDE1, SP3, HES1, ADAR, CUTL1, CCNL1, BRF1

  20. Signaling molecules -- Networks • Find all SMs (signaling molecules) that associate as regulators with a particular TF’s ChIP occupancy in ADT features • e.g. hypothesis: Glc7 phosphatase complex interacts with Hsf1 in regulation of Hsf1 targets (interaction supported in literature) (Diagram labels: TF, SM, mRNA; Glc7 phosphatase complex, Gac1, Hsf1, Sds22, Gip1)

  21. http://motif.bmi.ohio-state.edu/ChIPMotifs/ • Input Data: FASTA file; Contact Info; Control data (optional) • Ab initio Motif Discovery Programs: Weeder; MaMf; MEME • Statistical Methods: Bootstrap re-sampling; Fisher test • STAMP Matching • Results: SeqLog; PWM; P-value; Known or novel motifs
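
The "Fisher test" step can be illustrated with a one-sided Fisher exact test asking whether a motif occurs in more ChIP sequences than expected given its frequency in control sequences. This is a generic sketch and is not taken from the ChIPMotifs implementation; the function name, argument names, and the example counts are assumptions.

```python
from math import comb


def fisher_enrichment_pvalue(hits_fg, total_fg, hits_bg, total_bg):
    """One-sided Fisher exact test for enrichment in the foreground set.

    Returns P(X >= hits_fg), where X is hypergeometric: the number of
    motif-containing sequences that land in the foreground when the
    motif-containing sequences are spread at random over all sequences.
    """
    total = total_fg + total_bg
    hits = hits_fg + hits_bg
    denominator = comb(total, total_fg)
    upper = min(hits, total_fg)
    tail = sum(
        comb(hits, x) * comb(total - hits, total_fg - x)
        for x in range(hits_fg, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: motif in 30 of 100 ChIP sequences
    # and in 10 of 100 control sequences.
    print(fisher_enrichment_pvalue(30, 100, 10, 100))
```

Because it uses exact integer binomial coefficients, the sketch is suitable for modest sequence counts; for very large sets a library routine would be the more practical choice.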

  22. http://motif.bmi-ohio-state.edu/HRTBLDb

  23. Software Demo • W-ChIPMotifs • HRTargetDB
