
Highly Conserved Non-Coding Sequences are Associated with Vertebrate Development

Yvonne Li. Paper presentation for MEDG505, Jan 27, 2005. Paper discussed: PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.




  1. Highly Conserved Non-Coding Sequences are Associated with Vertebrate Development • Yvonne Li • Paper presentation for MEDG505, Jan 27, 2005 • PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.

  2. Outline • Motivation • Method and Results • Discussion

  3. Motivation • Gene Regulatory Networks for development have been described in invertebrates but not characterized for vertebrates • Studies have shown: • a number of developmental genes are regulated by highly conserved enhancer regions at distances of hundreds of kb • ultra-conserved elements are more frequent than expected • there is a significant association between these highly conserved elements and DNA binding proteins • Goal: look for all such elements in the entire human genome and see how they relate to development.

  4. Method • Computationally identify • Computationally analyze • Experimentally validate

  5. Sequence Data (Identifying) • CNE: Highly Conserved Noncoding Element • Which 2 species to use for whole-genome alignment?

  6. Sequence Data (Identifying) • Which 2 species to use for whole-genome alignment? • Human and Fugu • Fugu has 1/8 the genome size of human but a similar gene repertoire • Fugu’s developmental blueprint is very similar to human • Two ways to detect CNEs • Whole-genome alignment • Regional alignments

  7. Obtaining CNEs (Identifying) • Start with Fugu genome assembly • MegaBLAST against Ensembl human genome v18.34.1 • Remove alignments < 100 bp in length • Mask coding and non-coding RNA content • Remove telomere-like sequences and transposons • Stats: core set of 1373 elements • Length: ave 199 bp, max 736 bp • Identity: ave 84%, max 98% • Conserved in mouse: 1365; rat: 1316; chicken: 1310; zebrafish: 1093
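The length filter in the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the alignments are available as tab-separated lines whose first four columns are query, subject, percent identity, and alignment length (the column order of BLAST's standard tabular output). The scaffold names and hit values are made up for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # Fugu scaffold (hypothetical identifier)
    subject: str     # human chromosome
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp


def parse_tabular(lines):
    """Parse the first four columns of tab-separated alignment records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), int(fields[3])))
    return hits


def filter_by_length(hits, min_length=100):
    """Keep alignments of at least min_length bp (the slide removes those < 100 bp)."""
    return [h for h in hits if h.length >= min_length]


# Made-up records for illustration only.
example = [
    "scaffold_1\tchr13\t84.0\t199",
    "scaffold_2\tchr7\t91.5\t60",
    "scaffold_3\tchr11\t98.0\t736",
]
kept = filter_by_length(parse_tabular(example))
```

The masking of coding and RNA content and the removal of telomere-like sequences and transposons would be further filters of the same shape; they are left out because the slide does not say how they were implemented.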

  8. CNE Distribution (Analyzing) • CNEs in human genome are found on all chromosomes except 21 and Y • Distribution of CNEs is highly clustered • Clustered CNEs by genomic location • 165 clusters • The 20 largest clusters have ≥ 20 CNEs
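The slide does not state the rule used to cluster CNEs by genomic location. One simple possibility, shown here purely as an assumption, is single-linkage grouping: consecutive CNEs on the same chromosome join the same cluster when the gap between them is at most some threshold. The 500 kb threshold and the coordinates below are hypothetical.

```python
from collections import defaultdict


def cluster_by_gap(cnes, max_gap):
    """Group (chrom, start) pairs into clusters of neighbours on one chromosome.

    Two consecutive CNEs share a cluster when their starts differ by at most
    max_gap. Returns a list of (chrom, [sorted starts]) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start in cnes:
        by_chrom[chrom].append(start)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)   # close enough: extend the open cluster
            else:
                clusters.append((chrom, current))
                current = [pos]       # too far: start a new cluster
        clusters.append((chrom, current))
    return clusters


# Made-up coordinates; max_gap is an assumed threshold, not from the paper.
toy = [
    ("chr2", 100_000),
    ("chr2", 150_000),
    ("chr2", 5_000_000),
    ("chr7", 42_000),
    ("chr2", 5_020_000),
]
clusters = cluster_by_gap(toy, max_gap=500_000)
```

Counting the clusters and sorting them by size would then give figures comparable to the "165 clusters" and "20 largest clusters" on the slide, given the real coordinates and the real clustering rule.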

  9. CNE-associated genes (Analyzing) • Find most statistically over-represented GO terms • For each CNE, extract closest gene from Ensembl • 12 of the 13 terms relate to transcriptional regulation and development ("transdev") • How many clusters are situated near such transdev genes? • Over 93% of clusters have a transdev gene within 500 kb of their CNEs; 15% have 2 or more • CNEs are generally located at large distances from the nearest gene • Average distance between a CNE and the 5’ end of the closest human gene is 182 kb, with 93 CNEs > 500 kb and 12 CNEs > 1 Mb • Transdev genes are located in regions of low gene density • Average number of genes within 500 kb upstream or downstream is 16 for all human genes and 6 for transdev genes
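The "closest gene" step and the distance statistic above can be sketched with a binary search over gene 5' ends. This is an illustrative sketch only: the gene names and coordinates are invented, strand is ignored, and the function assumes a non-empty list of genes on the same chromosome as the CNE.

```python
from bisect import bisect_left


def nearest_gene(cne_pos, gene_starts):
    """Return (gene name, distance in bp) for the closest 5' end.

    gene_starts must be a non-empty list of (position, name) tuples sorted by
    position, all on the same chromosome as the CNE.
    """
    positions = [pos for pos, _ in gene_starts]
    i = bisect_left(positions, cne_pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = gene_starts[max(i - 1, 0): i + 1]
    pos, name = min(candidates, key=lambda g: abs(g[0] - cne_pos))
    return name, abs(pos - cne_pos)


# Hypothetical gene 5' ends and CNE positions on one chromosome.
genes = [(1_000_000, "GENE_A"), (1_900_000, "GENE_B"), (4_000_000, "GENE_C")]
cne_positions = [1_200_000, 3_500_000]

assignments = [nearest_gene(p, genes) for p in cne_positions]
mean_distance = sum(d for _, d in assignments) / len(assignments)
```

With real Ensembl coordinates, `mean_distance` would correspond to the 182 kb average quoted on the slide, and counting distances above 500 kb or 1 Mb would give the other two figures.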

  10. Obtaining rCNEs (Identifying) • Use MLAGAN (localized multiple alignment) to identify additional conserved sequences around specific genes • MLAGAN is more sensitive than whole-genome alignment • Species: Human, Fugu, Mouse, Rat • Algorithm itself is more sensitive • Requires only a 40 bp window with 60% identity • Chose 4 cluster regions containing different types of developmental genes: • SOX21, PAX6, HLXB9, SHH • Sometimes, the CNEs are more conserved than the gene’s coding exons!
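The "40 bp window with 60% identity" criterion can be illustrated with a sliding-window scan over two already aligned sequences. This is a simplified pairwise sketch, not MLAGAN itself (which aligns all four species); it assumes identity is the fraction of alignment columns in the window where both sequences carry the same base, with gap columns counted as mismatches. The sequences are synthetic.

```python
def conserved_windows(seq_a, seq_b, window=40, min_identity=0.60):
    """Return start columns of windows whose identity meets the threshold.

    seq_a and seq_b are rows of a pairwise alignment: equal length, with '-'
    marking gaps. A column counts as a match only if both rows hold the same
    non-gap character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-")
               for a, b in zip(seq_a.upper(), seq_b.upper())]
    starts = []
    if len(matches) < window:
        return starts
    count = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the entering column, drop the leaving one.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            starts.append(start)
    return starts


# Synthetic alignment: 40 identical columns followed by 30 mismatching ones.
row_a = "ACGT" * 10 + "A" * 30
row_b = "ACGT" * 10 + "T" * 30
hits = conserved_windows(row_a, row_b)
```

Here the windows starting at columns 0 through 16 pass, because each still contains at least 24 of 40 matching columns; merging overlapping windows into regions would be the natural next step.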

  11. Sox21 MLAGAN

  12. Vertebrates vs Invertebrates • Are the CNEs also found in invertebrates? • Use all CNEs and rCNEs • Search whole genome sequence of • Ciona intestinalis • Drosophila melanogaster • Caenorhabditis elegans • Anopheles gambiae • No significant matches • (however, the genes have clear homologs) • 43 CNEs show significant similarity to at least one other CNE (their genes have clear paralogous relationships)

  13. Method • Computationally identify CNEs • Computationally analyze CNEs • Experimentally validate a few CNEs

  14. Experimental Validation • Coinject CNEs with green fluorescent protein (GFP) reporter, in zebrafish embryos • Idea: • CNEs contain something that affects the transcription of a transdev gene • The transdev gene affects development • Examine the ability of CNEs to up-regulate GFP reporter expression

  15. Experimental Validation • Chose 25 regions for GFP assay • 10 CNEs, 15 rCNEs • Look for GFP expression in live embryos • Average of 200 embryos screened per control • No upregulation • Average of 188 embryos screened per element • GFP expression in all but 2 elements; varied from 4% to 44%

  16. SOX21-associated elements (Known) • SRY-related box gene • Acts as a transcriptional repressor during early development • Expressed in a complex manner in CNS, and in nasal epithelium, lens and retina of eye, inner ear

  17. PAX6-associated elements (Known) • Paired-box containing transcription factor, known to be influenced by cis-acting elements in upstream, intronic and downstream positions • Expressed in developing eye, forebrain, hindbrain, spinal cord

  18. HLXB9-associated elements (Known) • Homeobox gene associated with autosomal dominant effects • Zebrafish ortholog is expressed in notochord, hypochord, tail mesoderm, and tailbud

  19. SHH-associated elements (Known) • A signaling molecule • Zebrafish ortholog is expressed mainly in midline structures like floorplate and notochord, but also in branchial arches, pectoral fin buds, retina

  20. Limitations • CNEs may be misassociated with genes, especially in gene-rich regions • Can kind of tell from results of assays • CNEs missed due to stringent whole-genome analysis • Down-regulation of expression will not be detected • Elements were assayed out of context and individually • Each element had cases of unexpected expression • Tissues made up of few cells are underrepresented • Late-developing tissues or cell types arising after 24 h will be missed completely

  21. Summary • Identified a set of 1373 vertebrate CNEs • Experimentally showed CNE-transdev gene association • CNEs found in clusters, in front of transdev genes • CNEs act at large distances from coding sequence • The relative order and positions of CNEs are conserved • No vertebrate CNEs were found in invertebrates, even though the genes had clear homologs • Many of these results are paralleled by a similar paper (Sandelin et al. 2004) • >50bp, >95% Human/Mouse identity • 3583 Human/Mouse/Pufferfish UCRs; ave length 125 bp

  22. Discussion • Almost all CNEs are associated with developmental regulators • Do most transdev genes have CNEs associated? • CNEs act at large distances from gene • They could be enhancers or silencers • The relative positioning and order of CNEs are completely conserved • Do they play a role in structuring the genomic architecture around transdev genes?

  23. Discussion • No vertebrate CNEs are found in invertebrates • Are there CNEs in invertebrates? • But PAX6 in Drosophila has been shown to have a highly effective LE9 enhancer that is also well conserved in vertebrates (The Interactive Fly) • Why is it not found in this analysis? • Only 52 bp in length! (but MLAGAN should have found it ..) • So, maybe invertebrate enhancers/CNEs are shorter • Should maybe look for shorter CNEs in vertebrates

  24. Discussion • Whole-genome CNEs are missed due to stringency of parameters. • Try discontiguous MegaBLAST, which does not require an exact word match of 20. • Only 109 of the 256 non-coding ultraconserved regions from Bejerano et al. are identified.

  25. Discussion • What is in the CNE? • Modules of transcription factor binding sites? • Hard to account for the high level conservation. • Perform assays on portions of the CNEs. • Use computational methods. • Regulatory RNAs? (i.e. microRNAs) • Lack of EST evidence. • Use regulatory RNA gene finders? • Something else entirely? • One thing is in agreement: • More functional studies are needed.

  26. Discussion • Do CNEs work together? • How to robustly test combinations of elements? • Mutations in CNEs can cause human disease • Studies are showing that mutations in CNEs cause disorders. CNEs at very distal locations can still affect transcription • May be candidates for genetic screens seeking sequence variation associated with disease • Check it out with dbSNP!

  27. References & Acknowledgements • Thanks to Misha Bilenky for lots of fun discussion • Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJ, Cooke JE, Elgar G. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11. • Elgar G. Identification and analysis of cis-regulatory elements in development using comparative genomics with the pufferfish, Fugu rubripes. Semin Cell Dev Biol. 2004 Dec;15(6):715-9. • Venkatesh B, Yap WH. Comparative genomics using fugu: a tool for the identification of conserved vertebrate cis-regulatory elements. Bioessays. 2005 Jan;27(1):100-7. • Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004 Dec 21;5(1):99. • The Interactive Fly. http://www.sdbonline.org/fly/aimain/1aahome.htm
