
An Atlas of Gene Expression in Mouse Development



Presentation Transcript


  1. An Atlas of Gene Expression in Mouse Development • technology development • technology implementation • public access www.mouseatlas.org

  2. Pipeline: Tissues → RNAs → Tags → Transcribed Features • Tissues: manual dissection, laser capture microdissection • RNAs: RNA purification • Tags: longSAGE, longSAGE Lite • Bioinformatics: tag-to-gene mapping • Transcribed features: known coding elements, new genes (housekeeping and regulated), new transcripts, new exons, new regulatory RNAs

  3. Major technical accomplishments • Established a SAGE library construction pipeline aimed at constructing 150 libraries by March 31, 2005. • Established methods for tissue acquisition and dissection (manual and LCM) that yield high-quality mRNA for SAGE. • Established methods for construction of SAGE libraries from nanograms of total RNA. • Established bioinformatics pipeline for extraction and analysis of 21mer SAGE tags.

  4. SAGE library production queue • 88 samples in queue (57% complete) • 61 libraries constructed • 52 libraries passed QC • 39 libraries complete (26% complete) • 179,000 sequence reads • 5.7 million tags • 33 tags/read

  5. The majority of tags can be mapped to existing sequence datasets [Plot: proportion of tags vs. tag frequency]

  6. Most transcripts are “hit” by a SAGE tag 4,552,635 tags; 543,545 tag types Tags mapped: XXXX • mgcmouse: 22419/24607 (0.911) • refmouse: 13871/18212 (0.762) • refmouseX: 17464/25362 (0.689) • refmouseGS: 19795/42393 (0.467)

  7. Detection of 3’ end variants Mouse Atlas SAGE meta-library: 4,552,635 tags, 543,545 tag types • 61% of moderately abundant transcripts (count >10: 18,775 transcripts, 8,400 genes) show multiple tag positions • 52% of highly abundant transcripts (count >=100: 6,888 transcripts, 3,550 genes) show multiple tag positions Comparison of all tags to 27,026 transcripts from refseq, refseqX, refseqGS, and mgc: • Approximately 1.6 variants per locus • 1 variant / locus for 58% of genes • 2.8 variants / locus for 42% of genes

  8. Tag-to-genome mapping Location (all tags / > 7 tags): • Exons 25% / 73% • Introns 18% / 1.4% • 5 kb from a UTR 11.5% / 11.5% • Intergenic 15% / 4.4% • Minus strand 29.4% / 8.9% 62% of tag “types” map uniquely to the genome. The distribution of tags or their annotation varies with the level of expression.

  9. Website Usage

  10. Website Usage

  11. Relationship Between Mouse Efforts • SAGE Mouse Atlas Project (Marra and Hoodless): 150 SAGE Libraries • 5 RNAs to be directly compared • MPSS Mouse Project (Chris Austin): 94 MPSS Libraries • Public Accessibility: 244 Digital Libraries • Transfer of data (34 libraries to date): CGAP SAGEgenie (Greg Riggins) • Other efforts to note: Australia, Czech website

  12. Total co-funding

  13. Supplemental Slides

  14. Most transcripts are “hit” by a SAGE tag 4,552,635 total tags; 543,545 tag types Total tags mapped to any transcript = XXXX • mgcmouse anywhere: 22419/24607 = 0.91108 • mgcmouse position 1: 20981/24607 = 0.85264 • refmouse anywhere: 13871/18212 = 0.76164 • refmouse position 1: 12176/18212 = 0.66857 • refmouseX anywhere: 17464/25362 = 0.68859 • refmouseX position 1: 14664/25362 = 0.57819 • refmouseGS anywhere: 19795/42393 = 0.4669 • refmouseGS position 1: 15947/42393 = 0.37617
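
The fractions on this slide are plain ratios of transcripts hit to transcripts in each dataset. A minimal sketch that recomputes them from the slide's counts follows; the dictionary layout and the function name `coverage` are illustrative assumptions, not part of the Atlas pipeline.

```python
# Counts copied from the slide:
# (transcripts hit anywhere, transcripts hit at position 1, transcripts in dataset).
COUNTS = {
    "mgcmouse":   (22419, 20981, 24607),
    "refmouse":   (13871, 12176, 18212),
    "refmouseX":  (17464, 14664, 25362),
    "refmouseGS": (19795, 15947, 42393),
}


def coverage(counts):
    """Return {dataset: (fraction hit anywhere, fraction hit at position 1)}."""
    return {
        name: (anywhere / total, pos1 / total)
        for name, (anywhere, pos1, total) in counts.items()
    }


if __name__ == "__main__":
    for name, (anywhere, pos1) in coverage(COUNTS).items():
        print(f"{name:<11} anywhere={anywhere:.5f} position1={pos1:.5f}")
```

The gap between the "anywhere" and "position 1" columns is the share of transcripts that are detected only through a tag that is not the 3’-most one.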

  15. Alternate 3’ ends: Multiple tags map to a gene [Diagram: three polyadenylated transcripts of one gene, retaining tag positions 7 to 1, 7 to 2, and 7 to 4] Positions in the transcript are defined by NlaIII sites. Alternate 3’ end formation (alternate splicing) can result in different tags identifying the same transcript.
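
The position numbering on this slide can be sketched in code. The example below assumes the standard NlaIII recognition site CATG and a 21mer tag made of CATG plus the 17 bases downstream; the function name `tags_by_position` and the handling of sites too close to the 3’ end are illustrative choices, not the Atlas bioinformatics pipeline itself.

```python
import re

ANCHOR = "CATG"  # NlaIII recognition site
TAG_LEN = 21     # assumed tag layout: CATG plus 17 downstream bases


def tags_by_position(transcript, tag_len=TAG_LEN):
    """Map tag position -> tag, numbering NlaIII sites from the 3' end.

    Position 1 is the 3'-most site. Sites too close to the 3' end to give
    a full-length tag keep their position number but are left out of the
    result (an assumption made for this sketch).
    """
    seq = transcript.upper()
    starts = [m.start() for m in re.finditer(ANCHOR, seq)]
    tags = {}
    for position, start in enumerate(reversed(starts), start=1):
        tag = seq[start:start + tag_len]
        if len(tag) == tag_len:
            tags[position] = tag
    return tags


if __name__ == "__main__":
    # Toy transcript with two NlaIII sites and a short poly(A) tail.
    long_form = "GG" + "CATG" + "A" * 17 + "TT" + "CATG" + "C" * 17 + "AAAAAAA"
    # Same gene with an earlier 3' end: the upstream site becomes position 1.
    short_form = "GG" + "CATG" + "A" * 17 + "TT"
    print(tags_by_position(long_form))
    print(tags_by_position(short_form))
```

In the toy example the two forms of the same gene yield different position-1 tags, which is the situation the slide describes.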

  16. Contaminants / artifacts • hnRNA (unspliced mRNA) and genomic DNA not likely to be a major contaminant. 18% of all tags map to introns versus 1.4% of abundant tags (slide 8). • Partial digestion not likely to be a major artifact. The majority (58%) of transcripts show only a single variant. Of N manually inspected examples that show multiple variants, m looked like the picture in the next slide

  17. Splice variants [Diagram: polyadenylated transcript with tag positions 7 to 1; axis: tag position in the transcript] 236 tags detected for transcript nm144802

  18. Rate of tag generation and tag quality Required from May 04 – Mar 05: 1.176 million tags / mo (~36,000 reads)
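
The read figure in parentheses follows from the average yield of 33 tags per read reported on slide 4. A minimal sketch of that arithmetic, with constant and function names chosen only for illustration:

```python
TAGS_PER_MONTH = 1_176_000  # monthly requirement stated on slide 18
TAGS_PER_READ = 33          # average yield stated on slide 4


def reads_needed(tags, tags_per_read):
    """Sequence reads needed to produce the given number of tags."""
    return tags / tags_per_read


if __name__ == "__main__":
    reads = reads_needed(TAGS_PER_MONTH, TAGS_PER_READ)
    print(f"{reads:,.0f} reads per month")
```

Rounded to the nearest thousand this gives the ~36,000 reads quoted on the slide.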

  19. Process

  20. Frequency of variants detected by SAGE For all transcripts with count >=20 (15915 transcripts) from refseq, refseqX, refseqGS, and mgc SAGE tag variation: • Approximately 1.6 variants per locus • 1 variant per locus for 40% of genes • 3.8 variants per locus for 60% of genes • If there exist 30000 genes in the human genome • Predict 50400 “3’ UTR variants” • Total of 80400 variants • This assumes the ratio holds for all gene expression • SAGE tag variants are a subset (3’ UTR) of splice variants • 5’ SAGE may expand the subset
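
The slide does not show how 50400 and 80400 were derived. One reading that reproduces both numbers is to apply the 40% / 60% split to 30000 genes and to treat the 50400 as variants beyond the first one per gene. The sketch below works through that reading; the function name and the interpretation are assumptions.

```python
def extrapolate_variants(genes, pct_single, variants_multi):
    """Extrapolate tag variants from a single-variant / multi-variant split.

    genes          -- assumed number of genes
    pct_single     -- percentage of genes with exactly one variant
    variants_multi -- mean variants per locus for the remaining genes
    Returns (total variants, variants beyond one per gene).
    """
    single = genes * pct_single // 100
    multi = genes - single
    total = single + round(multi * variants_multi)
    return total, total - genes


if __name__ == "__main__":
    total, extra = extrapolate_variants(30000, 40, 3.8)
    print(total, extra)
```

Under this reading the mean works out to 80400 / 30000 = 2.68 variants per locus, or 1.68 beyond the first.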

  21. Novel gene discovery via SAGE • 3519 tags occurred only in predicted transcripts • Intron and 3’ UTR locations indicate alternate splicing • 150K (28%) tag types occurred only in the genome, not in transcripts • 9087 (1.7%) with frequency > 10 and 24352 (4.5%) with frequency > 3 • Would expect 2% polymorphism, except this is an inbred line? • 326K (60%) of tag types were unaccounted for • 3450 (0.6%) with frequency > 10 and 13378 (2.5%) with frequency > 3 • Sequence error, expectation is now known • Polymorphism, expect 2% of tag types, except this is an inbred line? • Spliced tags in novel transcripts, up to 6% of tag types to be spliced Many undiscovered transcripts exist: 2% at a moderate to high frequency, 10-15% at low frequency

  22. Novel gene discovery via SAGE Mouse Atlas 28 libraries after clustering tags • 3047 tags occurred only in predicted transcripts • Intron and 3’ UTR locations indicate alternate splicing • 124K (36%) tag types occurred only in the genome, not in transcripts • 7022 (2.0%) with frequency > 10 and 19155 (5.5%) with frequency > 3 • Would expect 2% polymorphism, except this is an inbred line? • 162K (47%) of tag types were unaccounted for • 641 (0.2%) with frequency > 10 and 2384 (0.7%) with frequency > 3 • Sequence error should be very low for non-singletons • Polymorphism, expect 2% of tag types, except this is an inbred line? • Spliced tags in novel transcripts, up to 6% of tag types to be spliced Many undiscovered transcripts exist: 2% at a moderate to high frequency, 10-15% at low frequency

  23. Detection of coding features • N million tags representing N transcripts and at least N genes. (complexity) • Distribution of transcript abundance (graph) • coverage of refseq, MGC and unigene (complexity and breadth) • N differentially expressed between any two stages at p < 0.001 (regulated) • N not differentially expressed at p < 0.001 (housekeeping) • N candidate new genes identified • Genes that map, genes that don’t map, etc • Quality of the data

  24. Co-funding spent

  25. Library construction rates • Mouse libraries needed (May 04 – Feb 05): 9.8 / mo • Mouse libraries made (Oct 03 – Apr 04, excl. Dec 03): 6.2 / mo • Mouse libraries made (Mar 04 – Apr 04): 7 / mo • All libraries made (Mar 04 – Apr 04): 9 / mo • Most libraries made (Oct 03, Mar 04): 10 / mo

  26. Tissue Acquisition Pipeline • Libraries Made: 52 • Libraries in Progress: 5 • Tissues Waiting for Library Construction: 17 • Tissues Collected (but not yet delivered): 12 • Tissues To Be Collected: 64 [Plot: number of tissues by month, June 2004-January 2005]

  27. www.mouseatlas.org cgap.nci.nih.gov

  28. Training / Recruitment

  29. Management

  30. Detection and elimination of contamination

  31. 101 SAGE libraries built

  32. 101 SAGE libraries

  33. Tags sequenced

  34. Library Construction Scale-Up

  35. Rationale and Goals Systematic association of expressed genes with precisely defined tissues sampled throughout development will enhance dramatically the mouse as a tool for developmental biologists and those seeking to understand the genetic basis of disease in murine models. • To construct and sequence 150 SAGE libraries representing a variety of tissues and developmental stages • To place these data in the public domain

  36. Progress Objective 1: Define the normal state for many tissues by determining…the number and identity of genes expressed throughout development. Progress: longSAGE library construction pipeline established! LCM tissue harvesting explored. Technology development on small samples well advanced. SAGELite and PCRSAGE libraries constructed. N SAGELite libraries constructed and sequenced by March 31, 2005. Trans-NIH group completed tissue harvesting for 90 adult tissues. Projected MPSS data in public domain before Fall 2004. Tag-to-gene mapping (v.1) complete at Vancouver. Mouse SAGE Genie under development (Hopkins / CGAP; G. Riggins PI). Plan for completion: Continue at current rate with increasing emphasis on small, manually- and LCM-dissected samples. Objective 2: Establish a data structure / curation strategy that will facilitate the ongoing collection of gene expression data…. Progress: www.mouseatlas.org active and www.ncbi.nlm.nih.gov/ncicgap/ will soon host data (N libraries submitted to S. Greenhut and C. Schaeffer). Plan for completion: Essentially complete. Data and annotation from Mouse Atlas and NIH / LYNX will populate databases. Objective 3: Assemble gene expression profiles [to] test hypotheses related to technologies, tumor models and models of abnormal development. Progress: N % complete. List models so far. Philosophy has been to focus on establishing pipeline for wild-type tissues as these may be most relevant to broader community. Increased focus on models over remainder of project. Plan for completion: Construct, sequence and analyze N libraries representing specific models, including….

  37. Detailed Milestones Year 1 • Establish the project management and communication system. COMPLETE • Launch of project web site in public domain. The launch will include a registration on the website of all of the tissues we intend to include in the Atlas. COMPLETE • Complete dissections of type A tissues (See Table 1). INCOMPLETE • Complete experiments to compare the use of amplified and non-amplified RNA in SAGE library construction. COMPLETE • Complete experiments to compare the use of RNA from tissues isolated by Laser Microdissection and by manual dissection in SAGE library construction. INCOMPLETE • Implement SAGE Bioinformatic processing pipeline. This includes implementation of software to automatically perform quality control testing on the sequencing of the tags, and entry of the tags into SAGEdb for subsequent analysis. COMPLETE • Expand SAGEdb to accommodate dissection procedures and digital images of mouse tissues used to generate the SAGE libraries. WEBSITE • Construct and sequence 40 (30) SAGE libraries and enter them in the database. COMPLETE Year 2 • Complete dissections of type B and C tissues (See Table 1). INCOMPLETE • Construct and sequence 80 (60) SAGE libraries and enter them in the database. IN PROGRESS • Web enabled data mining tool available for SAGE library comparisons. IN PROGRESS • Use of bioinformatics to identify differentially expressed genes from SAGE libraries for further analysis and to assess the quality of the libraries generated. IN PROGRESS Year 3 • Complete dissections of type D tissues and mouse models (See Table 1). INCOMPLETE • Construct and sequence 80 (60) SAGE libraries and enter them in the database. INCOMPLETE • Complete quantitative RT-PCR (QPCR) and in situ hybridization analysis for quality control and quality assurance. IN PROGRESS • Complete SAGE library construction, sequencing and analysis on mouse models to test utility of the database. IN PROGRESS • Demonstrate the potential uses of the Atlas through SAGE analysis of specific mouse models. IN PROGRESS • Generate a spin-off project based on SAGE-based discoveries from mouse models (cancer models, early embryogenesis, or ‘fierce’ mice, see below). COMPLETE • Identify candidate genes not previously reported in mouse databases. IN PROGRESS • Target corporate partnership to build microarrays based on new candidate genes. INCOMPLETE • Present research discoveries based on the Atlas at scientific conference (e.g. Gordon Conference). IN PROGRESS • Publish dataset in peer-reviewed journals. IN PROGRESS Established collaboration with NIH group to compare MPSS and SAGE and coordinate effort. LYNX efforts focused on adults; BC efforts focused on earlier developmental stages.

  38. Mouse Atlas: SAGE Library Construction Genome Sciences Centre BC Cancer Agency 11th May 2004

  39. SAGE Library Construction

  40. Agilent Bioanalyzer – RNA picochip, total RNA, 115 pg/uL synthetic 25nt marker

  41. PCR Optimization on 12% Polyacrylamide Gel [Gel image. Loading: 5 uL of 25 bp Ladder (20 ng/uL), 5 uL of sample. Cycle numbers shown: 23, 25, 27, 35. Lane labels: Brew only (Brew control); 1/10 dil LS Control template (+’ve control); 1/20 dil No Ligase (-’ve control); 1/20, 1/40 and 1/80 dil Ligation. Ladder bands marked from 25 bp to 200 bp; 131 bp Ditag indicated. Image location: My Network Places/ mapper.ro on Xena/ GeneExpLab/ Typhoon Data/ LongSAGE folder/ Library Folder/ gel name_date]

  42. 131bp Ditag on 12% Polyacrylamide Gel [Gel image. Loading: 5 uL of 25 bp Ladder (20 ng/uL), 6 – 8 uL of sample per well. Ladder bands marked from 75 bp to 200 bp; 131 bp Ditag indicated. Image location: My Network Places/ mapper.ro on Xena/ GeneExpLab/ Typhoon Data/ LongSAGE folder/ Library Folder/ gel name_date]

  43. 36bp Ditag on 15% Polyacrylamide Gel [Gel image. Loading: 5 uL of 25 bp Ladder (20 ng/uL), 4 uL of sample per well, with empty lanes. Bands indicated: 131 bp uncut Ditag; 84 bp and 87 bp partially cut Ditag; 44 bp and 47 bp adaptor sequence; 36 bp Ditag. Ladder bands marked from 25 bp to 200 bp. Image location: My Network Places/ mapper.ro on Xena/ GeneExpLab/ Typhoon Data/ LongSAGE folder/ Library Folder/ gel name_date]

  44. Concatemer on 8% Polyacrylamide Gel [Gel image. Loading: 10 uL of 100 bp Ladder (10 ng/uL); 5 uL of 100 bp Ladder (20 ng/uL); all 10 uL of concatemer in 1 well. Ladder bands marked from 100 bp to 2072 bp. Size fractions indicated: Large, Medium, Small. Image location: My Network Places/ mapper.ro on Xena/ GeneExpLab/ Typhoon Data/ LongSAGE folder/ Library Folder/ gel name_date]

  45. Colony PCR on 1.5% Agarose Gel [Gel image. Loading: 1 uL of 1 Kb+ Ladder (20 ng/uL), 1.5 uL of sample per well. Lanes: No DNA and No Ligase -’ve controls; Small, Medium and Large size fractions. Image location: My Network Places/ mapper.ro on Xena/ GeneExpLab/ Typhoon Data/ LongSAGE folder/ Library Folder/ gel name_date]

  46. Library construction – future throughput & staffing

  47. Bottom line: one 3’-most tag per transcript
