
BB30055: Genes and genomes


tkellogg

Presentation Transcript


  1. BB30055: Genes and genomes. Major insights from the Human Genome Project (HGP)

  2. What makes us human? Single-nucleotide substitutions between the human and chimpanzee genomes occur at a mean rate of 1.23%. Nature 437, 50-51 (1 September 2005)

  3. Major insights from the HGP: • Gene size, content and distribution • Proteome content • SNP identification • Distribution of GC content • CpG islands • Recombination rates • Repeat content. Nature (2001) 15th Feb Vol 409 special issue; pgs 814 & 875-914.

  4. 1) Gene size

  5. Gene content: • More genes: twice as many as Drosophila / C. elegans • Uneven gene distribution: gene-rich and gene-poor regions • More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory gene family has 1000 genes • More alternative transcripts: more RNA splice variants are produced, thereby expanding the primary proteins by 5-fold (e.g. neurexin genes)

  6. Gene distribution • Genes generally dispersed (~1 gene per 100kb) • Class III complex at HLA 6p21.3 • Overlapping genes (transcribed from 2 DNA strands): rare • Genes within genes, e.g. NF1 gene. HMG3 Fig 9.8

  7. Uneven gene distribution • Gene-rich regions, e.g. MHC on chromosome 6 has 60 genes with a GC content of 54% • Gene-poor regions: 82 gene deserts identified (large or unidentified genes?) • What is the functional significance of these variations?

  8. 2) Proteome content • Proteome more complex than invertebrates • Protein domains (sections with identifiable shape/function) • Domain arrangements in humans: largest total number of domains is 130; largest number of domain types per protein is 9 • Mostly identical arrangement of domains, e.g. Protein X: A A B B B C C C C C

  9. Proteome more complex than invertebrates • No huge difference in domain number in humans • BUT frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function) • However, there are only 3 cases where a combination of 3 domain types is shared by human and yeast proteins • e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types, and occurs once in human and yeast but twice in Drosophila

  10. 3) SNPs (single nucleotide polymorphisms) • Sites that result from point mutations in individual base pairs • More than 1.4 million SNPs identified (~1 in every 1.9kb on average) • ~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5kb of a SNP) • May or may not affect the ORF (synonymous or non-synonymous) • Most SNPs may be regulatory • Densities vary over regions and chromosomes, e.g. the HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years. Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
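The synonymous / non-synonymous distinction on this slide can be illustrated with a minimal sketch. It assumes the standard genetic code and a SNP whose position within a codon is already known; the function name `classify_snp` and the example codons are hypothetical choices for illustration, not taken from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, new_base):
    """Return (old amino acid, new amino acid, class) for a single-base change.

    codon: three-letter DNA codon (coding strand), e.g. "AAT"
    position: 0, 1 or 2, the index of the changed base within the codon
    new_base: the variant base at that position
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return before, after, kind


# AAT (Asn) -> AGT (Ser): an asparagine-to-serine change, non-synonymous.
print(classify_snp("AAT", 1, "G"))
# CTT (Leu) -> CTC (Leu): third-position change, synonymous.
print(classify_snp("CTT", 2, "C"))
```

Third-position changes are often synonymous because of the degeneracy of the code, which is one reason many coding SNPs leave the protein unchanged.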

  11. Haplotype (haploid genotype) • A haplotype is a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated • Haplotypes are generally shared between populations but their frequency can vary • International HapMap Project (www.hapmap.org): identifying common haplotypes in four populations from different parts of the world, and identifying "tag" SNPs with unique haplotype identities
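"Statistically associated" is usually quantified as linkage disequilibrium between pairs of SNPs. The sketch below computes the common D and r² measures for two biallelic SNPs from phased haplotypes; the function name and the toy data are hypothetical, and the slides do not say which measure the HapMap Project used to choose tag SNPs.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Compute (D, r^2) for two biallelic SNPs.

    haplotypes: list of two-character strings; character 0 is the allele at
    SNP 1 and character 1 the allele at SNP 2 on the same chromosome.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Use the alleles of the first haplotype as the reference alleles.
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(c for h, c in counts.items() if h[0] == a) / n
    p_b = sum(c for h, c in counts.items() if h[1] == b) / n
    p_ab = counts[a + b] / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Two SNPs that always travel together: one of them could "tag" the other.
linked = ["AG"] * 5 + ["CT"] * 5
# Two SNPs whose alleles combine freely: neither predicts the other.
unlinked = ["AG", "AT", "CG", "CT"]

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(unlinked))
```

An r² close to 1 means that typing one SNP predicts the allele at the other, which is the idea behind a tag SNP.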

  12. How does one distinguish sequence errors from polymorphisms? • Sequence errors: each piece of genome is sequenced at least 10 times to reduce the error rate (0.01%) • Polymorphisms: sequence variation between individuals (0.1%) • To be defined as a polymorphism, the altered sequence must be present in a significant population • Rate of polymorphisms in the diploid human genome is about 1 in 500 bp. Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
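As a rough worked example of the figures on this slide, the sketch below compares the expected number of residual sequencing errors with the expected number of polymorphic sites in a region. It assumes both rates apply uniformly per base pair; the 100 kb region size and the function name are arbitrary illustrative choices.

```python
def expected_sites(region_bp, rate_per_bp):
    """Expected number of affected sites in a region, assuming a uniform rate."""
    return region_bp * rate_per_bp


REGION_BP = 100_000          # hypothetical 100 kb region
ERROR_RATE = 0.0001          # 0.01% residual error rate, from the slide
POLYMORPHISM_RATE = 1 / 500  # about 1 polymorphism per 500 bp, from the slide

errors = expected_sites(REGION_BP, ERROR_RATE)
polymorphisms = expected_sites(REGION_BP, POLYMORPHISM_RATE)

print(f"expected sequencing errors: {errors:.0f}")
print(f"expected polymorphic sites: {polymorphisms:.0f}")
print(f"polymorphisms per error:    {polymorphisms / errors:.0f}")
```

Under these assumptions, true polymorphisms outnumber residual errors by roughly twenty to one, which is why high coverage combined with a population-frequency requirement is enough to separate the two.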

  13. SNPs and disease

  14. SNPs……and risk of disease N(291)S

  15. SNPs……and pharmacogenomics

  16. 4) Distribution of GC content • Genome-wide average of 41% • Huge regional variations exist, e.g. the distal 48Mb of chromosome 1p has 47% but chromosome 13 has only 36% • Confirms cytogenetic staining with G-bands (Giemsa): dark G-bands have low GC content (37%); light G-bands have high GC content (45%). Nature (2001) 15th Feb Vol 409 special issue; pg 876-877
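Regional GC variation of this kind is typically measured in fixed-size windows along a sequence. The following is a minimal sketch, assuming a plain DNA string as input; the function names and the window size are illustrative and do not reproduce the analysis in the Nature paper.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # window made up entirely of N or other ambiguity codes
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step=None):
    """GC content in consecutive windows; returns (start, gc) pairs."""
    step = step or window
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GGCGCCGGCG" + "ATTATAATAT"
for start, gc in windowed_gc(toy, window=10):
    print(start, f"{gc:.0%}")
```

On real chromosomes the window would be tens of kilobases or more, and the resulting profile can be compared with the positions of dark and light G-bands.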

  17. 5) CpG islands. [Diagram: CpG is methylated at C to give methyl-CpG; deamination of the methylated C converts it to T, giving TpG; CpG islands show no methylation.] Significance of CpG islands • Non-methylated CpG islands are associated with the 5' ends of genes • Usually overlap the promoter region • Aberrant methylation of CpG islands is linked to pathologies like cancer or epigenetic diseases like Rett syndrome. http://www.sanger.ac.uk/HGP/cgi.shtml

  18. Inheritance of CpG methylation

  19. CpG islands • Greatly under-represented in the human genome • ~28,890 in number (5 times fewer than expected) • ~56% of human genes and 47% of mouse genes have CpG islands • Variable density, e.g. Y has 2.9/Mb but chromosomes 16, 17 & 22 have 19-22/Mb • Average is 10.5/Mb. Nature (2001) 15th Feb Vol 409 special issue; pg 877-888
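Under-representation of CpG is usually expressed as an observed/expected ratio: the number of CpG dinucleotides actually present, divided by the number expected from the C and G counts alone. The sketch below computes that ratio and applies the widely used Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content of at least 50%, observed/expected of at least 0.6). These thresholds are an assumption brought in for illustration; the slides do not state which definition was used to count islands.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply Gardiner-Garden and Frommer style thresholds to one sequence."""
    if len(seq) < min_len:
        return False
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc >= min_gc and cpg_obs_exp(seq) >= min_obs_exp


# Toy sequences: one CpG-dense, one GC-rich but depleted of CpG.
cpg_dense = "CG" * 100
cpg_depleted = "CCTGG" * 40  # 60% GC, but no CpG dinucleotides at all

print(cpg_obs_exp(cpg_dense), looks_like_cpg_island(cpg_dense))
print(cpg_obs_exp(cpg_depleted), looks_like_cpg_island(cpg_depleted))
```

Bulk genomic DNA has a low ratio because methylated CpG tends to deaminate to TpG over time, whereas unmethylated islands keep a ratio close to what base composition predicts.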

  20. 6) Recombination rates 2 main observations • Recombination rate increases with decreasing arm length • Recombination rate suppressed near the centromeres and increases towards the distal 20-35Mb

  21. 7) Repeat content • Age distribution • Comparison with other genomes • Variation in distribution of repeats • Distribution by GC content • Y chromosome Nature (2001) 409: pp 881-891

  22. a) Age distribution: overall decline in interspersed repeat activity in the hominid lineage in the past 35-40MYr, compared to the mouse genome, which shows a younger and more dynamic genome

  23. Repeat content……. a) Age distribution • Most interspersed repeats predate eutherian radiation (confirms the slow rate of clearance of nonfunctional sequence from vertebrate genomes) • LINEs and SINEs have extremely long lives • 2 major peaks of transposon activity • No DNA transposition in the past 50MYr • LTR retroposons teetering on the brink of extinction

  24. b) Comparison with other genomes • Higher density of transposable elements in euchromatic portion of genome • Higher abundance of ancient transposons • 60% of interspersed repeats (IR) are made up of LINE1 and Alu repeats • whereas DNA transposons represent only 6%

  25. c) Variation in distribution of repeats. Some regions show either • High repeat density, e.g. chromosome Xp11: a 525kb region shows 89% repeat density • Low repeat density, e.g. HOX homeobox gene cluster (<2% repeats) (indicative of regulatory elements which have low tolerance for insertions)

  26. d) Distribution by GC content • High GC: gene rich; high AT: gene poor • LINEs abundant in AT-rich regions • SINEs lower in AT-rich regions • Alu repeats in particular retained in actively transcribed GC-rich regions, e.g. chromosome 19 has 5% Alus compared to Y chromosome

  27. e) The Y chromosome ! Unusually young genome (high tolerance to gaining insertions) Mutation rate is 2.1X higher in male germline

  28. • Working draft published – Feb 2001 • Finished sequence – April 2003 • Annotation of genes going on (refer: International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 21 October 2004, doi: 10.1038/nature03001)

  29. References • HMG 3 by Strachan and Read, Chapter 9, pp 265-268 • Genetics: from genes to genomes by Hartwell et al (2/e), Chapter 10, pp 339-348 • Nature (2001) 409: pp 879-891 • Nature (2005) for Chimp genome

  30. Epigenetic disease – Rett Syndrome • Characterised by neurodevelopmental problems after birth • Mutations in a gene on the X chromosome, MECP2 (methyl CpG-binding protein 2), whose protein normally binds to methylated CpG and represses gene expression • RS symptoms are associated with the failure of mutated MECP2 to regulate transcription of a specific gene, DLX5, one allele of which is normally imprinted. Without the MeCP2 protein, production of the Dlx5 protein is increased, which influences production of the neurotransmitter GABA in the brain
