Single Nucleotide Polymorphisms

Single Nucleotide Polymorphisms. Arthur M. Lesk, Bologna Winter School 2011. What are SNPs and why are they important? SNP = single nucleotide polymorphism, an isolated change in a single nucleotide. SNPs are one type of mutation. Some have obvious functional consequences.


Presentation Transcript


  1. Single Nucleotide Polymorphisms Arthur M. Lesk Bologna Winter School 2011

  2. What are SNPs and why are they important? • SNP = Single nucleotide polymorphism, an isolated change in a single nucleotide • SNPs are one type of mutation • Some have obvious functional consequences • Sickle-cell haemoglobin: gag→gtg (β6 Glu→Val) • Sickle-cell anaemia: the first “molecular disease” • Some are ‘silent’ • Some are in non-coding regions • affect splice sites? • affect regulatory sites? • some have no known phenotypic effect

  3. What is a SNP? • The genomes of individuals in a population contain a particular base at some position most of the time. • That is, there is a “normal” sequence • A SNP is a deviation from the normal sequence. • Many people require that a variant occur in at least 1% of the population to be considered a SNP • But: which population? What if two distinct populations have a consistent polymorphism?
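The 1% convention is usually applied to the minor (rarer) allele frequency, counted over alleles rather than over people. A minimal sketch, assuming diploid genotype counts from some chosen sample; the counts and function names below are hypothetical, and as the slide notes, the answer can change with the population sampled.

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Return (freq_A, freq_a) from counts of diploid genotypes AA, Aa, aa."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each person carries two copies
    if total_alleles == 0:
        raise ValueError("no genotypes counted")
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    return freq_A, 1.0 - freq_A


def meets_snp_threshold(n_AA: int, n_Aa: int, n_aa: int, threshold: float = 0.01) -> bool:
    """True if the rarer (minor) allele reaches the frequency threshold."""
    freq_A, freq_a = allele_frequencies(n_AA, n_Aa, n_aa)
    return min(freq_A, freq_a) >= threshold


# Hypothetical samples of 500 people each.
print(meets_snp_threshold(480, 19, 1))  # True: minor allele frequency 21/1000 = 2.1%
print(meets_snp_threshold(495, 5, 0))   # False: minor allele frequency 5/1000 = 0.5%
```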

  4. SNPs in human genomes • SNPs are about 90% of all inter-human variation • Occur on average once in every 300 bases • 2/3 of SNPs are C→T changes (perhaps because cytosine easily deaminates, to uracil)

  5. SNP density varies across human genome • Some high-density patches • Some ‘deserts’ • SNPs in coding regions ~1/3 as many as in non-coding regions • SNP density correlated with recombination rate (which causes which??) • AT microsatellites: long (AT)n repeat tracts tend to appear in regions of low SNP density

  6. Figure 14 SNP density in each 100-kbp interval as determined with Celera-PFP SNPs. J C Venter et al. Science 2001;291:1304-1351 Published by AAAS
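Density plots like this one come from counting SNPs in fixed windows along each chromosome. A minimal sketch, assuming SNP positions are given as 0-based coordinates on a single chromosome; the positions, chromosome length and function name in the example are hypothetical. Windows with a count of zero correspond to the ‘deserts’ mentioned above.

```python
def snp_density(positions: list[int], chrom_length: int, window: int = 100_000) -> list[int]:
    """Count SNPs in consecutive windows of `window` bp along one chromosome.

    Element i of the result covers coordinates [i*window, (i+1)*window).
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // window] += 1
    return counts


print(snp_density([1_200, 45_000, 99_999, 100_000, 350_500], chrom_length=400_000))
# [3, 1, 0, 1]: the third 100-kbp window is a SNP 'desert'
```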

  7. What is normal? • Obviously we all differ genomically • Swedes and Chinese have obviously different phenotypes • Most Swedes and Chinese are healthy individuals • Therefore genetic differences do not necessarily cause disease • Pointless to check for differences from a single ‘reference sequence’ • Of course, genetic differences are not limited to SNPs

  8. Variation in human and other species • Any two humans ~99.5% identical in sequence • Chimpanzees, gorillas: twice as variable, despite much smaller population size • Implies prehistoric bottleneck in human population, recent common origin • Most SNPs (> 5%) shared among human populations from around the world • Most populations (e.g. British) contain 85-90% of all known variation

  9. Variation in human and other species • Some variation is population-specific • In some cases, there is local selective pressure • For example, adult lactose tolerance, malaria resistance • African populations have greatest genetic diversity • Supports ‘Out of Africa’ theory of human origin and migration

  10. Identification of geographical origin, phenotype • A criminal leaves a blood sample at a crime scene • How much can we tell about him or her? • Not perfectly, but: • Ethnic group • Eye and hair colour (hair colour easier to change) • Family name?

  11. Types of SNPs • Transitions: • purine ↔ purine • pyrimidine ↔ pyrimidine (cytosine → uracil) • Transversions: • purine ↔ pyrimidine • Transitions are more common than transversions
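The classification depends only on whether the two bases belong to the same chemical class (purines A, G; pyrimidines C, T). A minimal sketch for DNA bases; the function name is illustrative.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("bases must be A, C, G or T")
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Count transitions among all 12 possible single-base substitutions.
pairs = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(sum(substitution_type(r, a) == "transition" for r, a in pairs), "of", len(pairs))  # 4 of 12
```

Only 4 of the 12 possible substitutions are transitions, so if all changes were equally likely, transversions would be twice as common; the observed excess of transitions is therefore a genuine bias.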

  12. Prevalence of SNPs in human genomes • approximately 1 in 300 bp (about 0.3%) • compare the difference between human and chimpanzee genomes: • 4% different (not all SNPs!)
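Turning the density into a percentage, and into a genome-wide count, is simple arithmetic: 1 in 300 is about 0.33%. A minimal sketch; the haploid genome size of about 3.1 × 10⁹ bp is an approximate figure assumed here, not taken from the slides.

```python
SNP_SPACING_BP = 300     # about one SNP per 300 bp (from the slides)
GENOME_SIZE_BP = 3.1e9   # approximate haploid human genome size (assumption)

snp_fraction = 1 / SNP_SPACING_BP
expected_sites = GENOME_SIZE_BP * snp_fraction

print(f"{snp_fraction:.2%} of positions")                     # 0.33% of positions
print(f"~{expected_sites:.1e} SNP sites implied per genome")  # ~1.0e+07 SNP sites implied per genome
```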

  13. ‘Life cycle’ of a SNP • Generation of a mutation • Initial survival, against ‘sampling loss’ • Increase in frequency – survival until it becomes homozygous in some individuals; • chance of loss reduced (helped by bottlenecks, founder effects – population size dependent) • Fixation

  14. Initial survival of a SNP • Suppose a person is heterozygous for a novel, selectively-neutral mutation. • Suppose the person has 2 children that survive to reproductive age. The probability of loss of the mutation is 25%. • If each descendant has 2 children that survive to reproductive age, probability of loss in 200 years = 94%
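The 25% figure is the chance that neither of two children inherits the allele, (1/2)². Repeating that step generation by generation gives a simple branching-process model. A minimal sketch, assuming every carrier has exactly two children who reach reproductive age and each child independently inherits the allele with probability 1/2; the function name is hypothetical. How many generations correspond to 200 years, and hence the longer-term figure, depends on the generation time and family-size model assumed, so the example only reproduces the one-generation 25%.

```python
def loss_probability(generations: int, children: int = 2) -> float:
    """Probability that a neutral allele carried by one heterozygous person
    has disappeared from all descendants after `generations` generations.

    Model (assumption): every carrier has exactly `children` offspring who
    reach reproductive age, and each inherits the allele independently
    with probability 1/2.
    """
    q = 0.0  # probability of loss within zero generations
    for _ in range(generations):
        # Each child either does not inherit the allele (prob 1/2) or inherits
        # it and its own line of descent loses it within the remaining
        # generations (prob 1/2 * q). The allele is lost only if this happens
        # for every child.
        q = (0.5 + 0.5 * q) ** children
    return q


print(loss_probability(1))  # 0.25, the slide's one-generation figure
for n in (2, 4, 8):
    print(n, round(loss_probability(n), 3))
```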

  15. Where do SNPs occur in the human genome? • Distributed throughout the genome • 50% in non-coding regions • NOT the same as non-functional!!! • 25% missense mutations (amino acid substitution) • 25% silent (amino acid unchanged) • silent = no change in encoded amino-acid sequence • NOT the same as no phenotypic effect!!! • would be better to call them synonymous SNPs rather than silent SNPs

  16. SNPs in non-human genomes • Of course other species have SNPs • Here we will focus on human SNPs because of relevance to human disease • However, SNPs in pathogens are sometimes associated with antibiotic resistance, and therefore related to human disease • SNPs in some plants give clues to domestication

  17. Organised efforts to collect SNPs • The HapMap is a catalogue of common human genetic variants • HapMap Project = international collaboration among Japan, the United Kingdom, Canada, China, Nigeria, and the United States • NOT continental Europe • Carry out measurements, provide database • Other projects collect SNPs in other species

  18. HapMap project • International consortium: International HapMap Project • http://hapmap.ncbi.nlm.nih.gov/ • Catalogue of human genetic variants: • What sites? • How distributed – frequency in different populations • Raw material for linking genomics with disease

  19. Origin of samples • Total of 270 people. • The Yoruba people of Ibadan, Nigeria • Japan (Tokyo) • China (Beijing) • U.S. residents with Northern and Western European ancestry

  20. What is a haplotype? • Often, several SNPs lie near one another on the same chromosome • In the absence of recombination, they will be inherited in blocks • The pattern of SNPs in a block is called a haplotype • A block may contain many SNPs, but only a few are needed to identify a haplotype • These signature SNPs within a haplotype block are called ‘tag SNPs’
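Picking tag SNPs can be phrased as finding the fewest SNP positions whose alleles still tell every observed haplotype in a block apart. A minimal brute-force sketch with hypothetical haplotypes and a hypothetical function name; real tag-SNP selection is usually based on linkage-disequilibrium statistics such as r², and must cope with blocks far too large for exhaustive search.

```python
from itertools import combinations


def find_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Return a smallest set of SNP columns that distinguishes all haplotypes.

    Each haplotype is a string with one allele per SNP position in the block.
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    distinct = sorted(set(haplotypes))
    n_sites = len(distinct[0])
    if any(len(h) != n_sites for h in distinct):
        raise ValueError("all haplotypes must cover the same SNP positions")
    for size in range(n_sites + 1):
        for cols in combinations(range(n_sites), size):
            signatures = {tuple(h[c] for c in cols) for h in distinct}
            if len(signatures) == len(distinct):
                return cols
    # Not reached: the full set of columns always separates distinct strings.
    raise AssertionError("unreachable")


# Hypothetical block of 6 SNPs observed as 4 haplotypes.
block = ["ACGTAC", "ACGTGT", "GTCAAC", "GTCAGT"]
print(find_tag_snps(block))  # (0, 4): two SNPs identify all four haplotypes
```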

  21. http://www.riken.go.jp/engn/r-world/info/release/news/2003/nov/image/frol_06.gif

  22. http://img.medscape.com/fullsize/migrated/553/400/ncpcard553400.fig1.gif

  23. Guide to SNP databases • SNPlinks: http://www.snpforid.org/snpdata.html • NCBI dbSNP: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp • The SNP Consortium: http://snp.cshl.org/ • HapMap: http://www.hapmap.org/ • Applied Biosystems Assays-on-Demand: http://myscience.appliedbiosystems.com/cdsEntry/Form/assay_search_basic.jsp • Ensembl: http://www.ensembl.org/Homo_sapiens/ • HGVBase: http://hgvbase.cgb.ki.se/ • SeattleSNPs: https://gvs.gs.washington.edu/GVS/

  24. dbSNP database at NCBI • non-redundant dataset • nomenclature: rs number • rs = reference SNP.

  25. General human mutations • Human Gene Mutation Database http://www.hgmd.cf.ac.uk • over 100,000 mutations, in 3,700 genes • about 16% of the total of ~23,000 genes • about 10,000 new mutations found per year • OMIM (Online Mendelian Inheritance in Man) • database of mutations associated with human disease • OMIA (Online Mendelian Inheritance in Animals)

  26. Databases with important related information • Online Mendelian Inheritance in Man (OMIM) [NCBI] • Comprehensive compendium of human genes and associated phenotypes • Not limited to SNPs • SNPs3D: http://www.snps3d.org/ • SNPs3D assigns molecular functional effects to non-synonymous SNPs based on structure and sequence analysis. • SNPper: http://snpper.chip.org/ • Retrieve SNPs by position or gene association

  27. Quality of sequence information is important • SNPs appear in human genome at approximately 1 in 300 bases • Obviously error rate in resequencing must be substantially lower than this if SNP data are to be meaningful • Measure of DNA sequencing quality: PHRED

  28. PHRED – measure of sequence quality • Phred scores are widely accepted for characterizing the quality of DNA sequences • Originally, Phred was a base-calling program that assigned accurate quality scores indicating error probabilities • Accepted as a general standard • Phred quality score Q: if P = probability that a base call is wrong, then Q = −10 log₁₀ P

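  29. A method that gave an average Phred score of Q = 30 would give roughly as many errors as there are SNPs! • Q = 30 means P = 10⁻³, one error per 1000 bases, within a factor of about 3 of the SNP density (1 in 300)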
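Inverting Q = −10 log₁₀ P gives P = 10^(−Q/10), so each 10 points of Q is a tenfold drop in error rate. A minimal sketch of the conversion in both directions, compared with the approximate SNP density of 1 in 300; the function names and the list of Q values are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Base-call error probability P for a Phred quality Q (inverse of Q = -10 log10 P)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality Q = -10 log10 P for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("P must lie in (0, 1]")
    return -10 * math.log10(p)


SNP_RATE = 1 / 300  # approximate SNP density from the slides

for q in (20, 25, 30, 40):
    p = phred_to_error(q)
    print(f"Q={q}: P={p:.4g}, errors per SNP ~{p / SNP_RATE:.2f}")

# Quality at which expected sequencing errors equal expected SNPs.
print(round(error_to_phred(SNP_RATE), 1))  # 24.8
```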

  30. What can SNPs tell us? • Causes of disease -- dysfunctional protein • Correlation with disease prognosis, success of particular treatment • Useful genetic markers, to locate some gene of phenotypic interest; for instance, a gene correlated with a disease • Characterise individuals • Characterise populations (SNP distribution) • Applications in anthropology -- tracing of migrations, human evolution

  31. Use of SNPs as genetic markers • Before 1980, genetic maps were constructed by measuring recombination frequencies between genes giving measurable phenotypic traits • This goes back at least to Sturtevant and Morgan, if not to Mendel • At that time, phenotypes were the only visible aspect of the genome

  32. Use of SNPs as genetic markers • In 1980, Botstein, Davis, Skolnick & White proposed using polymorphic DNA markers for genetic mapping, even if they had no known phenotypic effect • Example: (then) restriction sites • SNPs that create or destroy a restriction site → restriction fragment length polymorphisms (RFLPs) • Did linkage mapping with restriction sites • Now we can use SNPs
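A SNP produces an RFLP when it creates or destroys an enzyme's recognition sequence, which changes the lengths of the resulting fragments. A minimal sketch using the sickle-cell change discussed earlier: the fragment is the start of the β-globin (HBB) coding sequence, and DdeI's recognition sequence CTNAG (N = any base) serves as the example enzyme; the function name is hypothetical, and sequence and enzyme details should be checked against reference databases before being relied on.

```python
import re


def site_positions(seq: str, motif: str) -> list[int]:
    """0-based start positions of a recognition motif in seq ('N' = any base)."""
    pattern = motif.upper().replace("N", "[ACGT]")
    # A zero-width lookahead also reports overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


DDEI_SITE = "CTNAG"  # DdeI recognition sequence

# Start of the HBB coding sequence: ATG GTG CAC CTG ACT CCT GAG GAG AAG
normal = "ATGGTGCACCTGACTCCTGAGGAGAAG"
assert normal[18:21] == "GAG"  # codon 6 of the mature protein (Met not counted)
sickle = normal[:19] + "T" + normal[20:]  # GAG -> GTG (Glu6Val)

print(site_positions(normal, DDEI_SITE))  # [16]
print(site_positions(sickle, DDEI_SITE))  # []: the site is lost, changing fragment lengths
```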

  33. Traits depending on multiple loci • Use of SNPs to identify traits, including but not limited to diseases, that depend on multiple loci • Single genes for diseases showing simple Mendelian inheritance (for instance, cystic fibrosis) can be isolated • Diseases that depend on interaction with multiple loci can be studied with enough SNP linkage information

  34. SNPs tell us about human history • Development of ability to digest lactose past infancy correlated with domestication of cattle, increased (non-fermented) dairy products in human diet • Source of calcium and calories • Many Asian populations retain adult lactose intolerance • Where do they get calcium? “The soybean is the cow of Asia.”

  35. Ability to digest lactose in adulthood • Digestion of lactose depends on the enzyme lactase-phlorizin hydrolase, which catalyzes hydrolysis of lactose → glucose + galactose
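Written as a balanced equation, the hydrolysis consumes one molecule of water; glucose and galactose share the formula C₆H₁₂O₆:

```latex
\mathrm{C_{12}H_{22}O_{11}}\ (\text{lactose}) + \mathrm{H_2O}
  \xrightarrow{\text{lactase}}
  \mathrm{C_6H_{12}O_6}\ (\text{glucose}) + \mathrm{C_6H_{12}O_6}\ (\text{galactose})
```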

  36. Ability to digest lactose in adulthood • In many people, the ability to digest lactose is a juvenile characteristic • Expression declines after age 2 • varies among individuals • Consistent with lifestyle involving breast feeding until this age, followed by weaning followed by diet not including (non-fermented) milk and other dairy products • To form yoghurt, bacteria cleave lactose

  37. Evolution of adult lactase expression • Domestication of cattle, with concomitant rise of milk in the diet, led to selective pressure for lactose tolerance • Mutation arose among cattle-raising people: • the Funnel Beaker culture • north-central Europe ~5,000-6,000 years ago • Most common mutations in Europeans: SNPs • C/T-13910 • G/A-22018 • Not surprisingly, in control regions for lactase gene

  38. Prevalence of lactose-tolerance SNP Group Study Exchange http://gseorlando.files.wordpress.com/2010/09/j.jpg

  39. Independent origins of lactose tolerance • Lactose tolerance apparently arose four times, independently • Europe: C/T-13910 and G/A-22018 • Pastoral areas of Africa – three independent mutations: • G/C-14010 East Africa • T/G-13915 North Sudan • C/G-13907 North Kenya
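The variant labels share one pattern: the two alleles, then a distance in base pairs upstream of the lactase gene (LCT). A minimal sketch that parses them; the regular expression and function name are illustrative, and reading the number as an upstream distance follows the usual naming of these variants.

```python
import re

LABEL = re.compile(r"([ACGT])/([ACGT])-(\d+)")


def parse_lct_variant(label: str) -> tuple[str, str, int]:
    """Split a label such as 'C/T-13910' into (allele1, allele2, bp upstream of LCT)."""
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    first, second, distance = m.groups()
    return first, second, int(distance)


# Variants listed on the slide, grouped by region.
VARIANTS = {
    "Europe": ["C/T-13910", "G/A-22018"],
    "East Africa": ["G/C-14010"],
    "North Sudan": ["T/G-13915"],
    "North Kenya": ["C/G-13907"],
}

distances = sorted(parse_lct_variant(v)[2] for vs in VARIANTS.values() for v in vs)
print(distances)  # [13907, 13910, 13915, 14010, 22018]
```

Four of the five variants sit within about 100 bp of one another (−13,907 to −14,010).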

  40. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672153/bin/ukmss-4417-f0002.jpg

  41. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672153/bin/ukmss-4417-f0001.jpg

  42. SNPs in anthropology • Useful in tracing relationships between populations, migration routes • Initially used mitochondrial DNA (16569 bp) • Maternal inheritance only • (Y chromosome gives paternal inheritance only) • Important argument for “out of Africa” theory of human origins and dispersal • Can choose non-selected regions, in contrast to previous work on blood groups, MHC haplotypes

  43. Migration routes into Asia and the Pacific based on SNPs http://i49.tinypic.com/2d0j2py.jpg

  44. DNA sequences and language groups • Proposal by L. L. Cavalli-Sforza • Showed consistency between trees based on genetic markers and trees based on linguistic groupings • Controversial! • In some cases, genomics has confirmed hypotheses of population affinity based on language similarity / dissimilarity • Basques are outliers in both genes and language

  45. Recommended reading • Tomasz Kamusella, The Politics of Language and Nationalism in Modern Central Europe, Palgrave Macmillan, 2008

  46. What happens after invasions? • Hungary invaded by Magyars in 896 AD; the country converted to speaking a Uralic language • Rome fell to Germanic invaders in 476 AD, who did NOT impose their language. (Perhaps recognising the superiority of Italian culture – which their descendants don’t) • England invaded by Anglo-Saxons in about the 5th century. Anglo-Saxon pushed Celtic languages to the far reaches of the British Isles + Brittany • Norman invasion of 1066 did NOT entirely replace Anglo-Saxon with French.

  47. Possible effects of SNPs • In protein-coding sequences • silent • missense • coding → stop codon • stop codon → coding • SNPs can lead to dysfunctional proteins • In splice sites • 15% of disease-causing mutations in the human genome are point mutations in the vicinity of mRNA splice junctions • In regulatory sequences

  48. What are possible effects of SNPs in coding sequences? • Change in amino acid • Example: sickle-cell anaemia • sense codon → stop codon • protein truncated • stop codon→ sense codon • protein extended
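The four outcomes can be read off the genetic code by translating the codon before and after the change. A minimal sketch, assuming the standard nuclear genetic code (mitochondrial codes differ) and DNA codons written 5'→3' on the coding strand; the function name and effect labels are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> tuple[str, str, str]:
    """Classify a single-nucleotide codon change; returns (effect, ref_aa, alt_aa)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three DNA bases (A, C, G, T)")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous (silent)"
    elif alt_aa == "*":
        effect = "nonsense (protein truncated)"
    elif ref_aa == "*":
        effect = "stop-loss (protein extended)"
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


print(classify_coding_snp("gag", "gtg"))  # ('missense', 'E', 'V'): Glu -> Val
```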
