Lessons learnt from the 1000 Genomes Project about sequencing in populations

Gil McVean

Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford


Some questions

  • What has the 1000 Genomes Project told us about how to sequence (in) populations?

  • What has the 1000 Genomes Project told us about populations?


Samples for the 1000 Genomes Project

Population labels (from the map figure): CEU, FIN, GBR, CHB, TSI, JPT, IBS, CDX, CHS, YRI, GWB, KHV, LWK, GHN, MAB, ASW, AJM, ACB, MXL, PUR, CLM, PEL, plus samples from S. Asia

Major population groups comprise subpopulations of c. 100 each


The role of the 1000G Project in medical genetics

  • A catalogue of variants

    • 95% of variants at 1% frequency in populations of interest

  • A representation of ‘normal’ variation

  • A set of haplotypes for imputation into GWAS

  • A training ground for sequencing/statistical/computational technologies


Samples for the 1000 Genomes Project: Pilot

Population labels (from the map figure): CEU, CHB, TSI*, JPT, CHS*, YRI, LWK*

*Exon pilot only


Population-scale genome sequencing

Figure labels: Haplotypes; 2x; 10x


What has the project generated?


>15 million SNPs, >50% of them novel

dbSNP entries increased by 70%



A robust and modular pipeline for analysis of population-scale sequence data


An efficient format for storing aligned reads and a set of tools to manipulate and view the files

  • SAM/BAM format for storing (aligned) reads

Bioinformatics (2009) http://samtools.sourceforge.net
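
As a rough illustration of what a SAM record holds, the sketch below splits one alignment line into the eleven mandatory tab-separated fields of the SAM format and decodes two of the standard flag bits. The example read and the names `SamRecord` and `parse_sam_line` are made up for illustration; real analyses would normally go through samtools rather than hand-rolled parsing.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields, plus any optional TAG:TYPE:VALUE fields."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line (hypothetical helper, not part of samtools)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10], tags=tuple(fields[11:]),
    )


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)   # flag bit 0x4: read is unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)  # flag bit 0x10: read maps to the reverse strand


# Made-up example read, for illustration only.
example = "read1\t16\tchr1\t1000\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, rec.cigar, is_reverse(rec), is_unmapped(rec))
```

BAM stores the same records in a compressed binary form, which is why the two are usually mentioned together.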


An information-rich format for storing generic haplotype/genotype data and tools for manipulating the files

http://vcftools.sourceforge.net
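
To give a feel for the kind of genotype data such a format carries, here is a minimal sketch that reads one data line in the VCF layout used by vcftools (fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample) and computes the non-reference allele frequency from the GT entries. The record and the function name `alt_allele_frequency` are invented for illustration, and the sketch lumps all non-reference alleles together.

```python
import re


def alt_allele_frequency(vcf_line: str) -> float:
    """Fraction of called alleles that are non-reference in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # column 9 names the per-sample subfields
    gt_index = format_keys.index("GT")

    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Alleles are separated by "/" (unphased) or "|" (phased).
        for allele in re.split(r"[/|]", genotype):
            if allele == ".":            # missing call: skip it
                continue
            called += 1
            if allele != "0":            # "0" is the reference allele
                alt_count += 1
    if called == 0:
        raise ValueError("no called genotypes on this line")
    return alt_count / called


# Made-up record with three samples, for illustration only.
example = "\t".join(
    ["20", "14370", ".", "G", "A", "29", "PASS", "NS=3", "GT:DP",
     "0|0:1", "1|0:8", "1/1:5"]
)
freq = alt_allele_frequency(example)
print(freq)
```

Phased genotypes (the `|` separator) are what allow the same file to represent haplotypes as well as genotypes.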


An understanding of the ‘rare functional variant load’ carried by individuals

c. 250 LOF / person

c. 75 HGMD DM


USH2A

  • Mutations cause Usher syndrome

  • 66 missense variants in dbSNP

  • 2/3 detected in 1000 Genomes Pilot

  • One HGMD ‘disease-causing’ variant homozygous in 3 YRI

    • Other reports indicate this is not a real disease-causing variant


Samples for the 1000 Genomes Project: Phase 1

Population labels (from the map figure): CEU, FIN, GBR, CHB, ASW, TSI, JPT, CHS, YRI, MXL, PUR, LWK, CLM



Lesson 1. The low-coverage model works for variant discovery
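
One way to see why shallow sequencing of many people can still find variants is that reads supporting a variant accumulate across all of its carriers. The sketch below works through that arithmetic under strong simplifying assumptions that are not from the slides: Poisson-distributed depth, heterozygous carriers whose two alleles are sampled equally, no sequencing error, and a hypothetical rule that a variant counts as discovered once `min_reads` supporting reads are seen in the whole sample.

```python
import math


def prob_at_least(min_reads: int, mean: float) -> float:
    """P(X >= min_reads) for X ~ Poisson(mean)."""
    below = sum(
        math.exp(-mean) * mean ** i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - below


def discovery_power(carriers: int, coverage: float, min_reads: int = 2) -> float:
    """Chance of seeing >= min_reads variant-supporting reads across all carriers.

    Assumes each heterozygous carrier contributes Poisson(coverage / 2) reads
    of the variant allele, so the pooled count is Poisson(carriers * coverage / 2).
    """
    mean_alt_reads = carriers * coverage / 2.0
    return prob_at_least(min_reads, mean_alt_reads)


# Compare a low-coverage and a deeper design for variants of increasing count.
for carriers in (1, 2, 4, 8):
    low = discovery_power(carriers, coverage=2.0)
    deep = discovery_power(carriers, coverage=10.0)
    print(f"carriers={carriers}  2x: {low:.3f}  10x: {deep:.3f}")
```

Under these assumptions the low-coverage design loses power mainly for variants seen in only one or two carriers, while variants with several carriers in the sample are found at either depth.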


A near complete record of common variants

CEU


Lesson 2. The low-coverage model works for SNP genotyping


A set of accurate genotypes/haplotypes

CEU


Lesson 3. The genome has a large grey area where variant calling is hard


Lesson 4. Joint calling of different variant types substantially improves the quality of calls


Lesson 5. Managing uncertainty is important
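
A common way to carry uncertainty forward from low-coverage data is to keep genotype likelihoods and posteriors rather than hard calls. The sketch below uses a simple binomial read model with a fixed per-read error rate and a Hardy-Weinberg prior; the model, the default error rate, and the function names are assumptions chosen for illustration and are not taken from the slides.

```python
import math
from typing import Dict


def genotype_likelihoods(ref_reads: int, alt_reads: int,
                         error_rate: float = 0.01) -> Dict[str, float]:
    """P(read counts | genotype) under a binomial model (illustrative only)."""
    n = ref_reads + alt_reads
    # Probability that a single read shows the alternative allele.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    coeff = math.comb(n, alt_reads)
    return {
        g: coeff * p ** alt_reads * (1.0 - p) ** ref_reads
        for g, p in p_alt.items()
    }


def genotype_posteriors(ref_reads: int, alt_reads: int, alt_freq: float,
                        error_rate: float = 0.01) -> Dict[str, float]:
    """Combine the likelihoods with a Hardy-Weinberg prior; 0 < alt_freq < 1."""
    q = alt_freq
    prior = {"0/0": (1 - q) ** 2, "0/1": 2 * q * (1 - q), "1/1": q ** 2}
    lik = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    unnormalised = {g: prior[g] * lik[g] for g in lik}
    total = sum(unnormalised.values())
    return {g: v / total for g, v in unnormalised.items()}


# Two reads, both showing the alternative allele, at a site with 10% allele frequency.
post = genotype_posteriors(ref_reads=0, alt_reads=2, alt_freq=0.1)
for genotype, probability in post.items():
    print(genotype, round(probability, 3))
```

With only two reads, the heterozygous genotype keeps most of the posterior weight in this toy setting even though no reference read was seen, which is the kind of ambiguity a hard call would hide.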


Lesson 6. Data visualisation is key


Lessons learnt about populations



Spatial heterogeneity in non-genetic risk can differentially confound association studies for rare and common variants

Iain Mathieson


Thanks to the many...

  • Steering committee

    • Co-chairs: Richard Durbin and David Altshuler

  • Samples and ELSI Committee

    • Co-chairs: Aravinda Chakravarti and Leena Peltonen

  • Data Production Group

    • Co-chairs: Elaine Mardis and Stacey Gabriel

  • Analysis Group

    • Co-chairs: Gil McVean and Gonçalo Abecasis

    • Subgroups in gene-targeted sequencing (Richard Gibbs) and population genetics (Molly Przeworski)

  • Structural Variation Group

    • Co-chairs: Matt Hurles, Charles Lee and Evan Eichler

  • DCC

    • Co-chairs: Paul Flicek and Steve Sherry

