Compressed Genotyping. Yaniv Erlich Hannon Lab. Cold Spring Harbor Laboratory. Poster in a nutshell. Genotyping is the process of determining the genetic variation for a certain trait in an individual. It is one of the main diagnostic tools in medical genetics

Presentation Transcript

Compressed Genotyping

Yaniv Erlich

Hannon Lab

Cold Spring Harbor Laboratory

erlich@cshl.edu

Poster in a nutshell
  • Genotyping is the process of determining the genetic variation for a certain trait in an individual.
  • It is one of the main diagnostic tools in medical genetics
    • Finding carriers for rare genetic diseases such as Cystic Fibrosis
    • Tissue matching in organ donation
    • Forensic DNA analysis
  • Until now, only serial genotyping has been possible. This is expensive and tedious.
  • Taking advantage of the ‘signal sparsity’, we developed and tested a compressed genotyping framework.

Abstract

Significant volumes of knowledge have been accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders from cystic fibrosis to mental retardation. Nevertheless, there are still great challenges in applying this knowledge routinely in the clinic, largely due to the relatively tedious and expensive process of DNA sequencing. Since the genetic polymorphisms that underlie these disorders are relatively rare in the human population, the presence or absence of a disease-linked polymorphism can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective reconstruction protocol, called "DNA Sudoku", to retrieve useful data. In particular, we have adapted our scheme to a recently developed class of high throughput DNA sequencing technologies, and assembled a mathematical framework that has some important distinctions from 'traditional' compressed sensing ideas in order to address different biological and technical constraints.


The genotyping problem

Input: Thousands of specimens

Output: Genotype of each specimen

Genotype

Genotyping as a sparse graph reconstruction
  • An example of a carrier screen for Cystic Fibrosis. There are two allele nodes: the Wild Type (WT) and the Cystic Fibrosis mutation. Specimens 1, 2, 3, and 5 are WT, while specimen 4 is a carrier. The specimen labeled with ’X’ is affected and does not enter the screen. Genotyping is equivalent to finding the edges in the graph.
  • THE GRAPH IS SPARSE
  • Number of carriers is very low
  • No affected individuals
  • The degree of every sample node is always two (human genome is diploid)

Genotyping is equivalent to revealing the edges of the bipartite graph

Samples

Alleles


The main idea – pooled processing

One could reveal the graph edges by sequencing the DNA of each sample

- expensive, tedious, and slow

Better:

Pool the samples and then sequence the pools



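Mathematically speaking

$$
\underbrace{\begin{pmatrix} 1 & 7 \\ 1 & 5 \\ 0 & 6 \end{pmatrix}}_{\text{Pool} \times \text{Allele}}
=
\underbrace{\begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}}_{\text{Pool} \times \text{Specimen}}
\;
\underbrace{\begin{pmatrix} 0 & 2 \\ 0 & 2 \\ 0 & 2 \\ 1 & 1 \\ 0 & 2 \end{pmatrix}}_{\text{Specimen} \times \text{Allele}}
$$

  • What the observer sees: the pooled results (Pool × Allele)
  • The pooling design: a binary matrix (‘1’ – in the pool, ‘0’ – otherwise)
  • What the observer wants: the biadjacency matrix of the graph (Specimen × Allele)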
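The relation between the three matrices on this slide can be checked numerically. The sketch below is illustrative only: the variable names are made up, the allele column order (CF mutation, WT) is an assumption, and the carrier row is taken as (1, 1), which is the value consistent with the pooled counts and with every sample node having degree two.

```python
import numpy as np

# Pooling design from the slide: rows are pools, columns are specimens.
design = np.array([
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
])

# Biadjacency matrix: rows are specimens, columns are alleles.
# Assumed column order: (CF mutation, WT). Specimen 4 is the carrier.
genotypes = np.array([
    [0, 2],
    [0, 2],
    [0, 2],
    [1, 1],  # assumed carrier row: one mutant allele, one WT allele
    [0, 2],
])

# What the observer sees: allele counts summed over the members of each pool.
pooled = design @ genotypes
print(pooled)
# [[1 7]
#  [1 5]
#  [0 6]]
```

Each row of `genotypes` sums to two because the genome is diploid, and the product reproduces the pooled counts shown on the slide.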


What is a good pooling design?

Trivial compressed sensing demands

Biologically oriented requirements

We need a light-weight d-disjunct matrix

Light Chinese Design
  • Inputs: N (number of specimens)
  • W, the column weight (robotics effort)
  • Algorithm:
  • 1. Find W numbers {x1,x2,…,xw} such that:
  • Bigger than
  • Pairwise coprime
  • 2. Generate W modular equations:
  • 3. Construct the pooling matrix upon the modular equations
  • Output: Pooling matrix

The algorithm reaches the bound derived by Kautz & Singleton (1964)
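The steps of the Light Chinese Design can be sketched as follows. This is a minimal illustration, not the authors' implementation. The lower bound on the W numbers and the exact modular equations are missing from this transcript, so the bound is left as a caller-supplied parameter (`lower_bound`, hypothetical), the greedy search in `pick_windows` is an assumption, and the assignment rule "specimen n goes to pool n mod x" is the usual Chinese-remainder reading, also assumed here.

```python
from math import gcd

import numpy as np


def pick_windows(weight, lower_bound):
    """Greedily pick `weight` pairwise-coprime integers, each > lower_bound."""
    windows = []
    candidate = lower_bound + 1
    while len(windows) < weight:
        if all(gcd(candidate, x) == 1 for x in windows):
            windows.append(candidate)
        candidate += 1
    return windows


def pooling_matrix(n_specimens, windows):
    """Build a binary matrix: rows are pools, columns are specimens.

    Each number x in `windows` contributes x pools; specimen n is placed
    in pool (n mod x) of that group (assumed form of the modular equations).
    """
    matrix = np.zeros((sum(windows), n_specimens), dtype=int)
    offset = 0
    for x in windows:
        for n in range(n_specimens):
            matrix[offset + n % x, n] = 1
        offset += x
    return matrix


# Toy example with hypothetical inputs: 20 specimens, weight 3.
windows = pick_windows(weight=3, lower_bound=4)
design = pooling_matrix(20, windows)
print(windows)       # [5, 6, 7]
print(design.shape)  # (18, 20)
```

In this toy case every column has weight 3, and any two specimens share at most one pool, which is the kind of low overlap a disjunct design relies on.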

Decoding the genotyping results by Belief Propagation

Specimens

Pools

A-priori biological information

Genotyping results

The pooled results can be decoded using Belief Propagation
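For intuition about what a decoder has to do, here is a much simpler stand-in that is not Belief Propagation: a naive elimination decoder. It assumes error-free pooled results and uses no a-priori biological information. The function name `naive_decode` and the input layout are hypothetical. It clears every specimen that sits in a pool with no mutant allele, then calls a carrier wherever a positive pool has exactly one remaining candidate.

```python
import numpy as np


def naive_decode(design, mutant_counts):
    """Return (cleared, carriers, ambiguous) specimen indices, 0-based.

    design: binary matrix, rows are pools, columns are specimens.
    mutant_counts: mutant-allele count observed in each pool.
    """
    design = np.asarray(design)
    positive = np.asarray(mutant_counts) > 0
    n_specimens = design.shape[1]

    # A specimen in any pool without mutant alleles cannot be a carrier.
    in_negative_pool = design[~positive].sum(axis=0) > 0
    candidates = ~in_negative_pool

    # A positive pool with a single remaining candidate pins down a carrier.
    carriers = np.zeros(n_specimens, dtype=bool)
    for pool in design[positive]:
        members = np.flatnonzero((pool == 1) & candidates)
        if len(members) == 1:
            carriers[members[0]] = True

    ambiguous = candidates & ~carriers
    return (
        np.flatnonzero(in_negative_pool).tolist(),
        np.flatnonzero(carriers).tolist(),
        np.flatnonzero(ambiguous).tolist(),
    )


# Three pools over five specimens; mutant counts per pool are 1, 1, 0.
design = [
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
]
cleared, carriers, ambiguous = naive_decode(design, [1, 1, 0])
print(cleared, carriers, ambiguous)  # [0, 1, 4] [3] [2]
```

The fourth specimen (index 3) is identified as the carrier, but the third (index 2) stays ambiguous under this simple rule. Resolving such cases, and tolerating noisy reads, is where a probabilistic decoder that combines pool evidence with prior information becomes useful.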

Example of Belief Propagation

[Figure: graph connecting specimens #1 to #7 with pools. Each specimen node lists its possible genotypes (A, B, C, D); an edge means the specimen is in a pool.]

Example messages shown in the figure:
  • 1. You can be either A, C, or D
  • 2. I can’t be B
  • 3. Specimen #3, #6 and #7: One of you guys should be B

03/06/09

Simulation results

1000 specimens

W = 5

Total pools = 180

[Plot; axis label: Number of carriers]

Real results – biotechnology application

40,000 specimens

W = 5

Total pools = 1900
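As a rough worked check on the pool counts quoted on the two results slides, the average number of specimens per sequenced pool is:

$$
\frac{1000}{180} \approx 5.6, \qquad \frac{40{,}000}{1900} \approx 21.1
$$

So at the same weight W = 5, the larger screen needs proportionally fewer pools per specimen.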

References & Acknowledgments
  • Compressed Genotyping. Yaniv Erlich, Assaf Gordon, Michael Brand, Gregory J. Hannon & Partha P. Mitra. Submitted to IEEE Trans. Info. Theory. 2009.
  • DNA Sudoku - harnessing high-throughput sequencing for multiplexed specimen analysis. Yaniv Erlich, Kenneth Chang, Assaf Gordon, Roy Ronen, Oron Navon, Michelle Rooks & Gregory J. Hannon. Genome Research. 2009.

Lindsay-Goldberg Fellowship