Hierarchical Sequencing


PowerPoint Slideshow about 'Hierarchical Sequencing' - JasminFlorian


Presentation Transcript
Hierarchical Sequencing Strategy
  • Obtain a large collection of BAC clones
  • Map them onto the genome (physical mapping)
  • Select a minimum tiling path
  • Sequence each clone in the path with shotgun sequencing
  • Assemble each clone
  • Put everything together

[Figure labels: a BAC clone, map, genome]
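The "select a minimum tiling path" step can be illustrated as an interval-cover problem. The sketch below is only an illustration under strong assumptions: each clone already has known start and end coordinates on the genome (in practice the physical map gives only relative order and overlap), and the function name `minimum_tiling_path` and the tuple layout are hypothetical.

```python
def minimum_tiling_path(clones, genome_length):
    """Greedy interval cover.

    clones: list of (name, start, end) with half-open coordinates [start, end).
    Returns the names of a smallest set of clones covering [0, genome_length).
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path, covered, i = [], 0, 0
    while covered < genome_length:
        best = None
        # Among clones that start inside the covered prefix,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in the map at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


clones = [("A", 0, 150), ("B", 50, 200), ("C", 100, 300),
          ("D", 140, 320), ("E", 290, 450)]
print(minimum_tiling_path(clones, 450))
```

The greedy choice (always take the clone that extends furthest) yields a minimum-size cover for intervals on a line; a real pipeline would also require a minimum overlap between consecutive clones.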


Methods of physical mapping

Goal:

Make a map of the locations of each clone relative to one another

Use the map to select a minimal set of clones to sequence

Methods:

  • Hybridization
  • Digestion
1. Hybridization

Short words, the probes, attach to complementary words

  • Construct many probes
  • Treat each BAC with all probes
  • Record which ones attach to it
  • Same words attaching to BACs X, Y ⇒ X and Y overlap

[Figure: probes p1 … pn]
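The hybridization rule above (two BACs that attach the same probe are candidates for overlap) can be sketched as a set intersection. This is a minimal illustration; the function name `overlaps_from_probes` and the input layout are assumptions, and it ignores false positives from probes that fall in repeats.

```python
from itertools import combinations


def overlaps_from_probes(hits):
    """hits maps a BAC name to the set of probe ids that attached to it.

    Returns {(x, y): shared_probes} for every pair of BACs sharing a probe.
    """
    candidates = {}
    for x, y in combinations(sorted(hits), 2):
        shared = hits[x] & hits[y]
        if shared:
            candidates[(x, y)] = shared
    return candidates


hits = {"X": {"p1", "p2"}, "Y": {"p2", "p3"}, "Z": {"p4"}}
print(overlaps_from_probes(hits))
```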

2. Digestion

Restriction enzymes cut DNA where specific words appear

  • Cut each clone separately with an enzyme
  • Run fragments on a gel and measure length
  • Clones Ca, Cb share fragments of lengths { li, lj, lk } ⇒ Ca and Cb overlap

Double digestion:

Cut with enzyme A, enzyme B, then enzymes A + B
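Comparing digestion fingerprints can be sketched as matching fragment lengths between two clones. Gel measurements are imprecise, so this sketch assumes a relative tolerance; the `tolerance` value and the function name `shared_fragments` are illustrative assumptions, not part of the original slides.

```python
def shared_fragments(frags_a, frags_b, tolerance=0.02):
    """Pair up fragment lengths of two clones that agree within a relative tolerance.

    Each fragment of clone b is used at most once.
    """
    remaining = sorted(frags_b)
    matched = []
    for la in sorted(frags_a):
        for lb in remaining:
            if abs(la - lb) <= tolerance * max(la, lb):
                matched.append((la, lb))
                remaining.remove(lb)
                break  # stop iterating right after mutating the list
    return matched


ca = [1200, 3400, 5100, 800]
cb = [3410, 5080, 9000, 640]
print(shared_fragments(ca, cb))
```

The more fragments two clones share, the more likely they overlap; how many shared fragments to require is a threshold that would have to be tuned for the enzyme and the library.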

The Walking Method
  • Build a very redundant library of BACs with sequenced clone-ends (cheap to build)
  • Sequence some “seed” clones
  • “Walk” from seeds using clone-ends to pick library clones that extend left & right
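One way to picture the walking step: a library clone is a useful extension if exactly one of its sequenced ends lies inside the seed, because its other end must then lie beyond the seed. The sketch below assumes exact substring matches and that all sequences are on the same strand; `walk_candidates` and the library layout are hypothetical names for illustration.

```python
def walk_candidates(seed_seq, library):
    """library maps clone name -> (left_end, right_end) end sequences.

    Returns (name, direction) for clones with exactly one end inside the seed.
    """
    extenders = []
    for name, (left_end, right_end) in library.items():
        left_in = left_end in seed_seq
        right_in = right_end in seed_seq
        if left_in != right_in:
            # The end found in the seed anchors the clone;
            # the other end reaches past the seed on that side.
            direction = "right" if left_in else "left"
            extenders.append((name, direction))
    return extenders


seed = "ACGTTGCAAGGCTTAACCGGTATC"
library = {
    "c1": ("TAACCGG", "GGGTTTAAA"),  # left end inside the seed
    "c2": ("TTTTTTT", "ACGTTGC"),    # right end inside the seed
    "c3": ("GCAAGG", "CCGGTA"),      # both ends inside: contained in the seed
    "c4": ("AAAAAA", "CCCCCCC"),     # neither end inside: unrelated
}
print(walk_candidates(seed, library))
```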
Some Terminology
  • insert: a fragment that was incorporated in a circular genome, and can be copied (cloned)
  • vector: the circular genome (host) that incorporated the fragment
  • BAC: Bacterial Artificial Chromosome, a type of insert–vector combination, typically of length 100-200 kb
  • read: a 500-900 bp long word that comes out of a sequencing machine
  • coverage: the average number of reads (or inserts) that cover a position in the target DNA piece
  • shotgun sequencing: the process of obtaining many reads from random locations in DNA, to detect overlaps and assemble
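The definition of coverage translates directly into arithmetic: total sequenced bases divided by the length of the target. The sketch below assumes all reads have the same length and are placed uniformly at random; the function names are illustrative.

```python
import math


def expected_coverage(num_reads, read_length, target_length):
    """Average number of reads covering a position of the target."""
    return num_reads * read_length / target_length


def reads_needed(coverage, read_length, target_length):
    """Smallest number of reads whose total length reaches the wanted coverage."""
    return math.ceil(coverage * target_length / read_length)


# A 150 kb BAC sequenced with 700 bp reads (numbers chosen for illustration).
print(expected_coverage(2000, 700, 150_000))
print(reads_needed(10, 700, 150_000))
```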

Whole Genome Shotgun Sequencing

[Figure: the genome is cut many times at random; the pieces are cloned as plasmids (2 – 10 Kbp) and cosmids (40 Kbp); each insert yields forward-reverse paired reads of ~800 bp at a known distance from each other]

Fragment Assembly

Given N reads…

Where N ~ 30 million…

We need to use a linear-time algorithm

Steps to Assemble a Genome

1. Find overlapping reads

2. Merge some “good” pairs of reads into longer contigs

3. Link contigs to form supercontigs

4. Derive consensus sequence, e.g. ..ACGATTACAATAGGTT..

Some Terminology
  • read: a 500-900 bp long word that comes out of the sequencer
  • mate pair: a pair of reads from two ends of the same insert fragment
  • contig: a contiguous sequence formed by several overlapping reads with no gaps
  • supercontig (scaffold): an ordered and oriented set of contigs, usually by mate pairs
  • consensus sequence: sequence derived from the multiple alignment of reads in a contig

1. Find Overlapping Reads

[Figure: each read is broken into words (9-mers in the example). The words are first listed as (read, pos., word, orient.) and then sorted by word into (word, read, orient., pos.), so that identical words from different reads end up next to each other.]

Example reads from the figure:

aaactgcagtacggatct

gggcccaaactgcagtac

gtacggatctactacaca
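The sorted word table in the figure can be sketched in a few lines. This is a simplified illustration: it uses the three example reads, k = 9, and leaves out the orientation column (reverse-complement words), which a real assembler would include. The function names are hypothetical.

```python
from itertools import combinations, groupby


def kmer_table(reads, k):
    """Return (word, read_index, pos) rows sorted by word."""
    rows = [(read[i:i + k], r, i)
            for r, read in enumerate(reads)
            for i in range(len(read) - k + 1)]
    rows.sort()
    return rows


def pairs_sharing_kmer(reads, k):
    """Pairs of read indices that have at least one k-mer in common."""
    pairs = set()
    for _, group in groupby(kmer_table(reads, k), key=lambda row: row[0]):
        ids = sorted({r for _, r, _ in group})
        pairs.update(combinations(ids, 2))
    return pairs


reads = ["aaactgcagtacggatct", "gggcccaaactgcagtac", "gtacggatctactacaca"]
print(pairs_sharing_kmer(reads, 9))
```

Sorting puts identical words in adjacent rows, so candidate overlaps are found in one pass over the table instead of comparing every read with every other read.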


1. Find Overlapping Reads
  • Find pairs of reads sharing a k-mer, k ~ 24
  • Extend to full alignment – throw away if not >98% similar

TAGATTACACAGATTAC

|||||||||||||||||

TAGATTACACAGATTAC

  • Caveat: repeats
    • A k-mer that occurs N times causes O(N²) read/read comparisons
    • ALU k-mers could cause up to 1,000,000² comparisons
  • Solution:
    • Discard all k-mers that occur “too often”
      • Set cutoff to balance sensitivity/speed tradeoff, according to genome at hand and computing resources available
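The repeat caveat and its fix can be sketched as a frequency filter on k-mers. The cutoff `max_occurrences` is exactly the tunable parameter the slide mentions; its value here, and the function names, are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    return Counter(read[i:i + k]
                   for read in reads
                   for i in range(len(read) - k + 1))


def usable_kmers(reads, k, max_occurrences):
    """Keep only k-mers that do not occur 'too often'."""
    counts = kmer_counts(reads, k)
    return {word for word, n in counts.items() if n <= max_occurrences}


def comparisons_triggered(n):
    """Number of read/read pairs implied by a k-mer seen n times: n choose 2."""
    return n * (n - 1) // 2


reads = ["acgtacgt", "cgtacgta", "ttttacgt", "ggggcccc"]
print(sorted(usable_kmers(reads, 4, max_occurrences=3)))
print(comparisons_triggered(1_000_000))
```

A low cutoff is fast but may miss true overlaps that are seeded only by moderately frequent k-mers; a high cutoff keeps them at the cost of many more comparisons.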
1. Find Overlapping Reads

Create local multiple alignments from the overlapping reads

TAGATTACACAGATTACTGA

TAGATTACACAGATTACTGA

TAG TTACACAGATTATTGA

TAGATTACACAGATTACTGA

TAGATTACACAGATTACTGA

TAGATTACACAGATTACTGA

TAG TTACACAGATTATTGA

TAGATTACACAGATTACTGA

1. Find Overlapping Reads
  • Correct errors using multiple alignment
    • Isolated errors are corrected against the other reads in the alignment (figure annotations: “insert A”, “replace T with C”)
    • Correlated errors are probably caused by repeats ⇒ disentangle overlaps
  • In practice, error correction removes up to 98% of the errors

[Figure: multiple alignment of overlapping reads, before and after correction]

TAGATTACACAGATTACTGA

TAGATTACACAGATTACTGA

TAG-TTACACAGATTACTGA

TAGATTACACAGATTATTGA

TAG-TTACACAGATTATTGA

TAG-TTACACAGATTATTGA
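Error correction from a multiple alignment can be sketched as a per-column majority vote. The sketch assumes the reads are already aligned to equal length with `-` for gaps. The `min_support` threshold, which decides when a disagreement counts as "correlated" rather than as an isolated error, is a hypothetical parameter chosen for illustration.

```python
from collections import Counter


def correct_alignment(aligned_reads, min_support=2):
    """Fix isolated disagreements by column majority.

    Returns (corrected_reads, suspicious_columns). A minority character seen in
    at least min_support reads is left alone and its column is reported, since
    correlated differences may come from a repeat rather than from errors.
    """
    corrected = [list(read) for read in aligned_reads]
    suspicious = []
    for col, column in enumerate(zip(*aligned_reads)):
        counts = Counter(column)
        majority, _ = counts.most_common(1)[0]
        for row, base in enumerate(column):
            if base == majority:
                continue
            if counts[base] >= min_support:
                if col not in suspicious:
                    suspicious.append(col)
            else:
                corrected[row][col] = majority
    return ["".join(read) for read in corrected], suspicious


reads = [
    "TAGATTACACAGATTACTGA",
    "TAGATTACACAGATTACTGA",
    "TAGATTACACAGATTACTGA",
    "TAG-TTACACAGATTACTGA",  # isolated gap
    "TAGATTACACAGATTATTGA",  # isolated substitution
]
print(correct_alignment(reads))
```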

2. Merge Reads into Contigs
  • Overlap graph:
    • Nodes: reads r1 … rn
    • Edges: overlaps (ri, rj, shift, orientation, score)

[Figure: reads that come from two regions of the genome (blue and red) that contain the same repeat]

Note: of course, we don’t know the “color” of these nodes
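The overlap graph described above can be sketched as an adjacency list whose edges carry the (ri, rj, shift, orientation, score) tuple. The class and field names below are illustrative assumptions, and the example edges are made up to show how a shared repeat links reads from two different regions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    ri: int           # index of the first read
    rj: int           # index of the second read
    shift: int        # offset of rj relative to ri
    orientation: str  # "+" same strand, "-" reverse complement
    score: float      # alignment quality


class OverlapGraph:
    def __init__(self):
        self.adj = defaultdict(list)

    def add_overlap(self, ov):
        # Store the edge under both endpoints so it can be found from either read.
        self.adj[ov.ri].append(ov)
        self.adj[ov.rj].append(ov)

    def neighbors(self, r):
        return sorted({ov.rj if ov.ri == r else ov.ri for ov in self.adj[r]})


# Reads 0-2 come from one region, reads 3-4 from another;
# reads 2 and 3 both contain the repeat, so they appear to overlap.
g = OverlapGraph()
for ri, rj in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    g.add_overlap(Overlap(ri, rj, shift=300, orientation="+", score=0.99))
print(g.neighbors(2))
```

The graph itself cannot tell that the edge between reads 2 and 3 is repeat-induced, which is the point of the note above: the "color" of the nodes is not known to the assembler.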