Sequencing a genome


Definition

  • Determining the identity and order of nucleotides in the genetic material – usually DNA, sometimes RNA, of an organism


Basic problem

  • Genomes are large (typically millions or billions of base pairs)

  • Current technology can only reliably ‘read’ a short stretch – typically hundreds of base pairs


Elements of a solution

  • Automation – over the past decade, the amount of hand-labor in the ‘reads’ has been steadily and dramatically reduced

  • Assembly of the reads into sequences is an algorithmic and computational problem


A human drama

  • There are competing methods of assembly

  • The competing – public and private – sequencing teams used competing assembly methods


Assembly:

  • Putting sequenced fragments of DNA into their correct chromosomal positions


BAC

  • Bacterial artificial chromosome: bacterial DNA spliced with a medium-sized fragment of a genome (100 to 300 kb) to be amplified in bacteria and sequenced.


Contig

  • Contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome (whether natural or artificial, as in BACs)


Cosmid

  • DNA from a bacterial virus spliced with a small fragment of a genome (45 kb or less) to be amplified and sequenced


Directed sequencing

  • Successively sequencing DNA from adjacent stretches of chromosome


Draft sequence

  • Sequence with lower accuracy than a finished sequence; some segments are missing or in the wrong order or orientation


EST

  • Expressed sequence tag: a unique stretch of DNA within a coding region of a gene; useful for identifying full-length genes and as a landmark for mapping


Exon

  • Region of a gene’s DNA that encodes a portion of its protein; exons are interspersed with noncoding introns


Genome

  • The entire chromosomal genetic material of an organism


Intron

  • Region of a gene’s DNA that is not translated into a protein


Kilobase (kb)

  • Unit of DNA equal to 1000 bases


Locus

  • Chromosomal location of a gene or other piece of DNA


Megabase (Mb)

  • Unit of DNA equal to 1 million bases


PCR

  • Polymerase chain reaction: a technique for amplifying a piece of DNA quickly and cheaply


Physical map

  • A map of the locations of identifiable markers spaced along the chromosomes; a physical map may also be a set of overlapping clones


Plasmid

  • Loop of bacterial DNA that replicates independently of the chromosomes; artificial plasmids can be inserted into bacteria to amplify DNA for sequencing


Regulatory region

  • A segment of DNA that controls whether a gene will be expressed and to what degree


Repetitive DNA

  • Sequences of varying lengths that occur in multiple copies in the genome; they make up much of the genome


Restriction enzyme

  • An enzyme that cuts DNA at specific sequences of base pairs
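A minimal sketch of what "cuts DNA at specific sequences" means in practice. The function name `digest` and the test sequence are hypothetical, chosen for illustration; the enzyme used is EcoRI, which recognizes GAATTC and cuts after the G. Only one strand is modeled.

```python
def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut dna at every occurrence of site.

    cut_offset is how many bases into the site the cut falls
    (1 for EcoRI: G^AATTC). Illustrative single-strand model only.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # whatever remains after the last cut
    return fragments


# Made-up sequence containing two EcoRI sites.
dna = "AAGAATTCCCGGGAATTCTT"
fragments = digest(dna, "GAATTC", 1)
print(fragments)
print([len(f) for f in fragments])
```

The fragment lengths are what a gel would resolve; a mutation that destroys one site changes the set of lengths.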


RFLP

  • Restriction fragment length polymorphism: genetic variation in the length of DNA fragments produced by restriction enzymes; useful as markers on maps


Scaffold

  • A series of contigs that are in the right order but are not necessarily connected in one continuous stretch of sequence


Shotgun sequencing

  • Breaking DNA into many small pieces, sequencing the pieces, and assembling the fragments


STS

  • Sequence tagged site: a unique stretch of DNA whose location is known; serves as a landmark for mapping and assembly


YAC

  • Yeast artificial chromosome: yeast DNA spliced with a large fragment of a genome (up to 1 Mb) to be amplified in yeast cells and sequenced


Readings

  • Myers, “Whole Genome DNA Sequencing,” http://www.cs.arizona.edu/people/gene/PAPERS/whole.IEEE.pdf

  • Venter et al., “The Sequence of the Human Genome,” Science, Feb 16, 2001, Vol. 291, No. 5507, 1304 (parts 1 & 2)

  • Waterston, Lander, Sulston, “On the sequencing of the human genome,” PNAS, March 19, 2002, Vol. 99, No. 6, 3712-3716

  • Myers et al., “On the sequencing and assembly of the human genome,” www.pnas.org/cgi/doi/10.1073/pnas.092136699


Hierarchical sequencing

  • Create a high-level physical map, using ESTs and STSs

  • Shred genome into overlapping clones

  • Multiply clones in BACs

  • ‘Shotgun’ each clone

  • Read each ‘shotgunned’ fragment

  • Assemble the fragments


Physical map


Whole genome sequencing (WGS)

  • Make multiple copies of the target

  • Randomly ‘shotgun’ each target, discarding very big and very small pieces

  • Read each fragment

  • Reassemble the ‘reads’
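The first three steps above can be simulated in a few lines. This is a sketch under stated assumptions, not a model of any real laboratory protocol: the function name `shotgun`, the uniform random break points, and the size limits are all hypothetical choices for illustration.

```python
import random


def shotgun(target: str, copies: int, min_len: int, max_len: int,
            rng: random.Random) -> list[str]:
    """Shred several copies of target at random points.

    Fragments shorter than min_len or longer than max_len are
    discarded, mirroring the size-selection step.
    """
    kept = []
    # Assumed: enough breaks that the average fragment is mid-sized.
    n_breaks = len(target) // ((min_len + max_len) // 2)
    for _ in range(copies):
        cuts = sorted(rng.sample(range(1, len(target)), n_breaks))
        bounds = [0] + cuts + [len(target)]
        for a, b in zip(bounds, bounds[1:]):
            if min_len <= b - a <= max_len:
                kept.append(target[a:b])
    return kept


rng = random.Random(0)  # fixed seed so the run is repeatable
target = "".join(rng.choice("ACGT") for _ in range(200))
reads = shotgun(target, copies=5, min_len=10, max_len=30, rng=rng)
print(len(reads), "fragments kept")
```

Because each copy is broken at different points, fragments from different copies overlap, and those overlaps are what the reassembly step relies on.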


Hierarchical v. whole-genome


The fragment assembly problem

  • Aim: infer the target from the reads

  • Difficulties –

    • Incomplete coverage. Leaves contigs separated by gaps of unknown size.

    • Sequencing errors. The error rate increases with read length and is assumed to be less than some bound ε.

    • Unknown orientation. We don’t know whether to use a read or its Watson-Crick (reverse) complement.
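To make the aim and the orientation difficulty concrete, here is a small greedy overlap-merge sketch. It is a textbook-style illustration, not the assembler used by either sequencing team: it assumes error-free reads and exact overlaps, and the names `overlap` and `greedy_assemble` and the toy reads are hypothetical.

```python
from itertools import permutations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return read.translate(COMPLEMENT)[::-1]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Repeatedly merge the pair with the largest exact overlap.

    Each read is tried in both orientations because the strand it
    came from is unknown. Stops when no overlap of min_len remains.
    """
    contigs = list(reads)
    while True:
        best_k, best = 0, None
        for i, j in permutations(range(len(contigs)), 2):
            for x in (contigs[i], reverse_complement(contigs[i])):
                for y in (contigs[j], reverse_complement(contigs[j])):
                    k = overlap(x, y, min_len)
                    if k > best_k:
                        best_k, best = k, (i, j, x + y[k:])
        if best is None:
            return contigs
        i, j, merged = best
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)


# Toy reads from a made-up 20-base target; the middle read is
# given on the opposite strand.
reads = ["GATTACAGGC", "CGATGAGCCT", "ATCGTTGA"]
print(greedy_assemble(reads))
```

The result may come out as the target or as its reverse complement; without outside information the two are indistinguishable. If coverage were incomplete, the function would return several contigs instead of one.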


Scaling and computational complexity

  • Increasing size of target G.

    • 1990 – 40 kb (one cosmid)

    • 1995 – 1.8 Mb (H. influenzae)

    • 2001 – 3,200 Mb (H. sapiens)
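A rough feel for how the workload grows with the target. The read length of 500 bp and the 10-fold coverage are assumptions chosen for illustration (the slides say only "hundreds of base pairs"), and the all-pairs count is the naive number of read-to-read comparisons, not what a practical assembler performs.

```python
import math


def reads_needed(genome_bp: int, read_len_bp: int, coverage: float) -> int:
    """Number of reads so that total read length = coverage x genome."""
    return math.ceil(coverage * genome_bp / read_len_bp)


def naive_pairs(n_reads: int) -> int:
    """Read pairs a brute-force overlap search would compare."""
    return n_reads * (n_reads - 1) // 2


targets = {
    "cosmid (1990)": 40_000,
    "H. influenzae (1995)": 1_800_000,
    "H. sapiens (2001)": 3_200_000_000,
}
for name, size in targets.items():
    r = reads_needed(size, read_len_bp=500, coverage=10)
    print(f"{name}: {r:,} reads, {naive_pairs(r):,} naive comparisons")
```

The number of reads grows linearly with the target size, but the naive comparison count grows with its square.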


The repeat problem

  • Repeats

    • Bigger G means more repeats

    • Complex organisms have more repetitive elements

    • Small repeats may appear multiple times in a read

    • Long repeats may be bigger than reads (no unique region)
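The last point can be demonstrated directly. In the sketch below, two made-up genomes differ only in the order of the unique segments between three copies of a repeat. When reads are too short to span a repeat copy together with unique sequence on both sides, the two genomes produce exactly the same reads, so no assembler could tell them apart. All sequences and the helper name `reads_of_length` are hypothetical.

```python
from collections import Counter


def reads_of_length(genome: str, k: int) -> Counter:
    """Every substring of length k, with multiplicity (idealized reads)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


repeat = "ACGGA"  # 5-base repeat, three copies per genome
genome_1 = "TT" + repeat + "C" + repeat + "G" + repeat + "AA"
genome_2 = "TT" + repeat + "G" + repeat + "C" + repeat + "AA"

# Reads one base longer than the repeat still cannot bridge it with
# unique sequence on both sides.
short = len(repeat) + 1
print(reads_of_length(genome_1, short) == reads_of_length(genome_2, short))

# Reads two bases longer than the repeat can bridge it.
longer = len(repeat) + 2
print(reads_of_length(genome_1, longer) == reads_of_length(genome_2, longer))
```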


Gaps

  • Read length $L_R$ hasn’t changed much

  • $\rho = L_R / G$ gets steadily smaller

  • Gaps $\sim R\,e^{-\rho R}$, where $R$ is the number of reads (Lander & Waterman)


How deep must coverage be?
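One way to approach the question, sketched under stated assumptions: take the Lander-Waterman estimate that R reads of length L_R on a target of size G leave about R·e^(−c) gaps, where c = R·L_R/G is the coverage, and search for the smallest whole-number coverage that brings the expected gap count below one. The 500 bp read length and the function names are assumptions for illustration; the estimate ignores repeats, sequencing errors, and the minimum overlap needed to join two reads.

```python
import math


def expected_gaps(genome_bp: int, read_len_bp: int, coverage: float) -> float:
    """Lander-Waterman style estimate: R * exp(-c), with R = c * G / L_R."""
    reads = coverage * genome_bp / read_len_bp
    return reads * math.exp(-coverage)


def min_coverage(genome_bp: int, read_len_bp: int, max_gaps: float = 1.0) -> int:
    """Smallest integer coverage whose expected gap count is <= max_gaps."""
    c = 1
    while expected_gaps(genome_bp, read_len_bp, c) > max_gaps:
        c += 1
    return c


for name, size in [("40 kb cosmid", 40_000),
                   ("1.8 Mb bacterium", 1_800_000),
                   ("3,200 Mb human", 3_200_000_000)]:
    print(name, "needs about", min_coverage(size, 500), "x coverage")
```

Because the gap count falls exponentially with coverage while the read count grows only linearly, the required depth rises slowly even as the target grows by orders of magnitude.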


Double-barreled shotgun sequencing

  • Choose longer fragments (say, 2 × $L_R$)

  • Read both ends

  • Such fragments probably span gaps

  • This gives an approximate size of the gap

  • This links contigs into scaffolds
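The gap-size estimate follows from simple arithmetic: if a fragment of roughly known length has one end in one contig and the other end in the next, whatever part of the fragment is not accounted for inside the two contigs must lie in the gap. A minimal sketch, with hypothetical names and made-up numbers, assuming both end reads have already been placed in their contigs:

```python
from statistics import mean


def estimate_gap(fragment_len: int, contig_a_len: int,
                 start_in_a: int, end_in_b: int) -> int:
    """Estimate the gap between contig A and contig B from one mate pair.

    start_in_a: 0-based position in A where the fragment begins.
    end_in_b:   number of bases from the start of B to the fragment's end.
    """
    inside_a = contig_a_len - start_in_a  # part of the fragment within A
    return fragment_len - inside_a - end_in_b


# Three made-up mate pairs linking the same two contigs.
# Fragment lengths are only approximately known, so estimates vary.
pairs = [
    (2000, 5000, 4200, 700),
    (2100, 5000, 4500, 1080),
    (1900, 5000, 4000, 410),
]
estimates = [estimate_gap(*p) for p in pairs]
print(estimates, "mean:", mean(estimates))
```

Averaging over several mate pairs that link the same two contigs smooths out the uncertainty in individual fragment lengths.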


Genomic results


HGSC v Celera results


To do or not to do?

  • “The idea is gathering momentum. I shiver at the thought.” – David Baltimore, 1986

  • “If there is anything worth doing twice, it’s the human genome.” – David Haussler, 2000


Public or private?

  • “This information is so important that it cannot be proprietary.” – C Thomas Caskey, 1987

  • “If a company behaves in what scientists believe is a socially responsible manner, they can’t make a profit.” – Robert Cook-Deegan, 1987


HW for Feb 17

  • Comment on these assertions (500-1000 words):

    • Waterston, Lander, Sulston (WLS) – “Our analysis indicates that the Celera paper provides neither a meaningful test of the WGS approach nor an independent sequence of the human genome.”

    • Venter – “This conclusion is based on incorrect assumptions and flawed reasoning.”

