DNA sequencing


  • Part 1: Chemistry, instrumentation and data analysis
  • Part 2: Large-scale operations, comparative sequencing.
  • Part 3: Sequencing analysis, variation analysis.
  • Abbrev 02/03/05
DNA sequencing: Importance
  • Basic blueprint for life; Aesthetics.
  • Gene and protein.
    • Function
    • Structure
    • Evolution
  • Genome-based diseases- “inborn errors of metabolism.”
    • Genetic disorders
    • Genetic predispositions to infection
    • Diagnostics
    • Therapies
DNA sequencing methodologies: ca. 1977!
  • Maxam-Gilbert
    • base modification by general and specific chemicals.
    • depurination or depyrimidination.
    • single-strand excision.
    • not amenable to automation
  • Sanger
    • DNA replication.
    • substitution of substrate with chain-terminator chemical.
    • more efficient
    • automation??
DNA sequencing: biochemistry

[Figure: chemical structure of a DNA strand. Labels: 5’ end, phosphate groups, purine or pyrimidine base (two nucleotides), 3’ OH.]

DNA sequencing: Sanger dideoxy method I

[Figure: structure of a dideoxyribonucleoside triphosphate (ddNTP): triphosphate, purine or pyrimidine base, and H in place of the 3’ OH.]

DNA sequencing: Sanger II

[Figure: chain termination method. The incorporated ddNTP carries H instead of OH at the 3’ position, so the strand cannot be extended further.]

DNA sequencing: Chemistry

template + primers + polymerase + label at?

  • 1: dCTP, dTTP, dGTP, dATP, ddATP*
  • 2: dCTP, dTTP, dGTP, dATP, ddGTP*
  • 3: dCTP, dTTP, dGTP, dATP, ddTTP*
  • 4: dCTP, dTTP, dGTP, dATP, ddCTP*

extension, then electrophoresis

[Figure: base pairs shown: A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T]
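The four-reaction scheme can be illustrated with a toy model. This is only a sketch: it assumes every possible termination product is present and resolved, the template string is invented, and the function names are hypothetical. Each reaction yields fragments ending at one base; reading the bands from shortest to longest recovers the newly synthesized strand.

```python
# Toy model of the four-reaction Sanger scheme (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template):
    """Strand made by the polymerase, written 5'->3'.

    `template` is written 3'->5', so the new strand is its base-by-base complement.
    """
    return "".join(COMPLEMENT[base] for base in template)

def termination_lengths(new_strand, dd_base):
    """Lengths of the fragments that end where ddNTP `dd_base` was incorporated."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]

def read_gel(new_strand):
    """Run the four reactions, then read the bands from shortest to longest."""
    lanes = {base: termination_lengths(new_strand, base) for base in "AGTC"}
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

template = "TACGGTCA"  # hypothetical template, written 3'->5'
strand = synthesized_strand(template)
print(strand)                      # ATGCCAGT
print(read_gel(strand) == strand)  # True
```

Real gels do not resolve every fragment length, which is one reason read lengths are limited.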

DNA sequencing: Chemistry

template + polymerase +

  • 1: dCTP, dTTP, dGTP, dATP, ddATP + primer
  • 2: dCTP, dTTP, dGTP, dATP, ddGTP + primer
  • 3: dCTP, dTTP, dGTP, dATP, ddTTP + primer
  • 4: dCTP, dTTP, dGTP, dATP, ddCTP + primer

extension, then electrophoresis

[Figure: base pairs shown: A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T]

Semi-automated fluorescent DNA sequencing
  • Fred Sanger et al., 1977.
  • Walter Gilbert et al., 1977.
  • Leroy Hood et al., 1986.
  • Applied Biosystems, Inc.
  • DuPont Company.
DNA sequencing: upgrade, second iteration, terminator-label
  • Disadvantages of primer-labels:
    • four reactions
    • tedious
    • limited to certain regions, custom oligos or
    • limited to cloned inserts behind ‘universal’ priming sites.
  • Advantages:
  • Solution:
    • fluorescent dye terminators
DNA sequencing: Chemistry

template + polymerase + dCTP, dTTP, dGTP, dATP, ddATP, ddGTP, ddTTP, ddCTP

extension, then electrophoresis

[Figure: base pairs shown: A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T]

ABI series: 370, 373 and 377
  • semi-automated
  • “best” pre- and post-
  • higher throughput operations.
  • bioinformatics limitations, ‘scuze me- “opportunities.”
Genome sequencing strategies
  • Shotgun
  • Directed primer walks
  • Modified directed primer walks
Sequencing strategies

Whole genome

Also on a smaller scale: 1. “Island walking” and 2. Primer walking.

Rapid re-sequencing of human Ad1: Time trial.

Have sequence of Ad 1.

In theory, have a minimally tiled set of PCR primers to cover entire 36,001 base genome.

In theory, have a minimally tiled set of sequencing primers as well.

Want draft sequence in a minimal time, including primer delivery from a vendor.

In practice design two parallel sets of minimally tiled PCR primers and amplify two sets.

In practice, assume 750 base reads--> 48 primers, one direction.

Compare with consensus: Determine accuracy, timing and evaluate operation.

Genome coordinates: 1 to 36,001

Tiled fragments:

  • 115 to 7,315
  • 7,300 to 14,500
  • 14,400 to 21,600
  • 21,500 to 28,700
  • 28,600 to 35,885
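The arithmetic behind this slide can be checked with a short sketch. The genome length, read length and fragment coordinates are taken from the slide; the function names are hypothetical, and rounding to the nearest whole read is an assumption chosen because it reproduces the slide's figure of 48 primers.

```python
GENOME_LENGTH = 36_001
READ_LENGTH = 750

# Fragment coordinates as listed on the slide: (start, end).
FRAGMENTS = [(115, 7_315), (7_300, 14_500), (14_400, 21_600),
             (21_500, 28_700), (28_600, 35_885)]

def primers_one_direction(genome_length, read_length):
    """Sequencing primers for one pass in one direction (nearest whole read, assumed)."""
    return round(genome_length / read_length)

def overlaps(fragments):
    """Bases shared by each pair of neighbouring fragments."""
    return [prev_end - next_start + 1
            for (_, prev_end), (next_start, _) in zip(fragments, fragments[1:])]

def uncovered_ends(fragments, genome_length):
    """Bases before the first fragment and after the last one."""
    return fragments[0][0] - 1, genome_length - fragments[-1][1]

print(primers_one_direction(GENOME_LENGTH, READ_LENGTH))  # 48
print(overlaps(FRAGMENTS))                                # [16, 101, 101, 101]
print(uncovered_ends(FRAGMENTS, GENOME_LENGTH))           # (114, 116)
```

The overlaps show that neighbouring fragments share at least a few bases, which is what makes the set tiled rather than merely adjacent.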


Custom primer walks and “island” hopping

  • Have scaffold of generic genome: related or compiled.
  • Have archived “islands of sequences” (lg, med, sm)- from other research interests.
  • Generate “in-bound” primers to re-sequence equivalents and known features, e.g., 3’-ITR.
  • Use custom “out-bound” primers to walk across “inter-island” sequences (PCR and sequencing).
  • Collect “1st +” draft genomic sequence as round 1.
  • Iterative walks to complete “2+1” consensus, with error rate 1/10,000 bases.

Target: HAdV4

  • For 36,000 bases, need 90 primers for 1x coverage (1st draft) and 270 primers for 3x coverage (finished).
  • Have from GenBank: 10 “islands” @ 30%= 10,883 bases,
    • calling for 27x2= 54 primers for complementing coverage.
  • Theory (if continuous sequence): 36,000-10,883= 25,117 bases.
    • At 400 bases per read, need 63 primers for 1x coverage, or 126 for complementing coverage.
  • Practice: 10 “islands” @ 30%= 10,883 bases, 80 primers.
  • Example: “Island 1” is 149 bases.
    • 1 fragment at 400 bases/read.
    • 2 primers for 1x coverage.
    • “Terminal island,” need only 1 “outbound” primer.
    • Total of (1x2)+1= 3 primers.
  • Example: “Island 2” is 2042 bases.
    • 5 fragments at 400 bases/read.
    • “Internal island,” need 2 “outbound” primers.
    • Total of (5x2)+2= 12 primers.
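The primer counts on this slide follow a simple rule that can be sketched as below. The rounding rule (nearest whole read, never fewer than one) is an assumption inferred from the slide's own numbers, and the function names are hypothetical.

```python
READ_LENGTH = 400  # bases per read, as stated on the slide

def reads_needed(bases, read_length=READ_LENGTH):
    """Reads for 1x coverage: nearest whole read, at least one (assumed rule)."""
    return max(1, round(bases / read_length))

def island_primers(island_length, terminal):
    """Primers for one island: two per fragment plus the outbound primers.

    A terminal island needs one outbound primer, an internal island needs two.
    """
    fragments = reads_needed(island_length)
    outbound = 1 if terminal else 2
    return fragments * 2 + outbound

print(reads_needed(36_000), 3 * reads_needed(36_000))  # 90 270
print(reads_needed(36_000 - 10_883))                   # 63
print(island_primers(149, terminal=True))              # 3
print(island_primers(2_042, terminal=False))           # 12
```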
Definition of tiled set of PCR primers: Data.

[Figure: positions labeled A through H, with PCR fragments “B”, “C”, “D” and “E”.]

DNA sequencing: Computation
  • Input from sequencer: peak intensities
  • normalize intensities
  • apply mobility corrections
  • predict bands
  • call bases
  • Output to user: DNA sequence
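A minimal sketch of the normalize and call steps, assuming one intensity value per dye channel at each already-detected peak. Mobility correction and band prediction are left out, the intensity values are invented, and the function names are hypothetical.

```python
CHANNELS = "ACGT"

def normalize(traces):
    """Scale each dye channel so that its largest value is 1.0."""
    scaled = {}
    for base in CHANNELS:
        peak = max(traces[base]) or 1.0  # avoid dividing by zero on an empty channel
        scaled[base] = [value / peak for value in traces[base]]
    return scaled

def call_bases(traces):
    """Call the base whose normalized channel is strongest at each peak."""
    norm = normalize(traces)
    n_peaks = len(norm["A"])
    return "".join(max(CHANNELS, key=lambda base: norm[base][i])
                   for i in range(n_peaks))

traces = {  # invented intensities, one value per peak position
    "A": [900, 40, 30, 50],
    "C": [20, 35, 400, 10],
    "G": [10, 600, 25, 15],
    "T": [30, 20, 10, 300],
}
print(call_bases(traces))  # AGCT
```

Normalizing per channel matters because the four dyes do not emit with equal strength; without it the brightest dye would dominate the calls.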
Applications of DNA sequencing
  • Whole genome analysis
  • Comparative genomics
  • Applications to subfields
DNA sequencing
  • HighER throughput
DNA sequencing technology
  • Manual.
  • ABI 370s series.
  • DuPont “Genesis.”
  • Capillary array: Hitachi, ABI, Amersham...
  • Ultrathin horizontal: GeneSys Tech. (MJResearch), Whitehead Inst., E. Yeung.
  • Thin channel.
  • “ABI” 310, 3100, 3710…….
Shimadzu, Ltd.
  • NEW ORLEANS, March 19, 2002. PittCon.
  • Faster and more economical DNA Sequencer.
  • 10 times faster and 90 percent cheaper to run than current state-of-the-art.
  • GenoMEMS, MA spinoff that has developed a microfabrication technology, based on Whitehead Inst. technology.
  • Microelectromechanical system, or MEMS, technology: microfabricated electrical and mechanical components
  • Five million bases per day.
  • Readlengths of 800 bases.
  • Target 2003.

Genome characterization

  • Align DNA sequence with archived sequences.
  • Annotate DNA features, e.g., RE sites, GC sites, replication and transcription factor binding sites.
  • Annotate ORFs.
  • Annotate genes and proteins.
  • Phylogenetic analyses of genes.
  • Whole genome comparisons.
  • Phylogenetic analyses of genomes.
  • Identify cellular homologues or “ancient history” -horizontal transfer.
Genome Sequence Annotation.
  • Annotation flowchart.
  • Summary of findings.
  • Comparison of genome sequences.
From the sequencing projects: Biological features in sequence?

  • PROMOTER
  • Start codon: ATG
  • EXON, INTRON (GT ... AG), EXON
  • Stop codons: TAG, TAA, TGA
  • POLY A SIGNAL
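Start and stop codons are the simplest of these features to find by machine. The sketch below scans the three forward reading frames for ATG followed in frame by TAG, TAA or TGA. It is a toy: the function name and the `min_codons` threshold are hypothetical, the reverse strand and splicing are ignored, and the example sequence is invented.

```python
STOPS = {"TAG", "TAA", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs, 0-based with end exclusive, on the forward strand.

    Each ORF runs from an ATG to the first in-frame stop codon, stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

seq = "GGATGAAACCCTAGTT"  # invented example
print(find_orfs(seq))     # [(2, 14)]
print(seq[2:14])          # ATGAAACCCTAG
```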

Genome sequence annotation:

  • (M. Zorn, Berkeley, 2002)
    • Extraction, definition and interpretation of features in the genome sequence by integrating computational tools and biological knowledge.
    • “Proofread” the sequence: correct miscalls. Sequence data needs to be “cleaned up” for chip design.
Adenoviruses:

Non-enveloped icosahedral viruses.

Multiply in the host nucleus.

Linear double-stranded DNA genome, 26-45 kb in size.

Infect most vertebrates, from fish to humans.

Human adenoviruses: genus Mastadenovirus.

51 human serotypes divided into six sub-genera (Groups A-F).

  • HAdB1: Ads 3, 7, 16, 21. (respiratory infections)
  • HAdB2: Ads 11, 14, 34, 35, 50. (kidney and UT infections, except 11a and 14)
  • HAdE: Ad 4. (respiratory infections)
[Figure, from Stone et al., 2003: transcription units (early, intermediate, late).]

Gene annotation of adenovirus genome: Basic

  • ORFs: GLIMMER2, Artemis
  • Start/stop codon verification: RBSFinder
  • Refined ORFs
  • Translated frames: Artemis six-frame translation
  • GenBank non-redundant protein databases: BLASTP
  • Sequence alignments: CLUSTALW
  • GENES: name, CDS (splice sites), MW
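The six-frame translation step can be sketched in plain Python. This is not how Artemis is implemented; it is a minimal illustration using the standard genetic code, with hypothetical function names, "*" for stop codons and "X" for any codon containing a non-ACGT character.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Translations of the three forward and three reverse reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

print(six_frames("ATGGCCTAA"))
```

Candidate ORFs from any of the six frames can then be compared against protein databases, as in the BLASTP step listed above.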

Advanced: Detailed annotation of genes

  • GenBank: E4 Superfamily: regions 1 and 2
  • Join: region 1, region 2: 117~306 nt in between
  • GenBank E4 Superfamily: 17 KD, 20 KD, 24 KD, 27 KD
  • CLUSTALW
  • Artemis: six-frame translation
  • Annotated Human type 1 adenovirus E4 genes: spliced from 2 to 3 exons

[Figure: E4 27K, E4 20K and E4 17K, each drawn from its 5’ end.]

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

VVRQASNV#MFFFVILFCV#CRNPQTCLREKWCLFLWWFRNLPAFICMSMTTMCLLFCARLCLIF*AAPCILYRRPCNKLT+GLRWLA+LRVCVS#SVWVLLSWFLAGKWPRWSVQTCTI

MFSWPCEGTYGIAVFLLMFRF*ILYRSVRNLNFCNHDSLLEAEGGGRSGADFYNGRT#YSGFA+RHIDKVAR*KLFGHG*RCWNVYRGDSP*RV+PLRPLGREGSLPFGSHCATSYKCHY

LFFGCRV*PRHRRGARSLNRSSF*GFG#SFGIKKKKTWFFQLFPLLPCVTRRTNV+VGWVWLILRWWMLSGQRRMKEFT+NPKPGGAWML*ESGYTTTTTQSELSDETGDADLFVTPAPG

FASGNMTTSGVPFGMTLRPTRSRLSRRTPYSRDRLPPFETETRATILEDHPLLPECNTLTMHNVSYVRGLPCSVGFTLIQEWVVPWDMVLTREELVILRKCMHVCLCCANIDIMTSMMIH

GYESWALHCHCSSPGSLQCIAGGQVLASWFRMVVDGAMFNQRFIWYREVVNYNMPKEVMFMSSVFMRGRHLIYLRLWYDGHVGSVVPAMSFGYSALHCGILNNIVVLCCSYCADLSEIRV

RCCARRTRRLMLRAVRIIAEETTAMLYSCRTERRRQQFIRALLQHHRPILMHDYDSTPM

Had1 ORF

Had1 27K, 1st exon

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

V

Had1 27K, 2nd exon

ALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQS

Had1 27K, 3rd exon

VGIAYLLLRQRPALPYWRIIRCCPNVTL

Had2 27K

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC

PNVTL

Had2 ORFs

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

VVRQASNV#MFFFVILFCV#CRNPQTCLREKWCLFLWWFRNLPAFICMSMTTMCLLFCARLCLIF*AAPCILYRRPCNKLT+GLRWLA+LRVCVS#SVWVLLSWFLAGKWPRWSVQTCTI

MFSWPCEGTYGIAVFLLMFRF*ILYRSVRNLNFCNHDSLLEAEGGGRSGADFYNGRT#YSGFA+RHIDKVAR*KLFGHG*RCWNVYRGDSP*RV+PLRPLGREGSLPFGSHCATSYKCHY

LFFGCRV*PRHRRGARSLNRSSF*GFG#SFGIKKKKTWFFQLFPLLPCVTRRTNV+VGWVWLILRWWMLSGQRRMKEFT+NPKPGGAWML*ESGYTTTTTQSELSDETGDADLFVTPAPG

FASGNMTTSGVPFGMTLRPTRSRLSRRTPYSRDRLPPFETETRATILEDHPLLPECNTLTMHNVSYVRGLPCSVGFTLIQEWVVPWDMVLTREELVILRKCMHVCLCCANIDIMTSMMIH

GYESWALHCHCSSPGSLQCIAGGQVLASWFRMVVDGAMFNQRFIWYREVVNYNMPKEVMFMSSVFMRGRHLIYLRLWYDGHVGSVVPAMSFGYSALHCGILNNIVVLCCSYCADLSEIRV

RCCARRTRRLMLRAVRIIAEETTAMLYSCRTERRRQQFIRALLQHHRPILMHDYDSTPM

Had2 27K, 1st exon

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

V

Had2 27K, 2nd exon

ALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQS

Had2 27K, 3rd exon

VGIAYLLLRQRPALPYWRIIRCCPNVTL

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC

PNVTL

Had2_27K

Had1  MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
Had2  MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP

Had1  VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC
Had2  VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC

Had1  PNVTL
Had2  PNVTL
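The relationship between the three exons and the full 27K protein can be checked directly. The sketch below joins the exon translations shown on the slide and compares the result with the full-length Had2 27K sequence, also copied from the slide. The variable names are hypothetical.

```python
# Sequences copied from the slide; FIRST_LINE is the shared first line of the protein.
FIRST_LINE = "MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP"

EXON_1 = FIRST_LINE + "V"
EXON_2 = "ALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQS"
EXON_3 = "VGIAYLLLRQRPALPYWRIIRCCPNVTL"

HAD2_27K = (FIRST_LINE
            + "VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC"
            + "PNVTL")

# Join the three exon translations in order.
joined = EXON_1 + EXON_2 + EXON_3

print(len(joined))
print(joined == HAD2_27K)
```

A comparison of this kind is what allows the annotation of one serotype to be carried over to the other.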

Two annotation approaches to HAdV1

Based on Ad 2 annotation

Generic annotation plus advanced


Global tools for whole genome analyses

  • Databases and data streams “readily” available.
  • Data mining opportunities: “added value.”
  • Limited tools in tool set, especially whole genome comparisons: MAP, GeneOrder and CoreGenes.
  • Non-available or non-optimal tools: Automated annotation, etc.
  • These whole genome analysis tools have value for the EOS project, in particular the PCR-based assays and the microarray “re-sequencing” assays.

FLAG Ad 1 vs 2 vs 5

  • Ad 1 vs Ad 2
  • Ad 1 vs Ad 5
  • Ad 2 vs Ad 5
GeneOrder flowchart

  • Get GenBank file from NCBI website.
  • Problem during process? Yes: error message, then stop. No: continue.
  • Remove unnecessary information and save.
  • Convert to FASTA format.
  • Convert to database format for BLASTP.
  • Break query file into single queries. Save each query in a temporary file.
  • BLASTP against database.
  • Get BLASTP results based on selected ranges.
  • Extract and print table/graph.
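The step that breaks a query file into single queries can be sketched with the standard library alone. This is not the GeneOrder source code: the function names, file names and example records are hypothetical, and the BLASTP run itself is left out.

```python
import tempfile
from pathlib import Path

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def write_single_queries(text, directory):
    """Save each record as its own FASTA file and return the paths."""
    paths = []
    for index, (header, sequence) in enumerate(parse_fasta(text), start=1):
        path = Path(directory) / f"query_{index}.fasta"
        path.write_text(f">{header}\n{sequence}\n")
        paths.append(path)
    return paths

# Invented two-record example.
example = ">protein_a hypothetical\nMKT\nAYI\n>protein_b hypothetical\nMSD\n"
with tempfile.TemporaryDirectory() as tmp:
    for path in write_single_queries(example, tmp):
        print(path.name, path.read_text().splitlines())
```

Each temporary file can then be passed to a protein search, one query at a time, as the flowchart describes.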


GeneOrder analysis: Example

  • Manually plot with MS-Excel.
  • Each point is a coding gene.
  • Co-linear arrangements suggest synteny.
  • Several regions of genomic rearrangement events within the genomes of the two chloroplasts.
  • Rearrangements include flipping of entire set of genes.
  • Two versions have been developed: GO1 and 2.
  • Ongoing work includes recoding for megabase genomes, which have additional value.
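The visual reading of such a plot can be approximated in code. The sketch below is not GeneOrder itself: it assumes each matched gene is given as a pair (position in genome A, position in genome B), treats only strictly adjacent positions as part of a run, and uses invented data and a hypothetical function name. Runs with direction +1 are co-linear, runs with -1 are flipped.

```python
def classify_runs(pairs):
    """Split matched gene pairs into co-linear (+1) and flipped (-1) runs.

    Returns a list of (direction, run); direction 0 marks an isolated gene.
    """
    if not pairs:
        return []
    pairs = sorted(pairs)
    runs = []
    current, direction = [pairs[0]], 0
    for prev, cur in zip(pairs, pairs[1:]):
        step = cur[1] - prev[1]
        step_dir = 1 if step == 1 else -1 if step == -1 else 0
        if step_dir != 0 and (direction == 0 or step_dir == direction):
            current.append(cur)
            direction = step_dir
        else:
            runs.append((direction, current))
            current, direction = [cur], 0
    runs.append((direction, current))
    return runs

# Invented example: genes 4 to 6 of genome A appear in reverse order in genome B.
pairs = [(1, 1), (2, 2), (3, 3), (4, 6), (5, 5), (6, 4), (7, 7)]
for direction, run in classify_runs(pairs):
    print(direction, run)
```

A real comparison would need to tolerate gaps from unmatched genes, which this strict-adjacency rule does not.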
Conserved genes of poxviruses

  • GeneOrder identifies similar genes in two genomes
  • Organize common genes in five genomes (genera) as “Alphabet”
  • Add other genes based on additional information
  • Use Advanced BLAST to check the Alphabet
  • Use PSI-BLAST with several iterations to check the Alphabet
  • Scan entire NCBI protein database using conserved profiles to ensure that all the conserved proteins have been extracted
  • Compare Alphabet with experimental TS mutant data to determine the essential genes for poxviruses

Orthologous gene locator
  • Develop software tool to characterize genomes globally.
  • Characterize genomes by identifying orthologous genes.
  • Identify paralogs.
  • Characterize unknown genes by identifying orthologs.
  • Rapid automated comparisons of genomes.
  • Identify “alphabet” of essential genes.
  • “CoreGenes.”
  • In general, a high-scoring BLAST hit may not be orthologous/homologous.
Applications of CoreGenes to EOS Affy chip design
  • HNC, San Diego, has a DTRA contract to build a software tool to determine sequences common to bacterial pathogens, allowing for identification of probes and primers: “BugID.”
  • HNC has been tasked to reformat “BugID” for examining virus genomes, which do have “core” genes, conserved at the amino acid but not necessarily at the nucleotide level. One preliminary exercise is to develop software to identify essential and related proteins.
  • “CoreGenes” from GMU already performs this function. It presents a table of “core” and presumably essential genes from families of organisms.
  • “CoreGenes” is under continued development. One feature is to present tables of related, slightly related and unrelated genes.
  • This has value in identifying probes and primers for assays such as microarrays.
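The idea of a table of “core” genes can be illustrated with set operations. This sketch does not describe how CoreGenes works internally: it assumes every gene has already been assigned to a group of related proteins (for example by a similarity search), and the genome names, gene names and groups are invented.

```python
def core_groups(genomes):
    """Groups present in every genome.

    `genomes` maps a genome name to {gene name: group id}.
    """
    group_sets = [set(genes.values()) for genes in genomes.values()]
    return set.intersection(*group_sets) if group_sets else set()

def core_table(genomes):
    """For each core group, list the member genes found in each genome."""
    table = {}
    for group in sorted(core_groups(genomes)):
        table[group] = {name: sorted(g for g, grp in genes.items() if grp == group)
                        for name, genes in genomes.items()}
    return table

genomes = {  # invented toy data, not real annotations
    "virus_1": {"g1": "polymerase", "g2": "hexon", "g3": "unknown_a"},
    "virus_2": {"h1": "hexon", "h2": "polymerase"},
    "virus_3": {"k1": "polymerase", "k2": "hexon", "k3": "unknown_b"},
}
print(sorted(core_groups(genomes)))  # ['hexon', 'polymerase']
print(core_table(genomes)["hexon"])
```

Genes outside the core set are the candidates for the “slightly related” and “unrelated” tables mentioned above.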
Automated annotation

Transform newly determined DNA sequence into linear array:

  • Input: DNA sequence.
  • Discovery: ORFs analysis.
  • Discovery: “Gene finder” analyses, e.g., GRAIL, etc.
  • Input: Related genomes.
  • Discovery: GeneOrder (pairwise); CoreGenes- collect “gaps,” catalog and re-analyze “gaps” as above.
  • Discovery: BLAST- tBLASTx, BLASTP, Advanced BLAST, PSI-BLAST, etc.
  • Input: “Loose” genes, proprietary genes.
  • Discovery: Annotate with protein domain, feature, pattern, etc. databases.
  • Process: Merge newly generated databases.
  • Ordering: Order genes with respect to genomic locations.
  • Output: Linear array of genes; GeneOrder plots (closest pairs); CoreGenes genomes table; “loose” genes table; “spliced” genes table.