Next generation sequencing in virus and parasite research

Presentation Transcript

Four main projects

In the lab

Applications Presented

Sanger Read

GS-FLX read

100Mb | 500Mb per run

~250bp

500 bp

>800bp

WGS

Annotation

Population Diversity

Pathogen Discovery


Total scaffolds: ~8250

Longest scaffold: 6.5 Mb

Total bases in scaffolds: 71 Mb

Total span of scaffolds: 80 Mb

Brugia malayi Genome Project: Parasitic nematode, causes lymphatic filariasis

Genome size ~100Mb

Sanger

(cloning bias)

6 chromosomes in 8250 pieces


Brugia malayi Genome Project: PHASE II – Use Next-Gen Data

Closing the Genome

Curating the Data

Next-generation sequencing

Fingerprint maps

Mapping 5’ and 3’UTRs

Functional annotation

DATABASE

Re-assemble genome

Re-annotate

(Confirm UTRs by GSFLX)

(Hybrid Sanger-GSFLX assembly)


GS-FLX Sequencing of Worm gDNA and cDNA

Mix of random reads and paired reads

Avg read length: ~220bp

Whole Plate

4-well gasket

Paired-Ends and WGS

UTRs

5’UTR

3’UTR

gDNA

SL

~100 Mb

5 runs = 5X coverage of the genome
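
The coverage figure on this slide can be checked with a small calculation. This is a minimal sketch, assuming each GS-FLX run yields about 100 Mb (the per-run figure quoted earlier in the talk) and taking the genome size as ~100 Mb; the function name `fold_coverage` is illustrative and not from the presentation.

```python
def fold_coverage(total_sequenced_bp: float, genome_size_bp: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bp / genome_size_bp


MB = 1_000_000
runs = 5
yield_per_run_bp = 100 * MB   # assumption: ~100 Mb per GS-FLX run
genome_size_bp = 100 * MB     # ~100 Mb genome, as stated on the slide

coverage = fold_coverage(runs * yield_per_run_bp, genome_size_bp)
print(f"{coverage:.1f}X")     # prints 5.0X
```

The same ratio applies to any data set: divide the bases sequenced by the size of the reference they are mapped against.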


SEQUENCE ASSEMBLY

Mapping of paired and non-paired reads onto genomic assembly

20Mb of Brugia reads = ~0.25X coverage

[Plot: hits, axis range 80% to 100%]

Paired-ends

No apparent Bias


Sequencing UTRs of B. malayi

[Diagram labels, in slide order: mRNA with poly(A) tail (AAAA); RNA oligo; MmeI site; P; CIP; TAP; RNA ligase; RT-PCR; NlaIII; Unique sequence; SAGE Tag; Concatenated SAGE Tags; DITAGS; (variable length)]


Sequencing Results

One sequence run

5’UTR

SL

3’UTR

~50Mb of data in ~400,000 reads


Data processing

Raw Data

Remove: linker, small tags (<10), identical, junk

Blast against

Genome

EST

CDS

Exon

Unmatched tags

Blast against

Small contigs

Mitochondrion

Bacterial

singletons
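
The "Remove" step of this slide (linker, small tags under 10 bases, identical tags, junk) could look roughly like the sketch below. The linker sequence `GGATCC`, the function name `clean_tags`, and the definition of "junk" as any tag containing a character other than A, C, G, or T are all assumptions made for illustration; the presentation does not specify them.

```python
def clean_tags(raw_tags, linker, min_len=10):
    """Strip a leading linker, then drop short, junk, and duplicate tags.

    Order of the surviving tags follows their first appearance.
    """
    seen = set()
    kept = []
    for tag in raw_tags:
        tag = tag.upper()
        if tag.startswith(linker):
            tag = tag[len(linker):]      # remove the linker prefix
        if len(tag) < min_len:
            continue                     # small tag (< min_len bases)
        if set(tag) - set("ACGT"):
            continue                     # assumed meaning of "junk"
        if tag in seen:
            continue                     # identical to an earlier tag
        seen.add(tag)
        kept.append(tag)
    return kept


raw = [
    "GGATCCACGTACGTACGTAA",   # linker + 14-base tag
    "ACGTACGTACGTAA",         # identical to the tag above
    "GGATCCACGT",             # too short once the linker is removed
    "ACGTNNNNACGTACGT",       # contains ambiguous bases
    "TTTTGGGGCCCCAAAA",       # kept
]
print(clean_tags(raw, linker="GGATCC"))
```

The surviving tags would then go to the BLAST steps listed on the slide.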


Mapping of Tags

EST

3’-tag

SL-tag

5’-tag

40S ribosomal protein S18


Intra-Host Diversity of Influenza A Virus

Drug resistant and

Sensitive variants

Antigenic variants


Mapped GS-FLX Sequence Reads on antigenic domain of Hemagglutinin

566aa

1,757nt

HA1

HA2

450bp

Amplicons:



Patterns: Non-Synonymous mutations are predominantly in epitope regions (13/19 sites)

Epitope labels (in slide order): A, D, A, A, A, B, B

#reads (in slide order): 2, 3, 122, 1, 122, 1, 2
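
Whether a nucleotide change is synonymous or non-synonymous depends only on the amino acids the two codons encode. The sketch below builds the standard genetic code table and classifies a codon change; the function name `classify_codon_change` and the example codons are illustrative and are not taken from the authors' pipeline.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str):
    """Return (ref_aa, alt_aa, kind) for a DNA codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


print(classify_codon_change("aat", "agt"))  # ('N', 'S', 'non-synonymous')
print(classify_codon_change("aat", "aac"))  # ('N', 'N', 'synonymous')
```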


Identifying rare variants: Drug resistance mutation

Matrix segment in H1N1 isolate

#reads (in slide order): 4, 137, 4, 2, 1, 171, 78, 1, 1, 1, 1, 4, 1, 1, 1, 35

Resistant H1N1

1/437=0.2%

agt (S)  aat (N)

N31S
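
The 0.2% figure is the fraction of reads covering the site that carry the variant codon. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def variant_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


freq = variant_frequency(1, 437)
print(f"{freq:.1%}")  # prints 0.2%
```

A frequency this low rests on a single read, which is why the question of whether the polymorphism is real matters.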


SNP Analyses: Probability that Polymorphism is Real

Base# A C G N T GAP SNP probability

pbShort

(polybayes)

- Marth Lab, Boston College
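
The slide uses pbShort (PolyBayes) to score each polymorphism. The sketch below is not PolyBayes; it swaps in a much simpler binomial tail calculation to illustrate the underlying question: how likely is it that k or more reads show a variant by sequencing error alone? The per-base error rate of 0.5% and the assumption that errors are independent across reads are hypothetical values chosen for illustration.

```python
from math import comb


def prob_at_least_k_errors(k: int, n: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n, error_rate).

    n is the read depth at the site, k the number of variant reads.
    """
    below_k = sum(
        comb(n, i) * error_rate**i * (1 - error_rate) ** (n - i)
        for i in range(k)
    )
    return 1.0 - below_k


depth = 437          # read depth from the drug-resistance slide
error_rate = 0.005   # assumption: 0.5% per-base error, independent reads

p_one = prob_at_least_k_errors(1, depth, error_rate)
p_five = prob_at_least_k_errors(5, depth, error_rate)
print(f"P(>=1 error read) = {p_one:.3f}")
print(f"P(>=5 error reads) = {p_five:.3f}")
```

Under these assumptions a single variant read is weak evidence on its own, while the chance of error alone producing several variant reads drops as the count rises. A Bayesian caller additionally weighs base qualities and prior expectations.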


Error Correction (homopolymer tracts)
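
Pyrosequencing errors tend to be insertions or deletions within runs of the same base, so a first step toward correction is locating those runs. The following is a minimal sketch that only finds homopolymer runs in a read; it does not perform any correction, and the function name and the minimum run length of 3 are assumptions.

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 3):
    """Return (start, base, length) for each run of >= min_len identical bases."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGTTTTTAGGGC"))  # [(3, 'T', 5), (9, 'G', 3)]
```

Positions flagged this way are the ones where a length disagreement between a read and the reference deserves extra scrutiny.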


Signal Processing: Length Distribution

adjusting the stringency of quality filters

75,000 – avg ln 200

70,000 – avg ln 195

Changes length distribution

Reads slightly shorter BUT

Average quality is higher

Higher stringency

Default

Read length


Signal Processing: Quality Distribution

Default

Reduce the # of bases

BUT

Increase the proportion of

bases of HIGH QUALITY

Higher stringency

15 Million bp

14 Million bp

Quality Score
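
The trade-off on these two slides (fewer bases and slightly shorter reads, but higher average quality) can be illustrated with a simple stand-in. The sketch below trims low-quality bases from the 3' end of each read at two thresholds; it is not the GS-FLX signal-processing filter, and the quality values and thresholds are made-up toy data.

```python
def trim_3prime(quals, min_q):
    """Drop trailing bases whose quality is below min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return quals[:end]


def summarize(reads, min_q):
    """Trim every read, discard empty ones, and report totals."""
    trimmed = [trim_3prime(q, min_q) for q in reads]
    trimmed = [q for q in trimmed if q]
    total_bases = sum(len(q) for q in trimmed)
    mean_q = sum(sum(q) for q in trimmed) / total_bases if total_bases else 0.0
    return {"reads": len(trimmed), "bases": total_bases, "mean_quality": mean_q}


# Toy per-base quality scores for three reads (hypothetical values)
reads = [
    [30, 30, 28, 20, 12],
    [35, 33, 25, 18],
    [28, 27, 15, 10, 8],
]

default = summarize(reads, min_q=10)   # looser threshold
strict = summarize(reads, min_q=20)    # higher stringency
print(default)
print(strict)
```

With the stricter threshold the toy data set loses bases but its mean quality rises, which mirrors the direction of the change shown on the slides.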


Whole Virus Genome Sequencing

Limitation of read length BUT:

  • Isolate single genome (limited dilution, other?)

  • Random prime or specific primers with barcodes

  • use barcode to amplify

  • Multiplex: 20 barcodes, 16-well gasket = 320 samples
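
The multiplexing arithmetic and the step of sorting reads back to their samples can be sketched as follows. The barcode sequences and sample names are hypothetical, and the sketch assumes exact matching of a barcode at the start of each read, with no barcode being a prefix of another.

```python
def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode it starts with.

    The barcode is removed from assigned reads; reads with no matching
    barcode go to the 'unassigned' bin.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        for name, tag in barcodes.items():
            if read.startswith(tag):
                bins[name].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)
    return bins


# Capacity quoted on the slide: 20 barcodes in each of 16 gasket regions
samples_per_run = 20 * 16
print(samples_per_run)  # prints 320

barcodes = {"sample1": "ACGT", "sample2": "TGCA"}   # hypothetical barcodes
reads = ["ACGTGGGTTTAAA", "TGCACCCGGGTTT", "GGGGACGTACGT"]
print(demultiplex(reads, barcodes))
```

Because each gasket region is physically separate, the same 20 barcodes can be reused in every region.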


Virus Genomic Library Construction - Discovery -

[Diagram labels: NNNN (random primers), RNA, cDNA or ssDNA, DNA, dsDNA]

1a. Reverse transcription (RT)

1b. Extension from random primers (Klenow exo- DNA polymerase)

2. Amplification from tags (PCR)

3. Size selection & sequencing: select 500 bp amplicons for emulsion PCR and pyrosequencing



Post-Processing Pipeline

Barcodes mapped onto reads

NUCMER

MySQL db

Reads clustered

and reduced to a unique set

BLASTN

BLASTX


26,750 contigs → BLASTN → 56% match human DNA

12,889 contigs → BLASTX → 120 match viruses


Oral Microbiome Project

[Figure labels: tags TagA, TagB, TagC, TagD; four groups labeled "Family"; samples BU128, WV409, BK026, BR095 / WV001, WV213, BK044, BU130 / BR009, WV597, WV631, BU133 / BR023, WV041, BU137, WV628; VIRAL and BACTERIAL panels; pools 1 to 8; HIGH and LOW groups for Periodontal Disease and Caries]


Sequencing of

PCR Amplicons 250bp in size

Bacterial Diversity Heat Maps:

Sequencing of

16S rRNA variable region


Acknowledgments

Ghedin Lab

School of Medicine

Jay DePasse

Adam Fitch

Xu Zhang

Funding:

NIDCR/NIH

CTSI

JDRF

Burroughs-Wellcome Fund

School of Dental Medicine

Mary Marazita

Graduate School of Public Health

Robert Ferrell

Mike Barmada

GPCL

Debby Hollingshead

Paul Wood

Janette Lamb

