slide1

Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform

Xin ZHOU, Shanlin LIU, Yiyuan LI,

Qing YANG, and Xu SU

Department of Science and Technology

Environmental Genomics Research Group

BGI, China

Adelaide, Australia, 3 December 2011

slide2

Problem

Solutions?

Opt.1: ......zzzzZZZZZ

Opt.2: morph sorting → indiv. ID → … → Opt.1

Opt.3: morph sorting → indiv. barcoding → … → Opt.1

Opt.4: grinding up → NGS → CLUSTERING/BLAST → DIVERSITY!

Zhou et al. 2011, 4th International Barcode of Life Conference

slide3

Environmental barcoding of bulk insects

  • aquatic insects
    • mini-barcode (130bp)
    • 454
  • bat diet (insects)
    • COI fragment, 157 bp
    • 454
  • Malaise trap (insects)
    • COI fragment, ~400 bp
    • 454

Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Yu D.W. et al., in review


slide4

Major NGS platforms applicable in environmental barcoding

Illumina Hi-Seq

  • higher throughput
  • less $ / bp
  • increasing read length
  • variety of bioinformatics tools available from genomic pipelines


slide5

Sequencing capacity at BGI

  • 28 Illumina GAIIx
  • 137 Illumina Hi-Seq 2000
  • 25 Life Tech SOLiD 4
  • 16 ABI 3730XL
  • 110 MegaBACEs
  • 2 Illumina iScan
  • 1 Roche 454
  • 1 Ion Torrent
  • 1 Illumina Mi-Seq
  • Data production:
    • 100Gb / day (2009)
    • >5 Tb / day (end of 2010)
    • >1500X human genome / day


slide6

What I am NOT going to talk about:

  • Primer optimization
  • Systematic comparisons of NGS platforms
  • Quantitative diversity analysis

What I AM going to talk about:

  • Can Illumina NGS be used in diversity analysis?


slide7

Can Illumina NGS be used in diversity analysis?

  • Sequencing error rate
  • Read-length


slide8

Sequencing error rate

  • No indel issue in homopolymers
  • Sequencing quality keeps increasing
  • Rare nucleotide errors can be easily corrected by:
    • increasing sequencing depth
    • paired-end (PE) sequencing
    • setting stringent matching criteria in the overlapping fragment by allowing only >99% identity

Recent improvement in sequencing quality using Illumina’s V3 chemistry

  • (even at 100 bp, only about 10% of base calls have an error rate >1%)

[Figure: two 150bp paired-end reads overlapping within a 250nt insert. PE sequencing enables forming sequence contigs]
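
The paired-end contig idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: the names `reverse_complement` and `merge_pair`, the known `insert_size` parameter, and the choice to keep the forward read's bases in the overlap are all hypothetical. With two 150bp reads from a 250nt insert the overlap is 150 + 150 - 250 = 50bp, and the pair is merged only when identity in that overlap is above 99%.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def merge_pair(fwd, rev, insert_size, min_identity=0.99):
    """Merge a read pair into one contig, or return None if it cannot be trusted.

    fwd: forward read; rev: reverse read as sequenced (opposite strand).
    insert_size: fragment length, assumed known here (hypothetical parameter).
    """
    rev_rc = reverse_complement(rev)
    overlap = len(fwd) + len(rev_rc) - insert_size
    if overlap <= 0 or overlap > min(len(fwd), len(rev_rc)):
        return None  # no usable overlap between the two reads
    fwd_tail = fwd[-overlap:]
    rev_head = rev_rc[:overlap]
    matches = sum(a == b for a, b in zip(fwd_tail, rev_head))
    if matches / overlap <= min_identity:
        return None  # overlap identity is not above the threshold
    # Keep the forward read, then append the non-overlapping part of the mate.
    return fwd + rev_rc[overlap:]
```

Note that with a 50bp overlap, a single mismatch already gives 98% identity, so a ">99%" rule amounts to requiring a perfect overlap in this configuration.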


slide9

Read length

  • Read length keeps increasing
  • Shotgun reads can be further assembled into longer fragments (“shotgun” assembly strategy used in genome sequencing projects)
  • 150PE enables a contig read of 250bp
  • Option of scaffold assembly

[Figure: two 150bp paired-end reads overlapping within a 250nt insert]


slide10

Illumina environmental barcoding

  • Illumina e-barcoding: PCR based
    • Lib1 (658bp, 150PE): Full length COI barcode PE sequencing
    • Lib2 (200bp, 150PE): COI amplicons shotgun PE sequencing
    • Full length COI
  • Illumina e-barcoding: PCR free
    • Mitochondrial shotgun PE sequencing
    • Full length COI without PCR bias


slide11

Approach #1: PCR-based

Sample information


slide12

Approach #1: PCR-based

Pre-analysis data filtering


slide13

OTU filtering workflow

  • Unique reads (abundance > 1)
  • Compared to reads of Lib 2
  • Remove chimeras
  • Alignment
  • OTU cluster (98%)
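
Part of this workflow can be illustrated with a short Python sketch. It is not the authors' implementation: it covers only dereplication, the abundance > 1 filter, and a greedy clustering at 98% identity, and it leaves out the Lib 2 comparison and chimera removal. The function names and the assumption that reads are already aligned to equal length are illustrative.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, min_abundance=2, threshold=0.98):
    """Greedy clustering sketch: the most abundant unique reads seed OTUs first."""
    counts = Counter(reads)
    # Keep unique reads seen more than once (abundance > 1), most abundant first.
    uniques = [(s, n) for s, n in counts.most_common() if n >= min_abundance]
    otus = []  # each entry is [centroid_sequence, total_abundance]
    for seq, n in uniques:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu[1] += n  # close enough to an existing centroid
                break
        else:
            otus.append([seq, n])  # start a new OTU
    return otus
```

For example, five copies of one read, three copies of a read differing at 1 position in 100, and a single copy of an unrelated read would yield one OTU of abundance 8, with the singleton discarded.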


slide14

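Results

[Figure: Venn diagrams of Sanger reference barcodes vs NGS OTUs (Blast at 100% identity) for the Mock and XSBN samples; primer labels: LepF1/R1, customized primers]

  • Mock: 4 Sanger reference only, 8 shared, 36 NGS OTUs only
  • XSBN: 32 Sanger reference only, 198 shared, 197 NGS OTUs only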


slide15

Mock: Sanger reference vs NGS OTUs

[Figure: Venn diagram, 4 Sanger reference only, 8 shared, 36 NGS OTUs only]

  • False negative (4): not found in raw data (likely due to primer failure)
  • “False positive”? (36):
    • 31 can be found in our total sample, from which our mock samples were assembled
    • 5 likely to be PCR errors


slide16

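XSBN: Sanger reference vs NGS OTUs

[Figure: Venn diagram, 32 Sanger reference only, 198 shared, 197 NGS OTUs only; plot labelled "Mean + SE" for group1 and group2]

  • Sanger reference only (32):
    • 17 not found in raw data (primer failure)
    • 15 were lost in data filtering
  • Cross-sample contamination?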


slide17

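Sanger reference vs NGS OTUs after removal of sequences with abundance <10

[Figure: Venn diagrams, Sanger reference only / shared / NGS OTUs only. Before: 32 / 198 / 197. After: 49 / 181 / 84]

  • Significantly fewer false positives (197 → 84)
  • Slight drop of true positives (198 → 181)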
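
The trade-off on this slide can be put in numbers with a small Python sketch. It assumes the three counts in each diagram read as Sanger reference only / shared / NGS OTUs only, which the slide does not state explicitly (the reading is consistent with the 17 + 15 = 32 breakdown of missing references on the previous slide). The function name is illustrative.

```python
def detection_summary(reference_only, shared, ngs_only):
    """Recall and precision of NGS OTUs measured against the Sanger reference."""
    recall = shared / (shared + reference_only)      # share of references recovered
    precision = shared / (shared + ngs_only)         # share of OTUs that match a reference
    return recall, precision


before = detection_summary(32, 198, 197)   # all OTUs
after = detection_summary(49, 181, 84)     # sequences with abundance <10 removed
print(f"before: recall={before[0]:.2f}, precision={before[1]:.2f}")
print(f"after:  recall={after[0]:.2f}, precision={after[1]:.2f}")
```

Under that reading, recall drops from about 0.86 to 0.79 while precision rises from about 0.50 to 0.68.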


slide18

Approach #1: PCR-based

What’s next?

Illumina e-barcoding

  • Obtaining full-length barcodes via shotgun read assembly (new program in development – “SOAPbarcode”)
  • New algorithm to filter out false positive OTUs


slide19

Approach #2: PCR-free method

  • Individual barcoding
  • Total MT isolation & DNA extraction
  • Shotgun sequencing
  • Reference based method
  • Reference independent method


slide20

Building reference library: individual barcoding

89 individuals;

84 reference barcodes;

39 OTUs (2%);


slide21

Total MT isolation & DNA extraction


slide22

Shotgun sequencing

  • Insert size: 200bp;
  • Read length: 100bp PE;


slide23

Pre-analysis

  • Data filtering:
    • Adaptor contamination removal;
    • Quality control: in each read, only allowing <10bp with seq. error rate >1%
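
The quality-control rule can be sketched as follows. An error rate above 1% corresponds to a Phred score below 20, since Q = -10·log10(p). The function name, the FASTQ-style quality string input, and the default ASCII offset of 33 are assumptions for illustration (Illumina data from that period often used an offset of 64); this is not the filter used by the authors.

```python
def passes_quality(quality_string, max_low_quality=10, min_q=20, offset=33):
    """Keep a read only if fewer than `max_low_quality` bases fall below `min_q`.

    quality_string: per-base quality characters as in a FASTQ record.
    min_q=20 corresponds to a per-base error rate of 1%.
    offset: ASCII offset of the quality encoding (assumed 33 here).
    """
    low = sum(1 for ch in quality_string if ord(ch) - offset < min_q)
    return low < max_low_quality
```

A 100bp read with nine low-quality bases would be kept, while one with ten would be discarded.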


slide24

Approach #2: PCR-free method

Method 1: Reference based

Blast reads to reference barcodes; confident identification is made only when:

  • Best BLAST hit >98% identity;
  • Reference coverage > 90%;

[Figure: reads mapped to Reference 1 with 100% coverage (correct mapping) and to Reference 2 with 30% coverage (incorrect mapping)]
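
The two criteria can be combined in a short Python sketch. The input layout (one best hit per read as `(ref_id, start, end, percent_identity)` with 1-based inclusive reference coordinates) and the function name are assumptions, not the authors' code; parsing of actual BLAST output is left out.

```python
from collections import defaultdict


def detected_references(best_hits, ref_lengths, min_identity=98.0, min_coverage=0.90):
    """Return {ref_id: coverage} for references passing both criteria.

    best_hits: iterable of (ref_id, start, end, percent_identity), one per read.
    ref_lengths: dict mapping ref_id to reference barcode length.
    """
    intervals = defaultdict(list)
    for ref_id, start, end, pid in best_hits:
        if pid > min_identity:  # criterion 1: best hit above 98% identity
            intervals[ref_id].append((min(start, end), max(start, end)))

    detected = {}
    for ref_id, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for s, e in spans[1:]:
            if s <= cur_end + 1:          # overlapping or adjacent: extend
                cur_end = max(cur_end, e)
            else:                         # gap: close the current block
                covered += cur_end - cur_start + 1
                cur_start, cur_end = s, e
        covered += cur_end - cur_start + 1
        coverage = covered / ref_lengths[ref_id]
        if coverage > min_coverage:       # criterion 2: reference coverage above 90%
            detected[ref_id] = coverage
    return detected
```

A reference tiled end to end by high-identity reads is reported, whereas one hit only over roughly its first third is not, mirroring the two cases in the figure.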


slide25

Potential sources of failure in detecting taxa

Taxon specific, or bio-mass (size & number)?


slide26

Failures in taxon detection

Taxon bias?


slide27

Failures in taxon detection

OR bio-mass (body size, # individuals)?

  • Readily detected: average length > 5mm
  • Missing: average length < 5mm


slide28

Approach #2: PCR-free method

Method 2: Reference independent

(Will we be able to identify diversity without reference MT genomes for the targeted species?)

Workflow:

  • Assembly of COI gene using genome assembly program (SOAPdenovo);
  • Annotation using ~240 MT genomes downloaded from GenBank;


slide29

PCR-Free reference-independent: results


slide30

Reference independent

  • Number of individuals we collected: 89 individuals
  • Barcode references: 39 OTUs (84 individuals)
    • 5 individuals failed in Sanger sequencing
  • Reference based: 26 OTUs
  • Reference independent: 23 OTUs

3 OTUs not detected in reference independent method because:

(1) sequencing depth is too low (<10X) to allow for reliable assembly

(2) relatively small body-size
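
The depth threshold can be made concrete with a rough calculation. The sketch below assumes mean depth is estimated as total bases in the reads assigned to a target divided by the target length; the function names and the 658bp target used in the example (the full-length COI barcode) are illustrative choices, not values reported for the missing OTUs.

```python
def mean_depth(read_lengths, target_length):
    """Average depth: total bases in assigned reads divided by target length."""
    return sum(read_lengths) / target_length


def deep_enough_for_assembly(read_lengths, target_length, min_depth=10):
    """True when mean depth reaches the 10X threshold mentioned on the slide."""
    return mean_depth(read_lengths, target_length) >= min_depth


# Fifty 100bp reads over a 658bp target give about 7.6X: below the threshold.
print(round(mean_depth([100] * 50, 658), 1))
```

At 100bp per read, about 66 reads on a 658bp target are needed to reach 10X, which suggests why small-bodied taxa contributing little DNA to the bulk sample might fall short.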


slide31

PCR-free method


slide32

PCR-free method

Barcode region


slide33

Approach #2: PCR-free method

What’s next?

Currently:

  • MT DNA 5-10% after isolation;
  • Non-target DNA affects MT assembly (e.g., bacteria & genomic DNA);
  • Taxonomic/biomass bias

Potential solutions:

  • Wet-lab protocol optimization
    • Pre-sorting insects by body-size
    • Alternative MT isolation methods
  • Increase sequencing depth


slide34

Conclusions

  • Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost;
  • Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification;
  • It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.);
  • Methodology optimization is much needed in many details of the pipeline;
  • Collaborative and synergistic efforts by the community would greatly advance progress.


slide35

Acknowledgements

Funder:

Collaborators:

Douglas W. Yu

Kunming Institute of Zoology, Chinese Academy of Sciences

Mehrdad Hajibabaei, Shadi Shokralla

University of Guelph

Owain Edwards

CSIRO Ecosystem Sciences

LU Jianliang

WU Qiong

AN Sainan

ZHOU Yizhuang

ZHAO Jing


slide36

Thanks for your attention!

slide38

Recovering biodiversity patterns in ecological studies
