Intro to Next Generation Sequencing


Intro to Next Generation Sequencing. Nick Loman and James Hadfield. http://omicsmaps.com/. Koboldt et al., 2010 (Figure 3). Bench work to build libraries and sequence. Clean up and QA reads. Alignments to Genome or Transcriptome. Analysis of Alignments. Koboldt et al., 2010.

Presentation Transcript
slide1

Intro to Next Generation Sequencing

slide2

Nick Loman and James Hadfield

http://omicsmaps.com/

slide6

Bench work to build libraries and sequence

Clean up and QA reads

Alignments to Genome or Transcriptome

Analysis of Alignments

slide7

Koboldt et al., 2010

Sample Contamination

Tumor-normal switches

Sample mix-ups

Run quality

Library chimeras

slide12

GCTACGGCATTCAGGCATCAGGCATTAGCAG

GGCATTCAGGGATCAGGCATTAGC->

<-CATGGCATTCAGGGATCAGGCATT

<-GCCATGGCATTCAGGGATCAGGC

CATTCAGGGATCAGGCATTAGCAG->

GGCATTCAGGGATCAGGCATTAGC->

CATTCAGGGATCAGGCATTAGCAG->

GGCATTCAGGGATCAGGCATT->

<-GGATCAGGCATTAGCAG

<-GATCAGGCATTAGCAG

<-GGATCAGGCATTAGCAG

FASTQ Example

For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example:

Illumina stores quality scores ranging from 0 to 62;

Sanger quality scores range from 0 to 93.

Solexa quality scores are on a different scale and have to be converted to PHRED quality scores.

  • FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.
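The conversions mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original slides. It assumes the ASCII offsets described by Cock et al. (33 for Sanger, 64 for Illumina 1.3+ and Solexa) and the usual Solexa-to-PHRED formula, Q_PHRED = 10 log10(10^(Q_Solexa/10) + 1). The function names are made up for illustration.

```python
import math

SANGER_OFFSET = 33    # Sanger FASTQ: PHRED 0-93 stored as ASCII 33-126
ILLUMINA_OFFSET = 64  # Illumina 1.3+ and Solexa FASTQ: scores stored from ASCII 64


def solexa_to_phred(q_solexa: int) -> int:
    """Convert one Solexa score to the nearest PHRED score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def illumina_to_sanger(quality: str) -> str:
    """Re-encode an Illumina 1.3+ quality string (PHRED+64) as Sanger (PHRED+33)."""
    return "".join(
        chr(ord(char) - ILLUMINA_OFFSET + SANGER_OFFSET) for char in quality
    )


def solexa_to_sanger(quality: str) -> str:
    """Re-encode a Solexa quality string as Sanger, converting each score to PHRED."""
    return "".join(
        chr(solexa_to_phred(ord(char) - ILLUMINA_OFFSET) + SANGER_OFFSET)
        for char in quality
    )


if __name__ == "__main__":
    # "h" is ASCII 104, i.e. PHRED 40 in Illumina 1.3+ encoding.
    print(illumina_to_sanger("hhhh"))
    print(solexa_to_phred(-5), solexa_to_phred(0), solexa_to_phred(40))
```

Which conversion applies depends on the instrument and pipeline version that produced the file, so the encoding should be checked before converting.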
SAM (Sequence Alignment/Map)
  • It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format
    • SAM is the output of aligners that map reads to a reference genome
    • Tab-delimited, with a header section and an alignment section
      • Header lines begin with @ and are optional
      • Each alignment line has 11 mandatory fields
    • BAM is the compressed binary version of SAM

http://samtools.sourceforge.net/
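As an illustration of the layout described above, the following minimal Python sketch splits one SAM line into its 11 mandatory fields. The field names follow the current SAM specification (older versions of the spec call RNEXT, PNEXT and TLEN by the names MRNM, MPOS and ISIZE). The record in the usage example is hypothetical. For real work a library such as pysam or the samtools command line is the more robust choice.

```python
from typing import List, NamedTuple, Optional


class SamRecord(NamedTuple):
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate / next read
    pnext: int   # position of the mate / next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (PHRED+33)
    optional: List[str]  # any optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM line; return None for header lines (starting with '@')."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need 11 mandatory fields")
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cols[5],
        cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], cols[11:],
    )


if __name__ == "__main__":
    # Hypothetical alignment line, for illustration only.
    example = "read1\t0\tchr1\t7\t30\t8M\t*\t0\t0\tGGCATTCA\tIIIIIIII\tNM:i:0"
    print(parse_sam_line(example))
    print(parse_sam_line("@HD\tVN:1.0\tSO:coordinate"))
```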

slide18

Mandatory Alignment Fields

http://samtools.sourceforge.net/SAM1.pdf

slide19

Alignment Examples

Alignments in SAM format

http://samtools.sourceforge.net/SAM1.pdf

slide20

Valid BED files

chr1 86114265 86116346 nsv433165

chr2 1841774 1846089 nsv433166

chr16 2950446 2955264 nsv433167

chr17 14350387 14351933 nsv433168

chr17 32831694 32832761 nsv433169

chr17 32831694 32832761 nsv433170

chr18 61880550 61881930 nsv433171

chr1 16759829 16778548 chr1:21667704 270866 -

chr1 16763194 16784844 chr1:146691804 407277 +

chr1 16763194 16784844 chr1:144004664 408925 -

chr1 16763194 16779513 chr1:142857141 291416 -

chr1 16763194 16779513 chr1:143522082 293473 -

chr1 16763194 16778548 chr1:146844175 284555 -

chr1 16763194 16778548 chr1:147006260 284948 -

chr1 16763411 16784844 chr1:144747517 405362 +
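The BED lines above can be read with a short Python sketch. It assumes the standard BED column order (chrom, chromStart, chromEnd, then optional name, score, strand) and the usual convention that chromStart is 0-based and chromEnd is exclusive, so the feature length is simply end minus start. The function name is illustrative, and splitting on any whitespace is a simplification that matches how the lines appear in this transcript.

```python
from typing import Dict, Union

OPTIONAL_COLUMNS = ("name", "score", "strand")


def parse_bed_line(line: str) -> Dict[str, Union[str, int]]:
    """Parse one BED line with 3 to 6 columns into a dictionary."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("BED lines need at least chrom, chromStart, chromEnd")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("expected 0 <= chromStart <= chromEnd")
    record: Dict[str, Union[str, int]] = {
        "chrom": fields[0],
        "start": start,          # 0-based, inclusive
        "end": end,              # exclusive
        "length": end - start,   # half-open interval length
    }
    # Attach whichever optional columns are present, in order.
    for key, value in zip(OPTIONAL_COLUMNS, fields[3:6]):
        record[key] = value
    return record


if __name__ == "__main__":
    print(parse_bed_line("chr1 86114265 86116346 nsv433165"))
    print(parse_bed_line("chr1 16759829 16778548 chr1:21667704 270866 -"))
```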

slide22

GVF format

##gff-version 3

##gvf-version 1.02

##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090

##genome-build NCBI MGSCv36

##assembly-name MGSCv36

##assembly-accession GCF_000001635.15

##file-date 2011-11-18

# Study_accession: Combined studies on MGSCv36

# Display_name: Combined studies on MGSCv36

# Study_description: Combined studies on MGSCv36

chr1 dbVar copy_number_variation 90044442 90114410 . . . ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.

chr4 dbVar copy_number_variation 121483931 121646639 . . . ID=nsv433534;Name=nsv433534;Start_range=.,121483931;End_range=121646639,.

chr9 dbVar copy_number_variation 109128634 109146964 . . . ID=nsv433535;Name=nsv433535;Start_range=.,109128634;End_range=109146964,.

chr17 dbVar copy_number_variation 30240627 30614866 . . . ID=nsv433536;Name=nsv433536;Start_range=.,30240627;End_range=30614866,.

chr17 dbVar copy_number_variation 30983722 31036099 . . . ID=nsv433537;Name=nsv433537;Start_range=.,30983722;End_range=31036099,.

chr17 dbVar copy_number_variation 34907088 34962504 . . . ID=nsv433538;Name=nsv433538;Start_range=.,34907088;End_range=34962504,.
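GVF builds on GFF3, so each feature line has nine columns: seqid, source, type, start, end, score, strand, phase and attributes. The sketch below parses one such line in Python. It assumes that the source (dbVar) and the type (copy_number_variation) are separate columns, and it splits on any whitespace to match this transcript, although real GVF files are tab-delimited. The function name is illustrative.

```python
from typing import Dict, Optional


def parse_gvf_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one GVF feature line; return None for pragmas, comments, blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    # At most 8 splits, so the attribute column stays in one piece.
    cols = line.split(None, 8)
    if len(cols) != 9:
        raise ValueError("GVF feature lines need 9 columns")
    seqid, source, so_type, start, end, score, strand, phase, attr_text = cols
    attributes: Dict[str, str] = {}
    for pair in attr_text.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,
        "start": int(start),  # GFF3 coordinates are 1-based, inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


if __name__ == "__main__":
    example = (
        "chr1 dbVar copy_number_variation 90044442 90114410 . . . "
        "ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,."
    )
    print(parse_gvf_line(example))
```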

slide23

Derived data

http://www.ncbi.nlm.nih.gov/dbvar

http://www.ebi.ac.uk/dgva

http://www.ncbi.nlm.nih.gov/snp

slide27

Trace Organization

seq1: FASTA, Quality, Chromatogram, Experimental info, Sample

seq2: FASTA, Quality, Chromatogram, Experimental info, Sample

SRA Organization

Experiments

Samples

Sequences and Qualities

slide28

Era of NGS Explosion

FASTQ Era

Bits/Base Era

As of April 10, 2012, SRA contains fewer bytes than bases

New Cycle Decision Circle

Increases the number of data series

  • BAM and similar formats containing both raw reads and alignments become the primary output of raw sequencing

Compression By Reference reduces sizes of other data series

New compression algorithms

New sets of tradeoffs
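The idea behind compression by reference can be sketched as follows: once a read is aligned, only its position, its length and the bases that differ from the reference need to be stored. This Python sketch is a simplified illustration that handles substitutions only (no indels, clipping or quality values). It is not the scheme used by SRA or by any particular file format. The example reuses the reference and one read from the alignment slide. The alignment offset of 5 is an assumption, found by matching the read against that reference.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (offset within the read, base observed in the read)


def encode_read(reference: str, read: str, pos: int) -> Tuple[int, int, List[Diff]]:
    """Encode a gap-free aligned read as (position, length, mismatches)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [
        (offset, read_base)
        for offset, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]
    return pos, len(read), diffs


def decode_read(reference: str, pos: int, length: int, diffs: List[Diff]) -> str:
    """Rebuild the read from the reference plus the stored mismatches."""
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "GCTACGGCATTCAGGCATCAGGCATTAGCAG"
    read = "GGCATTCAGGGATCAGGCATTAGC"
    encoded = encode_read(reference, read, 5)
    print(encoded)
    print(decode_read(reference, *encoded) == read)
```

A 24-base read is reduced here to a position, a length and a single mismatch, which is why this approach can push storage below one byte per base when reads closely match the reference.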

slide35

Science 1 July 2011:

Vol. 333 no. 6038 pp. 53-58

DOI: 10.1126/science.1207018